Do you know about the supercomputer named "K"? Yes, it's that large-scale computer which became famous after a remark made by a Diet member, Ms. Renho: "What's wrong with being number two in the world?" Since FY2017, I have been a member of a research project made up of researchers in computer science and physics as well as economists, under the theme of "Post-K Computer Exploratory Challenge 2: Construction of Models for Interaction Among Multiple Socioeconomic Phenomena," and have been working on the analysis of economic data using the K computer. I would like to give a brief description of my experience using the K computer.
I suppose most people, when hearing the words "supercomputer" and "economic analysis," would think the following: "What? Do economists use supercomputers?"; "Big data analysis may not be uncommon these days, but do economists themselves need to do programming? They seem to be complete strangers to that work."; and "Why do they have to deal with supercomputers themselves?" If you think that way, you have a lot to learn. To be honest, I personally thought that way, too, until a year ago, but now I know that was wrong. I will explain why.
Need for a panoramic perspective
Let me first explain what a supercomputer is. As its name suggests, a supercomputer performs calculations at high speed, and it achieves this through parallel computation. That is, "super" does not mean a single high-spec computer; rather, it denotes an aggregate of a large number of ordinary computers (more than 80,000 in the case of the K computer) connected in parallel, with calculation tasks assigned to the individual computers so that they work together to achieve high-speed calculation. Parallelism is thus the essential element of a supercomputer. If a calculation consists of independent components, it is easy to speed it up on a supercomputer. However, in the case of a sequential calculation, such as doing calculation A first, calculation B next, and then calculation C based on the results of the first two, each step has to wait for the one before it, so it is difficult to speed up the entire process even with a supercomputer. This means that when the data are so large that a calculation would take an unrealistically long time, whether a supercomputer can reduce the calculation time depends on how parallelizable the calculation algorithm is. Using a supercomputer such as K does not automatically lead to a dramatic reduction in calculation time.
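This dependence on parallelizability is commonly summarized by Amdahl's law. The text above does not name it, but it expresses the same point: if only a fraction of a calculation can be spread across workers, the sequential remainder caps the overall speedup. The Python sketch below is an illustration only. The function name and the example fractions are assumptions chosen for this sketch, and the worker count of 80,000 simply echoes the rough figure quoted for K.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Ideal speedup under Amdahl's law.

    parallel_fraction: share of the work (0.0 to 1.0) that can run in parallel.
    n_workers: number of computers sharing the parallel part.
    Communication overhead is ignored, so real speedups will be lower.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Hypothetical parallel fractions, evaluated at roughly K's scale.
    for fraction in (0.0, 0.5, 0.99, 1.0):
        print(fraction, amdahl_speedup(fraction, 80_000))
```

Under this model, a calculation that is 99 percent parallelizable can never run more than 100 times faster, however many computers are added, because the remaining 1 percent still has to run in sequence.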
We cannot ignore this point when we set analysis themes or form economic hypotheses. Even when we hit upon an idea that is good in economic terms, if we fail to parallelize the necessary calculation when implementing the program on the K computer and the calculation time is expected to be unrealistically long, that idea ends up as pie in the sky. In other words, when thinking about how to analyze data and which statistical approach to use, it is necessary to take into account how parallelizable the calculation algorithm will be. Conversely, even when we find a statistical approach that suits parallel calculation and can expect a drastic reduction in calculation time on a supercomputer, it is of no use without a significant economic theme to which the calculation can be applied. After all, we cannot do without a panoramic viewpoint that covers every factor in the process, from the economic significance of the analysis theme to the feasibility of the calculation on the K computer. Thus, even economists must have an understanding of algorithms, programming, and supercomputers. Nothing can be achieved if economists think that experts in economics need to focus only on economic matters.
Specifically, I currently use the K computer to conduct a statistical analysis of chain-reaction bankruptcies. By combining interfirm transaction network data on about one million companies with detailed data on company bankruptcies provided by Tokyo Shoko Research, I can follow, day by day, how one company's bankruptcy occurs and then affects its trading partners. An ordinary personal computer cannot handle a calculation on data of this size. Therefore, to make the analysis realistically possible, I use the K computer to cut the calculation time to one several-thousandth, or even one several-ten-thousandth, of what it would otherwise be. In the actual process, I go left and right repeatedly in the above figure; namely, I think of an economic theme, implement a program to do a calculation on the theme, think again about how to proceed with the analysis in light of the calculation results, and then implement another program. This is precisely a process of trial and error (the outcome of my analysis work will be posted on the RIETI website sometime soon).
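To give a feel for why this kind of problem can suit parallel computation, here is a toy Python sketch. It is not the method used in the analysis described above: the five-firm network, the spreading probability, and all function names are hypothetical, and the spreading rule is deliberately simplistic. The point it illustrates is only that each simulated cascade depends on nothing but its own seed, so many runs are independent components that could be handed to different computers.

```python
import random

# Hypothetical toy network: each firm maps to the trading partners
# that are exposed if that firm goes bankrupt.
PARTNERS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}


def simulate_cascade(seed, start="A", p_spread=0.5):
    """Simulate one cascade and return the set of bankrupt firms."""
    rng = random.Random(seed)  # private generator: runs share no state
    failed = {start}
    frontier = [start]
    while frontier:
        firm = frontier.pop()
        for partner in PARTNERS[firm]:
            # Each exposed partner fails with probability p_spread (assumed rule).
            if partner not in failed and rng.random() < p_spread:
                failed.add(partner)
                frontier.append(partner)
    return failed


def average_cascade_size(n_runs, p_spread=0.5):
    """Average number of bankrupt firms over n_runs independent runs."""
    # The runs are independent, so this loop over seeds is the part
    # that could be divided among many workers.
    sizes = [len(simulate_cascade(seed, p_spread=p_spread)) for seed in range(n_runs)]
    return sum(sizes) / n_runs


if __name__ == "__main__":
    print(average_cascade_size(1000))
```

In this sketch the runs are executed one after another. Because no run reads another run's results, the loop over seeds could in principle be split across processes or MPI ranks, with only the final averaging requiring communication.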
With the recent spread of big data, calculation time will become a common constraint in data analysis. In this respect, there could be significant latent demand for supercomputers among economists, but when I talk about my analysis work using the K computer at conferences, for example, I often meet economists who try to find an excuse for not using a supercomputer. They may feel that using a supercomputer is a high hurdle, but I have to say that such an attitude is not constructive. They should rather look at the potential of supercomputers, which I am sure will broaden the world of research. Thanks to the great efforts made thus far to encourage researchers in fields other than computer science to use the K computer, it is not so difficult to use once you have learned a bit of C/C++ (or Fortran) and MPI (Message Passing Interface). The K computer has become applicable in a broader range of areas, for both academic and industrial purposes. Perhaps the use of supercomputers will become a standard research method in economics as well. I have some hope that "You are an economist, so I can't believe you have never used a supercomputer!" will become a normal reaction.