RIETI Policy Symposium

What Can Be Learned from Large-Scale Business Data: An approach integrating economics and physics (Summary)


  • Time and Date: 14:00-18:30; Thursday, March 5, 2009
  • Venue: Star Hall, Josui Kaikan
    2-1-1 Hitotsubashi, Chiyoda-ku, Tokyo


Opening Remarks

FUJITA Masahisa (President and Chief Research Officer, RIETI)

The rapid development of information technology is leading to great changes in all our lives. What impact does this have on the methodologies and research styles we researchers adopt? That is our topic today.

Looking at the world of business, the spread of email and the Internet is greatly changing the way business is being conducted. But in other ways, too, information and communication technology has come to play an important role in the conduct of business, such as in the gathering of point-of-sale (POS) data, the use of electronic money, and online trading.

From a researcher's point of view, what is particularly noteworthy is the electronic information gathered as a result of such developments. For example, in the case of POS data, what kind of people come to a shop at what hour, minute, and second, and how many pieces of which items they purchase, is all electronically recorded with the help of computers. In the past, this type of information was recorded on paper, and it was almost impossible for researchers even to see it; through the development of information and communication technology, however, it has now become possible for researchers to use such information. This growth in the scope and amount of business data available to researchers has led to a far-reaching expansion in research possibilities.

To take RIETI's own activities as an example, we are currently engaged in various cutting-edge research projects on topics such as price formation in online markets, exchange rates, and inter-firm transaction networks, which rely on business data obtained through the support and cooperation of a great number of firms.

The appeal of such business data has attracted the attention of researchers outside of economics and business studies, as researchers in the natural sciences, especially physics, are trying to explain economic phenomena using these data. This is what is meant by the subtitle of today's topic, "An Approach Integrating Economics and Physics." Economics and physics are disciplines that have historically had an extremely close relationship. With a common interest in the use of business data, there is great potential for building an even more intimate relationship between these two disciplines.

For our symposium today, we have invited a number of scholars from the field of physics who have conducted pioneering research on economic phenomena, not least Professor Stanley, a leading figure in this area. Together, I hope we will have fruitful discussions on questions such as: What aspects of economic phenomena can the use of business data help to elucidate? What measures are necessary to further promote the use of business data in the future? And what role can the use of business data play in understanding and resolving the current financial crisis?

Aims of the Symposium

WATANABE Tsutomu (Faculty Fellow, RIETI)

To illustrate the way in which the data that economists use has changed, I would like to give my own experience as an example.

When I started conducting economic research almost 30 years ago, the data I used were, for example, the System of National Accounts (SNA), the consumer price index (CPI), statistics on industrial production, etc. These were data aggregating the behavior of a large number of firms and households, and they were compiled by the government.

Now let's look at the data used in the research conducted by RIETI and others at the present time. There is, for example, exchange rate transaction data from Electronic Broking Services (EBS), which shows the price and transaction volume for each individual transaction, and data from kakaku.com, which provide sales data on, for instance, digital cameras with regard to the price and volume for each product at the transaction level. Other data cover rental housing transactions and are compiled from classified ads in housing magazines published by Recruit Co., Ltd. Furthermore, with the cooperation of firms such as Nikkei, Intage, and BCN, we are using scanner data recording the sales of individual items at supermarkets and large consumer electronics retailers; and through the cooperation of Teikoku Databank and Tokyo Shoko Research, we are using data on the transaction relationships between firms.

What all these types of data have in common is that, unlike the SNA and similar statistics, they are not aggregated, and many contain information at the level of individual transactions. Moreover, these data were prepared by firms, not by the government, which is another respect in which they differ from the data of old. Yet another aspect is that they were generated to improve business operations and are in this sense genuine "business data." In other words, they were not originally intended for use by researchers like ourselves, and using them for research is a kind of "appropriation."

Such appropriation of data compiled by firms means that if it were not for the cooperation of firms, research using such data would not be possible. Conversely, if it is possible to enlist the cooperation of firms, researchers can get their hands on more diverse data and deepen their understanding of various economic phenomena. I am looking forward to discussing with you today how we can promote the use of a wide range of business data for research purposes.

Summary of Keynote Speech 1 "Frontier of Network Science: From the topology of the WWW to the business web"

Albert-László BARABÁSI (Northeastern University)

Network Structures Underlying Social Phenomena

Representative examples of networks are the Internet and the connections between individuals within a company. In the former, computers are physically connected with each other through telecommunication links, while in the latter, people are linked through communication via e-mail and other channels. Typically, in such networks, computers and people are not linked at random; rather, links are concentrated on a few specific computers or people. Networks with this characteristic, in which the number of links per node follows a power-law distribution, are called scale-free networks.

Networks are formed by linking nodes (computers, people, etc.) to one another one at a time. If each new node preferentially links to the nodes that already have the most connections, a mechanism called preferential attachment, a scale-free network arises. Scale-free networks are robust to random failures, in which every node is equally likely to fail, because the overall structure of the network is preserved. However, they are vulnerable to concentrated attacks from the outside on the nodes with many connections, and if such nodes fail, the network collapses.
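The growth mechanism and the contrast between random failures and targeted attacks can be illustrated with a small simulation. The sketch below is a minimal Python version of preferential-attachment growth, in which each new node attaches to m existing nodes with probability proportional to their current number of links. The network size, m = 2, and the 5% removal fraction are arbitrary illustrative choices, not figures from the talk.

```python
import random
from collections import defaultdict, deque


def barabasi_albert(n, m, seed=0):
    """Grow a network one node at a time; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    # Start from a small fully connected core of m + 1 nodes.
    for i in range(m + 1):
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Every node appears here once per link it has, so a uniform draw from
    # this list is a degree-proportional draw (preferential attachment).
    link_ends = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(link_ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            link_ends.extend((new, t))
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first search over surviving nodes
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


n, m = 2000, 2
g = barabasi_albert(n, m)
k = n // 20  # remove 5% of the nodes
random_failures = random.Random(1).sample(sorted(g), k)
hubs = sorted(g, key=lambda v: len(g[v]), reverse=True)[:k]
print("random failures:", largest_component(g, random_failures))
print("targeted attack on hubs:", largest_component(g, hubs))
```

Removing the same number of nodes at random typically leaves the largest connected piece almost intact, while removing the best-connected nodes shrinks it considerably more.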

Examples of the Application of Network Analysis

If we examine the composition of the products countries export, we find that countries exporting "bananas" often also export "mangoes," and countries exporting "automobiles" often also export "electronic equipment." This is likely because the technology used for producing and shipping mangoes is similar to that for bananas, and the same applies to automobiles and electronic equipment. In other words, products can be grouped in terms of the production and shipping technologies they require, and in this sense there are networks between products. By examining each country's network of relatedness between products, it becomes possible to search for export strategies by determining which products each country is well placed to produce and ship.
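The talk does not specify how relatedness between products is measured. One measure used in the product-space literature, assumed here, takes the proximity of two products to be the smaller of the two conditional probabilities that a country exporting one of them also exports the other. The export table below is invented for illustration.

```python
from itertools import combinations

# Invented toy data: the products each (hypothetical) country exports.
exports = {
    "A": {"bananas", "mangoes"},
    "B": {"bananas", "mangoes", "coffee"},
    "C": {"automobiles", "electronics"},
    "D": {"automobiles", "electronics", "coffee"},
    "E": {"bananas", "coffee"},
}


def proximity(p, q, exports):
    """min(P(exports q | exports p), P(exports p | exports q))."""
    has_p = sum(p in basket for basket in exports.values())
    has_q = sum(q in basket for basket in exports.values())
    both = sum(p in basket and q in basket for basket in exports.values())
    if has_p == 0 or has_q == 0:
        return 0.0
    return min(both / has_p, both / has_q)


# Print the links of the product network (pairs with nonzero proximity).
products = sorted(set().union(*exports.values()))
for p, q in combinations(products, 2):
    phi = proximity(p, q, exports)
    if phi > 0:
        print(f"{p} - {q}: {phi:.2f}")
```

Taking the minimum rather than a single conditional probability keeps a product that almost every country exports from appearing closely related to everything.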

Summary of Keynote Speech 2 "Information Spirals in Social Network Dynamics for Large Internet Databases: YouTube, open source software, cyber-risks, financial markets"

Didier SORNETTE (ETH Zürich)

Google Searches and YouTube Views

To illustrate what we can learn from patterns of social network dynamics, consider the case of Internet searches or video views on Google and YouTube. It was recently reported that search patterns for terms related to flu are an advance predictor of the number of cases reported by hospitals. Taking the example of Google and YouTube, the focus here is on critical events and shocks, such as advertising or news, and the nature of response patterns in social networks. For instance, following the tsunami disaster of December 2004, the number of Google searches for "tsunami" jumped instantaneously, but then rapidly declined. This is a typical response to an exogenous shock. On the other hand, search activity surrounding a Harry Potter movie increases gradually before the release date, and then gradually decays afterward. This is a typical response to an endogenous shock.

Responses to Shocks

These response functions, or impulse responses, can be traced from the moment people first come across a particular piece of information online, such as a specific news story, because it takes time for people to search for or view it. From the speed at which the number of searches or views declines after the peak, it is then possible to infer the connectivity and influence between people through which information may propagate like an epidemic via word-of-mouth or click-of-mouse. This method of analyzing responses to shocks can also be applied to changes in volatility in financial markets or to changes in the number of cyber attacks on the Internet.
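As a rough illustration of reading a response function from data, the sketch below fits a power-law decay, with activity roughly proportional to t to the power -alpha where t is the time since the peak, by least squares on a log-log scale. The functional form, the function name decay_exponent, and the synthetic counts are assumptions for illustration. It only looks at the decay after the peak and ignores the gradual build-up that characterizes endogenous shocks. A steeper decay (larger alpha) is consistent with the quick fade of an exogenous shock, and a flatter one with the slow decay of an endogenous shock.

```python
import math


def decay_exponent(counts):
    """Fit counts[peak + t] ~ C * t**(-alpha), t = 1, 2, ..., by least squares
    on a log-log scale and return alpha (larger alpha = faster decay)."""
    peak = max(range(len(counts)), key=counts.__getitem__)
    xs, ys = [], []
    for t, c in enumerate(counts[peak + 1:], start=1):
        if c > 0:  # log(0) is undefined; skip empty periods
            xs.append(math.log(t))
            ys.append(math.log(c))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic daily counts (not real search data): a spike that fades
# quickly versus one that fades slowly.
fast = [5000.0] + [1000.0 * t ** -1.2 for t in range(1, 200)]
slow = [5000.0] + [1000.0 * t ** -0.6 for t in range(1, 200)]
print(round(decay_exponent(fast), 3), round(decay_exponent(slow), 3))
```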


Questions from the floor on Keynote Speeches 1 and 2


Does the increase in the number of citations of academic articles follow an exponential path?

Albert-László BARABÁSI

No, the increase is linear, not exponential.


Is the increase in the number of citations of academic articles an endogenous or an exogenous phenomenon?


Endogenous and exogenous phenomena generated through network effects exhibit response functions that decay over time as a power law. This property is not seen in the increase in citations of academic articles.


How can the current financial crisis be understood from the viewpoint of networks?

Albert-László BARABÁSI

Because mutual trust between banks has been lost, the network of the flow of funds between banks and firms has become fragmented and firms can no longer procure necessary funds.


Is the shock resulting from the collapse of Lehman Brothers an endogenous or exogenous phenomenon?


Because the impact of the shock is persistent, the endogenous effect seems to be dominant.


How can we counter a situation where business conditions deteriorate as a result of rumors?

Albert-László BARABÁSI

It is difficult to quantify rumors, but unless we can obtain quantitative data, we cannot use network analysis to suggest ways to counter them.


To sever the feedback effect of negative rumors through herding, the central bank must undertake an extremely clear and drastic policy shift.

Summary of Keynote Speech 3 "Human Behavior in Marketing"

TAKAYASU Misako (Tokyo Institute of Technology)

Investigating the Assumption of Randomness of Intervals between Sales

While the behavior of each individual is governed by a purpose, and is therefore not random at the individual level, the behavior of a large number of people observed together appears random. Using POS data from convenience stores, we examined this assumption of randomness and found that, on a second-by-second basis, the timing of sales was close to a Poisson process, while on a time scale of one hour the rate parameter of the Poisson process changes gradually. The results imply that people's purchasing behavior at convenience stores, taken as a group, can be regarded as random.
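One simple check of the Poisson assumption, sketched below, uses the fact that the gaps between events in a Poisson process are exponentially distributed, so their coefficient of variation (standard deviation divided by mean) is close to 1. The sale times here are simulated with a hypothetical constant rate of one sale every 30 seconds; the slow hourly drift in the rate reported above is not modeled.

```python
import random
import statistics


def interval_cv(times):
    """Coefficient of variation of the gaps between successive event times.
    For a Poisson process the gaps are exponential, so the CV is close to 1;
    perfectly regular arrivals give 0, bursty arrivals give more than 1."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


# Synthetic sale times in seconds from a Poisson process with a
# hypothetical average of one sale every 30 seconds.
rng = random.Random(42)
t, sale_times = 0.0, []
for _ in range(20000):
    t += rng.expovariate(1 / 30)  # exponential gap with mean 30 s
    sale_times.append(t)

print(round(interval_cv(sale_times), 2))
print(interval_cv([0, 30, 60, 90, 120]))  # perfectly regular arrivals
```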

The Special Characteristics of Bargain Sales

Next, using supermarket POS data to examine the role of mass psychology, we found that there is a nonlinear exponential relationship between the ratio of the price on the day of a sale to the price on the previous day and the corresponding ratio of sales volumes, and that the exponent of this relationship depends on the "best before" date of the product. This means that people's purchasing behavior in supermarkets responds to bargain sales in an oversensitive, nonlinear manner, and that the longer the "best before" period of a product, i.e., the more suitable it is for storing at home, the larger the response of purchasing behavior to bargain sales.

Buying and Selling in Financial Markets

A distinctive feature of financial markets is that prices change by the second and participants act both as buyers and sellers. Using foreign exchange market data and analyzing these special characteristics, we find that foreign exchange market behavior cannot be described only as a random walk, but instead is better described by introducing the physics concept of potentials. Applying this concept in our analysis, we show that the foreign exchange market is influenced by endogenous fluctuations, exogenous news and so forth; that frequently occurring large changes in prices are brought about by dealers following the trend ("trend-follow" effect); and that the trend-follow effect can be described by market potentials.
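The idea of a potential can be illustrated with a deliberately simplified model, which is an assumption of this sketch and not necessarily the authors' specification: the price takes random steps and is also pushed by a force derived from a quadratic potential centered on its own moving average. The sign of the hypothetical coefficient b decides whether prices are pulled back toward the average or pushed away from it, the latter mimicking trend-following dealers.

```python
import random


def simulate(b, steps=2000, window=10, noise=1.0, seed=0):
    """Price path in a quadratic potential centred on its own moving average.

    Each step adds the force b * (x - moving_average) plus Gaussian noise,
    i.e. the negative gradient of the potential -(b / 2) * (x - ma)**2.
    b = 0: pure random walk; b > 0: repulsive potential, price is pushed
    away from its moving average (trend following); b < 0: attractive
    potential, price is pulled back toward it."""
    rng = random.Random(seed)
    xs = [0.0] * window
    for _ in range(steps):
        ma = sum(xs[-window:]) / window
        xs.append(xs[-1] + b * (xs[-1] - ma) + rng.gauss(0.0, noise))
    return xs[window:]


def lag1_autocorrelation(xs):
    """Lag-1 autocorrelation of the increments (price changes) of a path."""
    d = [y - x for x, y in zip(xs, xs[1:])]
    mean = sum(d) / len(d)
    num = sum((d[i] - mean) * (d[i + 1] - mean) for i in range(len(d) - 1))
    den = sum((v - mean) ** 2 for v in d)
    return num / den


for b in (-0.15, 0.0, 0.15):
    print(b, round(lag1_autocorrelation(simulate(b)), 3))
```

Trend following shows up as positive correlation between successive price changes and mean reversion as negative correlation. In this toy model, a b larger than 2 / (window - 1) makes the price changes grow exponentially instead of settling down.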

Word of Mouth on the Internet

Using the occurrence of certain keywords in blogs on the Internet, we analyzed word-of-mouth information and found that, for keywords with a "time limit" due to seasonality or temporariness, there is an exponential relationship between the time elapsed since a certain event and the number of times related keywords are used. Moreover, keywords that occur infrequently appear randomly, whereas keywords that occur frequently do not exhibit randomness and instead show signs of intertemporal correlation.
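If the decline in keyword usage after an event is exponential, its time scale can be estimated with a straight-line fit to the logarithm of the counts. The sketch below assumes daily counts that decay after the event; the function name decay_time and the synthetic counts are illustrative only.

```python
import math


def decay_time(counts):
    """Fit counts[t] ~ A * exp(-t / tau) by least squares on log(counts)
    versus t, and return tau (in days for daily counts)."""
    pts = [(t, math.log(c)) for t, c in enumerate(counts) if c > 0]
    mt = sum(t for t, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    stt = sum((t - mt) ** 2 for t, _ in pts)
    sty = sum((t - mt) * (y - my) for t, y in pts)
    slope = sty / stt  # slope of log(count) per day, equal to -1 / tau
    return -1.0 / slope


# Synthetic daily keyword counts after an event, decaying with tau = 7 days.
daily_counts = [500.0 * math.exp(-t / 7) for t in range(60)]
print(round(decay_time(daily_counts), 2))
```

A straight line on a log-count versus time plot indicates exponential decay; a straight line only on a log-log plot would instead point to a power law.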


Questions from the floor on Keynote Speech 3


What are the aims or objectives of using potentials in analyzing foreign exchange markets?


The area in which there is the greatest need for this type of analysis is probably the quantitative measurement of risk, either in real time or with regard to the future. More interestingly, we have found from theoretical analysis that if we do "coarse graining" by increasing the time scale (or time interval) to infinity, then the solution has only three patterns, i.e., a random walk, exponential divergence, and finite divergence.


How are rumors propagated and how do they affect society? In the present deterioration of economic conditions, I am wondering whether rumors that the economy is deteriorating have contributed to the actual deterioration of the economy.


There exists a clear correlation between rumors and sales volumes. At present, we are investigating examples of commodities with either strong or weak correlations with rumors. Regarding the correlation between rumors and sales volumes, the direction of causality is difficult to identify, and to elucidate this will take time.

Panel Discussion: "How to Use Large-Scale Business Data"

Session Outline

This panel discussion focused on the following issues:

  1. How can the use of large-scale business data for research purposes be further promoted?
  2. What is the relationship between government statistics, such as GDP statistics, and business data?

Chair: WATANABE Tsutomu (Faculty Fellow, RIETI)

How can business data be gathered for research purposes? What are the technical and legal challenges involved when gathering and using such data? And, assuming that such challenges are overcome, how will the use of business data progress from here? These are some of the questions to which I hope this panel discussion will provide some answers.


The main difficulties when using business data are not technical but proprietary. For example, much of the most valuable data, such as the Internet data held by Amazon and Google, is among the most closely guarded and is not released. This tendency is unlikely to change in the future, and we have to deal with this issue on a case-by-case basis.

H. Eugene STANLEY (Boston University)

There is no general answer to this question. For example, if we decided to use data from cell phone companies for analysis, there would clearly be legal and ethical issues involved. However, even without such data, where issues of secrecy and confidentiality loom large, there is still much that can be done.

For example, simply by looking at the S&P 500 closing price data published by Yahoo on the web, it is possible to examine standard trading models. Moreover, databases that contain every transaction are not that expensive to purchase, and such data are currently widely used by researchers. Apart from the various problems of storing such huge amounts of data, using them poses no great difficulties. As long as it is possible to obtain this sort of data, it is possible to make new discoveries. For instance, although financial "tsunamis" occur only rarely, such data show that these events do occur with a measurable probability.
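The point about rare but real "tsunamis" can be checked with nothing more than a series of closing prices. The sketch below computes daily log returns and compares how many lie more than k standard deviations from the mean with the number a normal distribution would predict. Downloading data is left out; a synthetic fat-tailed series (Student-t returns with 3 degrees of freedom, a common toy choice) stands in for real closes, so the ratio it prints says nothing about the actual S&P 500.

```python
import math
import random


def log_returns(closes):
    """Daily log returns from a series of closing prices."""
    return [math.log(b / a) for a, b in zip(closes, closes[1:])]


def tail_ratio(returns, k=4.0):
    """Number of returns more than k standard deviations from the mean,
    divided by the number a normal distribution would predict."""
    n = len(returns)
    mu = sum(returns) / n
    sd = math.sqrt(sum((r - mu) ** 2 for r in returns) / n)
    observed = sum(abs(r - mu) > k * sd for r in returns)
    expected = n * math.erfc(k / math.sqrt(2))  # n * P(|Z| > k), Z ~ N(0, 1)
    return observed / expected


# Synthetic stand-in for downloaded closes: Student-t (3 degrees of freedom)
# daily returns, built as a normal draw divided by sqrt(chi-square / 3).
rng = random.Random(7)
closes = [100.0]
for _ in range(20000):
    t3 = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
    closes.append(closes[-1] * math.exp(0.01 * t3))

print(round(tail_ratio(log_returns(closes)), 1))
```

A ratio far above 1 means extreme moves happen much more often than a normal model would allow, which is why such events cannot be dismissed as practically impossible.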

Chair: WATANABE Tsutomu

As we just saw, financial "tsunamis" are extremely rare phenomena. What is the relationship between the amount of data available and any predictions regarding such "tsunamis"?


The more data you have, the more likely you are to make a discovery. But even without a large dataset, it is still possible to make a discovery. However, without a large dataset, such a discovery or prediction would be less convincing.


However, having a lot of data can also be a bad thing. If there is too much apparent information, noise also increases, and there is often an illusion of control when developing large databases. The key problem is to make sense of the huge amount of data that modern society is accumulating; this is the most important and interesting challenge. Extracting meaningful information also involves substantial time and cost. In other words, too much data can also give rise to new problems.

Chair: WATANABE Tsutomu

Next, I look forward to hearing from Professor Deguchi about the electronic collection of business data.

DEGUCHI Hiroshi (Tokyo Institute of Technology)

  1. Although business data in the private sector are available in electronic form, Japan's major statistics are still collected through paper-based questionnaires, which is extremely costly. However, this situation is going to change through the application of electronic filtering to private business data. The issues in this context are protecting the confidentiality of private electronic data and business data, using aggregate data as a kind of public good, and feeding the results back properly to the private sector. To this end, some kind of extraction filter needs to be designed.
  2. Leading practical examples are the Bank of Japan's extraction of bank data using XBRL, the publication of financial statements by the Financial Services Agency using XBRL, and the Ministry of Health, Labour and Welfare's JANIS. Possible areas for this kind of electronic extraction include ordering systems for clinical records and examinations, systems for medical prescriptions, enterprise resource planning, or the use of POS data. Other possible areas are surveys such as the "Personal Tracking Survey" using PASMO or SUICA.
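As a minimal sketch of what an extraction filter might do, assuming a common disclosure-control rule that is not described in the talk: firm-level records are aggregated into cells, and any cell with fewer than a threshold number of contributing firms is suppressed so that no single firm's figures can be read off. The threshold, field names, and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical disclosure rule: publish a cell only if at least this many
# distinct firms contribute to it.
MIN_FIRMS = 3


def extract_aggregates(records, key_fields, value_field, min_firms=MIN_FIRMS):
    """Aggregate firm-level records into cells and suppress small cells."""
    totals = defaultdict(float)
    firms = defaultdict(set)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        totals[key] += rec[value_field]
        firms[key].add(rec["firm_id"])
    # None marks a suppressed cell: too few firms to hide any single one.
    return {key: (total if len(firms[key]) >= min_firms else None)
            for key, total in totals.items()}


records = [  # invented example records
    {"firm_id": 1, "region": "Tokyo", "category": "food", "sales": 120.0},
    {"firm_id": 2, "region": "Tokyo", "category": "food", "sales": 80.0},
    {"firm_id": 3, "region": "Tokyo", "category": "food", "sales": 50.0},
    {"firm_id": 4, "region": "Osaka", "category": "food", "sales": 300.0},
]
print(extract_aggregates(records, ("region", "category"), "sales"))
```

Only the aggregated, filtered output would leave the firm; the raw records stay inside.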

Chair: WATANABE Tsutomu

What practical problems and hurdles are there in this kind of electronic extraction?


Rather than any technical hurdles, the main hurdles are institutional and require the cooperation of the private sector. Should the data we obtain be considered to be a public or common good? In what form could such data be fed back to participating corporations? On these issues, no clear notions have been formed yet. Therefore, in using this sort of data, it is important to form a clear vision of what kind of estimations we want to conduct and what merits this would have.

Chair: WATANABE Tsutomu

Next, we are going to hear from Mr. Matsumoto how finance-related data are being created during the conduct of business and to what extent it is possible to make such data available for external use.

MATSUMOTO Oki (Monex Group, Inc.)

  1. Since Monex is an online securities company, we have all manner of data on orders, contracts and transactions, and it is easy for us to use such data. However, because investors differ greatly in how they could use this information, releasing it could create unfairness among investors, and we therefore do not make our data available outside the firm. We do, however, use them for analyses within the firm and have, for example, compared the performance of short- and long-term investors and analyzed trends in trading by customers overall. An important condition for conducting such analyses is that Monex does not have a trading desk and does not conduct any proprietary trading.
  2. Through this data analysis, we provide our customers with various kinds of feedback. We can, for example, provide better trading tools and offer better advice. However, to provide such feedback, it is necessary to have no proprietary trading. Moreover, we also need to obtain customers' consent with regard to the use of their data. Because transaction data is the property of each individual investor, we need to consider how we can safeguard the interests of each individual investor and who the analysis of such data is intended for.

Chair: WATANABE Tsutomu

The Bank of Japan and the government have data on the network of money transfers between firms as well as on the network between banks, which can be used for analyses. Would it not be possible to build an environment in which such data could also be used by researchers?

IWATA Kazumasa (Economic and Social Research Institute [ESRI])

  1. At ESRI, we have been engaged in statistical reforms since 2006. As part of this, we have been working on building a statistical databank that compiles government microdata in one place. We have also been holding discussions on how to handle surveys and censuses conducted by the various government departments in such a way that they are easy to use, and in the process we have been considering how to create an environment in which such data are also easy to access for researchers while at the same time ensuring that privacy concerns are addressed.
  2. At present, at ESRI we are conducting the estimation of national accounts using AADL. A theoretically possible extension of this work is to use data on a firm-level transaction basis and construct SNA data following the financial accounting procedure. Regarding financial data, timely disclosure network systems using XBRL are being employed by the Tokyo Stock Exchange as well as the Bank of Japan and the Financial Services Agency. Moreover, the settlement of transactions between banks in the BOJ-NET is being conducted on a real-time gross settlement (RTGS) basis. Furthermore, for the consumer price index (CPI), we are partly using POS data.

Chair: WATANABE Tsutomu

What sort of relationship will government statistics and business data have in the future?

IWATA Kazumasa

With regard to CPI, we are already making improvements by partially using business data, that is, POS data. And in the future, I think we will be able to use a wider range of business data. Similarly, with regard to national income accounts, it is possible to use business data and thus increase their accuracy. Consequently, the role of business data is bound to rise.


The role of business data in government statistics is certainly going to grow in the future, and there are already many concrete examples. However, there are also practical problems, such as how to obtain such data, missing values, and statistical distortions. Moreover, when constructing indicators from private-sector data, the government needs to handle such data carefully, assess the public interest carefully, and design appropriate procedures for the use of electronic data.


The following questions were asked from the floor.


A premise in economics is that agents such as households and firms interact with one another through economic activities. So, in a sense, if people believe the economy is going to deteriorate, then it will deteriorate, and if they believe it is going to improve, then it will improve. What is your view as physicists on this?


While individual actors act according to their free will, collectively humans behave according to certain statistical laws. So although humans are free to make certain decisions, we are not as free as we might think. For example, when many people are looking at exactly the same question in a competitive fashion, their freedom is greatly diminished. We are interested in explaining why this happens, and now that large-scale data can be used, I think such an explanation is possible.


This is a good example of the collision between endogenous and exogenous factors. The endogenous element is that by believing collectively in something, we make it happen. But this, of course, is not all. We are not in a closed system but are subjected to many shocks. Accordingly, we have to look at society not as a system of coupled spins, but as a hierarchy. In this respect, there is much that physicists can learn from economists and social scientists, and all of us are proposing new approaches that take advantage of these new opportunities and databases.


Do you expect that it will be possible to develop an option pricing model that is not predicated on the normal distribution?


This is a big challenge. If you know the facts, you know to distrust a model based on falsehoods. It would be wonderful if we could do the next step and fold the correct facts into a trading model.


Many groups have proposed models that take fat tails into account, and there are, in fact, companies working on these models and providing software to improve option pricing. Many stylized facts have already been incorporated into pricing models with considerable success.
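To make the idea of moving beyond the normal distribution concrete, here is one simple fat-tailed alternative, chosen for illustration and not the model used by any company mentioned: log returns drawn from a mixture of a calm and a turbulent normal regime. Because each regime is lognormal, a European call price is the probability-weighted average of two Black-Scholes prices. The parameters are hypothetical and matched so that p * sigma_calm**2 + (1 - p) * sigma_wild**2 equals 0.2**2.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(s, k, t, r, sigma):
    """Black-Scholes price of a European call (normal log returns)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def mixture_call(s, k, t, r, p, sigma_calm, sigma_wild):
    """Call price when log returns are a scale mixture of two normals:
    volatility sigma_calm with probability p, sigma_wild otherwise.
    Each regime is lognormal and risk-neutral, so the price is the
    probability-weighted average of two Black-Scholes prices."""
    return (p * black_scholes_call(s, k, t, r, sigma_calm)
            + (1 - p) * black_scholes_call(s, k, t, r, sigma_wild))


# Hypothetical parameters with p * sigma_calm**2 + (1 - p) * sigma_wild**2 = 0.2**2.
p, sigma_calm = 0.9, 0.15
sigma_wild = math.sqrt((0.2 ** 2 - p * sigma_calm ** 2) / (1 - p))
for strike in (100.0, 150.0):
    print(strike,
          round(black_scholes_call(100.0, strike, 1.0, 0.0, 0.2), 3),
          round(mixture_call(100.0, strike, 1.0, 0.0, p, sigma_calm, sigma_wild), 3))
```

With these settings the mixture gives a lower at-the-money price but a noticeably higher price for the far out-of-the-money strike, which is the direction in which fat tails push option prices.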


When will this economic crisis end?


The present economic crisis was created by a positive feedback loop where credit derivatives were moved off balance sheet and nonperforming subprime mortgage loans mounted. What is important is to theoretically understand the dynamic system related to this process and control it. Moreover, the growth of debt is like a virtual currency that brought about massive inflation. It is only natural that this resulted in a crash. The challenge, therefore, is to build an institutional regime where these things can be controlled.

IWATA Kazumasa

When the crisis will end depends on the policy response from here on. However, the policy response is not decided by the government alone as it pleases; rather, it is necessary for both market participants' and the government's views on the future losses of financial institutions to converge at some point in time. Moreover, the taxpayer must be ready to accept such losses. The length of the crisis will depend on when the expectations of these three parties converge.


I think the most important factor for demand to recover is psychology. One psychological factor is the fact that there are doubts in the market with regard to whether the U.S. government is accurately assessing the balance sheet problems of financial institutions. Another is that, looking at Japan's experience following the burst of its bubble, until all policy responses have been exhausted market participants will always wait for the next measure. Conversely, I think that once all available measures have been exhausted, confidence will begin to return and the economic crisis will end.


Concerning whether we could have seen this crisis coming, or whether we could control it better if we had better data and better models, we should not forget that nobody can stop the ball when everybody is dancing. This is something intrinsic to the human community, and we tend to attribute the crisis incorrectly to purely technical aspects of models, data gathering, and so on.

In terms of where we are going, we have to realize that the financial sphere has expanded to an extent that is out of proportion to anything that has to do with the real economy. The "crisis" is not a crisis; it is just a return to valuations that are more in line with the fundamentals of the real economy. Two related problems are excessive pessimism and a lack of confidence. Another problem is that banks have frozen lending, as can be seen from the fact that the money multiplier is now below 1. What we as physicists can do is propose structures for a robust economic system that can bypass a particular sector when that sector freezes the economy. An example is the construction of a parallel network, such as the WIR banking system that was developed in Switzerland in the 1930s and successfully helped fight the Great Depression. The idea is to gain robustness by relying on several network systems, so that when one freezes, another can take over. In the WIR system, the parallel network is a set of direct interactions between companies.
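The remark about the money multiplier can be unpacked with a little arithmetic. Taking the money stock M to be M1, that is currency C plus checkable deposits D, and the monetary base B to be currency plus bank reserves R (definitions assumed here, since the speaker did not name the aggregate):

```latex
m = \frac{M}{B} = \frac{C + D}{C + R}, \qquad m < 1 \iff D < R
```

A multiplier below 1 therefore means that reserves held at the central bank exceed the deposits the banking system has created, which is consistent with banks holding on to funds rather than lending them out.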


Nobody can predict when the crisis will end. But one thing we can all do is think hard, discuss with our colleagues, and make recommendations to our leaders. Some action is wise to take, but exactly which action is not entirely clear. More collaboration between scientists from different disciplines to make suggestions is one way to go.

Closing Remarks

OIKAWA Kozo (Chairman, RIETI)

Considering the breadth and depth of the topic today, it feels like the past four-and-a-half hours have passed in the blink of an eye. I would like to express my heartfelt thanks to all our presenters and panelists, as well as members of the audience, for the exciting presentations and provocative discussions. At RIETI, we are constantly engaged in empirical research, but through today's discussions I have once again become keenly aware of the importance of high quality data.

This symposium brings to an end APFA7, which started with an international conference on March 1 and has been jointly organized by the Tokyo Institute of Technology and Hitotsubashi University. I would like to extend my deepest thanks to everyone from the two universities who worked tirelessly to make this symposium a success, especially Professor Watanabe from Hitotsubashi University.