- The economic value of consumer transaction data has two distinct sources: consumer identifiability and the statistical regularities revealed by the data.
- Statistical regularities across multiple dimensions can exhibit increasing returns to scale, which may result in natural monopoly.
- Statistical regularities are a non-rival good and should ultimately be put to use in the public domain.
The advent of the data economy brings both promise and risk. On the one hand, consumer big data opens up novel business models that create consumer surplus (Smith and Telang, 2017; Mayer-Schönberger and Ramge, 2018; Cusumano et al., 2019). On the other hand, as depicted in Netflix's popular documentary The Social Dilemma, many fear an Orwellian dystopia in which a person's social environment is increasingly engineered by the eerie predictive power of data. In this article, I propose a simple distinction within consumer data, based on identifiability, and draw from it some economic implications for better regulation (Li et al., 2019).
Many online platforms provide digital services for free. In return, the platforms collect consumers' valuable data; as the adage goes, "when the product is free, you are the product." In a sense, there is nothing new in this business model. Data generated by transactions with customers have long been a source of value added for many traditional businesses. For example, banks derive their competitive advantage from observing customers' transactions, which reveal information about a customer's balance sheet or clientele that could not be observed directly. The value of such transaction data has traditionally accumulated within a firm as firm-specific knowledge about consumers, business partners, and employees. This knowledge can then be used across management functions within the firm, such as marketing, procurement, and human resources. Such observations lead to the argument that data are merely a by-product of business transactions, so that the value of data reflects productivity gains from conventional learning-by-doing rather than the externality effect claimed by lawyers and regulators (Agrawal et al., 2019).
The value of such transaction data stems from the identifiability of the trading partner. The data reveal behavioural traits of the counterparty that are otherwise unobservable. This kind of value has long been exploited in traditional business and often provides the basis for long-term business relationships. The corresponding regulation is concerned with the protection of privacy (Acquisti et al., 2016). For example, the link established between a service provider and an identified consumer may cause welfare losses such as identity theft or privacy breaches.
In the era of big data, however, data can be economically valuable even when identity is not disclosed. Imagine a situation in which the identity of a consumer engaging in a transaction is not revealed to the provider, yet the consumer benefits from revealing his or her attributes. The consumer keeps their identity private but lets the provider evaluate their profile against the corpus of data. When anonymity is preserved, the privacy risks tied to identification do not arise, so the consumer's marginal cost of providing data can be zero.
A single data point that a customer reveals under anonymity may have little value to a service provider. A collection of observations, however, generates value, because it can reveal statistical regularities. In this sense, each observation of customer data carries a positive externality: a single observation has little or no value on its own, but a collection of them potentially does. Here we find a genuine externality that might indeed be called a data network effect.
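To make the externality concrete, here is a minimal sketch under stated assumptions: each anonymous record is a hypothetical (income, spending) pair with no identity field, and the "statistical regularity" is taken to be an ordinary least squares slope of spending on income. The record type, the function name `estimate_slope`, and the data are all made up for illustration. With one record no slope can be estimated; pooling records reveals one, and at no point does the provider need to know who any consumer is.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical anonymous record: (monthly income, monthly spending).
# Deliberately no identity field: the provider never learns who the consumer is.
Record = Tuple[float, float]


def estimate_slope(records: Sequence[Record]) -> Optional[float]:
    """Ordinary least squares slope of spending on income.

    Returns None when the records cannot reveal a regularity:
    fewer than two observations, or no variation in income.
    """
    n = len(records)
    if n < 2:
        return None  # a single data point says nothing about the slope
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in records)
    if sxx == 0:
        return None  # every record has the same income: slope undefined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in records)
    return sxy / sxx


# Illustrative, made-up records pooled from four anonymous consumers.
records = [(20.0, 11.0), (40.0, 17.0), (60.0, 23.0), (80.0, 29.0)]

print(estimate_slope(records[:1]))  # one observation: None
print(estimate_slope(records))      # pooled observations: 0.3 for these numbers
```

Real data would be noisy and multidimensional; the point of the sketch is only that the value sits in the pooled collection, not in any single identified record.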
Some argue that the statistical value of data exhibits decreasing returns to scale because increasing the size of a dataset yields only diminishing gains in prediction accuracy. This is true for a prediction objective with a single dimension. However, extending data to multiple dimensions may not suffer from decreasing returns. For example, fusing data on two attributes of households can enhance the prediction power for both attributes, since each attribute carries information useful for predicting the other. Positive externalities combined with increasing returns to scale imply a tendency toward natural monopoly in the industry. Conventional economics would therefore call for regulating such an industry as a public utility.
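The single-dimension half of this argument can be illustrated with a textbook calculation. Under the simplifying assumption of independent, identically distributed observations, the standard error of an estimated mean falls as σ/√n, so quadrupling the data only halves the error and each additional batch of observations buys less accuracy than the last. The sketch assumes this setting; the names `standard_error` and `marginal_gain` and the value of σ are illustrative choices, and the multi-dimensional fusion case discussed above is not modelled.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean from n i.i.d. observations with std. dev. sigma."""
    return sigma / math.sqrt(n)


def marginal_gain(sigma: float, n: int, extra: int) -> float:
    """Reduction in standard error from adding `extra` observations to a sample of size n."""
    return standard_error(sigma, n) - standard_error(sigma, n + extra)


SIGMA = 1.0  # hypothetical dispersion of the single attribute being predicted

for n in (100, 1_000, 10_000):
    # The same 100 extra observations buy less and less accuracy as n grows.
    print(
        f"n={n:>6}  SE={standard_error(SIGMA, n):.4f}  "
        f"gain from +100={marginal_gain(SIGMA, n, 100):.5f}"
    )
```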
Science is, in fact, the name we give to a collection of such statistical regularities. For example, statistical regularities observed in the human body and in chemical compounds are collected and publicly shared as medical science. A collection of statistical regularities found in recorded business transactions may well establish a branch of science of its own. Private firms' efforts to collect data and search for regularities are as welcome here as in any field of science; yet, given the non-rival nature of knowledge (Jones and Tonetti, 2020), conventional economics would hold that such science should ultimately be shared publicly.
The data economy has already delivered impressive gains in consumer surplus, and much of its potential is yet to be realized. One source of the economic value of data is the statistical regularities it reveals, and exploiting them need not infringe privacy. Further research is called for on regulation and technology that facilitate the use of anonymous data.