Recently, researchers and pundits have referred to data as the “new oil” or the “new electricity”. As discussed below, although this analogy holds to some extent, the nature of data is multifaceted and substantially more complex.
The upcoming book, “The Fourth Industrial Revolution and 100 Years of AI (1950-2050)”, discusses the three previous industrial revolutions. Each produced many inventions, at least one of which led to the creation of a new infrastructure. For example:
- In the first industrial revolution, a new infrastructure related to water-and-steam was created to run steam engines and other machinery.
- In the second, infrastructure related to the generation, distribution, and consumption of electricity originated.
- In the third, the new infrastructure was related to electronic communication (including wireless, wire-line, and satellite).
Similarly, in the current, fourth industrial revolution (which started in 2011 and may continue beyond 2050), new infrastructure related to the production, ingestion, cleansing, harmonization, and use of data will be created, wherein many producers will also be users (“prosumers”).
However, the analogy with oil or electricity obscures the multifaceted nature of data. For example:
- A given amount of electricity can be consumed only once, whereas the same data remains undiminished after being used several times or for different use cases.
- Although ten units of electricity provide ten units of value, a dataset ten times the size may provide less than or more than ten times the return.
In fact, for the following reasons (which are discussed in chapter 12), data and its infrastructure will be more complex than their counterparts in the first three revolutions:
- Bias in data arises from the biases of the humans who collect it, annotate it, harmonize it, reconcile it, and then use it to train AI systems. Since humans are biased, the data usually is too.
- Because AI systems are brittle (i.e., their accuracy deteriorates sharply when even a small amount of noise is added), training them with biased data can yield wrong results, thereby hurting humans, especially in domains such as healthcare, product safety, robotics, criminal justice, recruiting, autonomous driving, and military and defense.
- Other data-related idiosyncrasies include:
  - Unclear definition of data ownership
  - Confidentiality, privacy, and security
  - Consent and purpose
  - Auditability and lineage
- To manage these quirks of data, different societies are adopting different approaches, and it will be almost impossible to devise a universal set of rules governing the use of such datasets.
The book, “The Fourth Industrial Revolution and 100 Years of AI (1950-2050)”, will be published in September 2023. For details, see www.scryai.com/book