Musk claims that human knowledge is already “exhausted” in terms of AI training.

Elon Musk claims that real-world data is becoming scarce for AI models and advises tech companies to switch to “synthetic” data instead.

Elon Musk has joined other experts in artificial intelligence in asserting that “peak data” will soon be achieved and that there is not much more real-world data available to train AI models.

Nearly all of humanity’s information has already been absorbed in AI training, he said during a recent webcast.

“We’ve exhausted basically the cumulative sum of human knowledge … in AI training,” Musk claimed during the X webcast. “That happened basically last year.”

Musk, who founded his own artificial intelligence company, xAI, in 2023, predicted that tech firms will be forced to use “synthetic” data, or data produced by AI that enables self-learning.

The only way to supplement real-world data, he continued, is with synthetic data: an AI model would write an essay or come up with a thesis of its own, grade itself, and go through this process of self-learning.

However, Musk warned that the synthetic data process is at risk due to AI models’ propensity to generate “hallucinations,” or erroneous or illogical results.

Hallucinations make using synthetic material “challenging,” he said, because “how do you know if it … hallucinated the answer or it’s a real answer?”

“Model collapse”
According to Andrew Duncan, director of foundational AI at the UK’s Alan Turing Institute, Musk’s remarks are consistent with a recent academic study suggesting that publicly available data for AI models may run out by 2026, the Guardian reported.

He cautioned that relying too much on synthetic data may cause “model collapse,” in which the quality of the model’s outputs declines.

“When you start to feed a model synthetic stuff, you start to get diminishing returns,” he said, pointing to the danger of biased and unimaginative results.

Duncan also noted that as AI-generated content becomes more widespread online, it may itself end up in AI training data.