Technology News
© 2025 NEWSLINKER - Powered by LK SOFTWARE
AI, Technology

A.I. Data Shortage Looms as Websites Clamp Down

Highlights

  • A.I. models need vast data, but web restrictions are limiting content.

  • Companies invest millions in publisher partnerships for data access.

  • Alternative solutions include synthetic data and transcription methods.

Samantha Reed
Last updated: 19 July, 2024 - 8:27 pm

Artificial intelligence (A.I.) models require vast quantities of training data, yet a growing number of websites are restricting the use of their content. The Data Provenance Initiative, a research group from MIT, has highlighted the trend and warns of a potential data shortage for both commercial and academic A.I. developers. The tension between data needs and content restrictions could have significant implications for the future of A.I. development.

Contents
  • Web Restrictions Impact A.I. Training
  • Commercial Efforts to Acquire Data

Web Restrictions Impact A.I. Training

A recent study found a 5 percent reduction in overall data, and a 25 percent reduction in data from high-quality sources, as a result of website restrictions. The analysis covered 14,000 web domains that feed major training datasets such as C4, RefinedWeb, and Dolma. Web crawlers, the automated bots that companies such as OpenAI, Google, and Meta use to collect content, are increasingly being blocked. OpenAI’s crawlers face the most restrictions and are shut out of about 26 percent of high-quality data sources.
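For readers curious how such blocking works in practice, the sketch below assumes a site states its restrictions in a robots.txt file, a common mechanism that the article itself does not name. It uses Python's standard `urllib.robotparser` module to check which crawlers may fetch a page. The sample file, the `example.com` URL, and the `allowed_agents` helper are hypothetical and for illustration only; the user-agent names are used as example tokens.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /articles/

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return a dict mapping each crawler name to whether it may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    SAMPLE_ROBOTS_TXT,
    ["GPTBot", "CCBot", "Googlebot"],
    "https://example.com/articles/story.html",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' in str(allowed) or ('allowed' if allowed else 'blocked')}")
```

In this sample file, the first two crawlers are blocked from the article page while any crawler without its own rule falls through to the permissive default. A robots.txt file is a request rather than an enforcement mechanism, so the check only shows what a compliant crawler would do.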

Commercial Efforts to Acquire Data

In response to the data shortage, A.I. companies are investing millions in partnerships with publishers to secure content archives. OpenAI has reportedly offered between $1 million and $5 million for access to archives from The Atlantic, Vox Media, and others. Companies are also exploring tools such as Whisper to transcribe video and audio content as a way around restrictions on text.

Synthetic data is emerging as another solution, where A.I. generates data instead of sourcing it from humans. OpenAI’s CEO, Sam Altman, supports this approach, suggesting that once models can produce high-quality synthetic data, it may alleviate the pressure on conventional data sources. However, some experts argue that fears of a data crisis are exaggerated, noting untapped resources in sectors like healthcare and education.

Concerns about data limits for A.I. are not new, but earlier efforts focused on gathering large, diverse datasets rather than on coping with access restrictions. Earlier reports emphasized the growth of data collection technologies and the expansion of available training datasets. The shift from abundance to scarcity marks a significant change in the A.I. data landscape.

Previous strategies included enhancing web crawlers and improving data processing algorithms to maximize the quality and quantity of data collected. The current challenges call for adapting to new restrictions and finding innovative ways to keep advancing A.I. technologies without crossing ethical or legal lines.

The tightening availability of web-based data poses hurdles for A.I. development, pushing companies to seek alternative solutions such as partnerships, transcriptions, and synthetic data generation. The debate on the severity of the data shortage continues, with some industry experts confident in the untapped potential of other data sources. The future of A.I. may hinge on balancing these strategies with ongoing ethical and legal considerations.

By Samantha Reed
Samantha Reed is a 40-year-old, New York-based technology and popular science editor with a degree in journalism. After beginning her career at various media outlets, her passion and area of expertise led her to a significant position at Newslinker. Specializing in tracking the latest developments in the world of technology and science, Samantha excels at presenting complex subjects in a clear and understandable manner to her readers. Through her work at Newslinker, she enlightens a knowledge-thirsty audience, highlighting the role of technology and science in our lives.