Study Reveals OpenAI’s GPT-4o Trained on Copyrighted Data

Highlights

  • OpenAI's GPT-4o likely used copyrighted O'Reilly books.

  • Study highlights need for greater AI data transparency.

  • Implications stress ethical data sourcing in AI development.

Samantha Reed
Last updated: 2 April, 2025 - 12:09 pm

A recent investigation by the AI Disclosures Project found that OpenAI’s GPT-4o model was likely trained on copyrighted material from O’Reilly Media without proper authorization. The finding raises significant concerns about data sourcing practices in the development of advanced language models, and the study highlights potential legal and ethical ramifications for AI developers and content creators alike.

Contents
  • How Did the Study Determine Data Usage?
  • What Were the Key Findings?
  • What Are the Implications for AI Companies?

The research builds on previous examinations of data usage by AI companies and offers empirical evidence that the model was likely trained on non-public material. Compared with earlier models, GPT-4o shows a markedly stronger ability to recognize proprietary content, which suggests that newer AI systems have had greater exposure to restricted information. This development prompts a reevaluation of existing data acquisition protocols in the AI industry.

How Did the Study Determine Data Usage?

Researchers used a legally obtained dataset of 34 copyrighted O’Reilly Media books to test whether GPT-4o could distinguish original passages from paraphrased versions of the same text. Using the DE-COP membership inference attack, the study measured how reliably the model picked out the original wording, treating consistent success as a sign that the text was likely present in its training data.
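The test described above can be illustrated with a minimal sketch. It assumes a multiple-choice setup in which one original passage is shuffled among paraphrases and the model is asked to pick the original. The function names, the number of paraphrases, and the `choose` callable that stands in for a model query are all hypothetical; this is not the study's actual code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_quiz(verbatim: str, paraphrases: Sequence[str],
               rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the original passage among its paraphrases.

    Returns the shuffled options and the index of the original passage.
    Assumes no paraphrase is identical to the original.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def guess_rate(items: Sequence[Tuple[str, Sequence[str]]],
               choose: Callable[[Sequence[str]], int],
               seed: int = 0) -> float:
    """Fraction of quizzes in which `choose` picks the original passage.

    `choose` is a hypothetical stand-in for a model query: it receives the
    shuffled options and returns the index it believes is the original.
    """
    if not items:
        raise ValueError("items must not be empty")
    rng = random.Random(seed)
    correct = 0
    for verbatim, paraphrases in items:
        options, answer = build_quiz(verbatim, paraphrases, rng)
        if choose(options) == answer:
            correct += 1
    return correct / len(items)
```

With three paraphrases per passage, a model with no knowledge of the text would be expected to pick the original about one time in four, so a rate well above that on paywalled passages is what hints at prior exposure.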

What Were the Key Findings?

The study found that GPT-4o achieved an AUROC score of 82% in recognizing paywalled O’Reilly content, substantially higher than GPT-3.5 Turbo, which scored just above 50%, roughly the level expected from random guessing. GPT-4o also recognized non-public material better than publicly accessible samples, which suggests exposure to restricted data sources.
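For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen "seen" sample receives a higher score than a randomly chosen "unseen" one. The sketch below computes it by direct pairwise comparison. The function name and the example scores are made up for illustration and are not taken from the study.

```python
from typing import Sequence


def auroc(member_scores: Sequence[float],
          nonmember_scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison (ties count as half a win).

    A value of 0.5 means the scores cannot separate the two groups;
    1.0 means every member outscores every non-member.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Illustrative scores only: 8 of the 9 member/non-member pairs are ordered correctly.
example = auroc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

On this scale, the reported 82% for GPT-4o means the model's recognition scores ranked paywalled content above comparison material in roughly four out of five pairings, while a score near 50% is indistinguishable from chance.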

What Are the Implications for AI Companies?

“AI companies must prioritize transparency in their data acquisition processes to ensure ethical standards are upheld,” the AI Disclosures Project emphasized. Unauthorized use of copyrighted data could lead to legal challenges and diminish trust in AI technologies. The study advocates for stronger accountability measures and enhanced disclosure practices to safeguard intellectual property rights.

While previous reports hinted at similar issues, this study provides empirical evidence specifically linking OpenAI’s GPT-4o with the unauthorized use of O’Reilly Media’s content. The findings suggest a broader, systemic issue within the AI sector regarding the sourcing of training data, necessitating comprehensive regulatory frameworks to address these challenges effectively.

Robust data licensing agreements and transparent training methodologies are essential for maintaining the integrity of AI development. Implementing the EU AI Act’s disclosure requirements could significantly improve accountability, ensuring that content creators are fairly compensated and informed about the use of their work in training models.

Striking a balance between technological advancement and ethical data use will be crucial for the sustainable growth of AI. Companies must adopt responsible practices that foster innovation while respecting intellectual property rights, ultimately contributing to a more equitable digital ecosystem.
