© 2025 NEWSLINKER - Powered by LK SOFTWARE
AI

Study Reveals OpenAI’s GPT-4o Trained on Copyrighted Data

Highlights

  • OpenAI's GPT-4o likely used copyrighted O'Reilly books.

  • Study highlights need for greater AI data transparency.

  • Implications stress ethical data sourcing in AI development.

Samantha Reed
Last updated: 2 April, 2025 - 12:09 pm

A recent investigation by the AI Disclosures Project has found that OpenAI’s GPT-4o model was likely trained on copyrighted material from O’Reilly Media without authorization. The finding raises concerns about how training data is sourced for advanced language models, and the study points to potential legal and ethical consequences for AI developers and content creators alike.

Contents
  • How Did the Study Determine Data Usage?
  • What Were the Key Findings?
  • What Are the Implications for AI Companies?

The research builds on previous examinations of data usage by AI companies and offers empirical evidence that unauthorized material was used in training. Compared with earlier models, GPT-4o shows a markedly greater ability to recognize and reproduce proprietary content, underscoring how capable newer AI systems have become at handling restricted information. This prompts a reevaluation of data acquisition protocols across the AI industry.

How Did the Study Determine Data Usage?

Researchers used a legally obtained dataset of 34 copyrighted O’Reilly Media books to test whether GPT-4o could distinguish original passages from paraphrased versions. Using the DE-COP membership inference attack, the study measured how reliably the model recognized specific content, and the results pointed to a significant level of familiarity with the books.
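The quiz-style test described above can be illustrated with a minimal Python sketch. It is not the study's code. It assumes that each quiz item pairs one verbatim passage with several paraphrases, and that a hypothetical `ask_model` callable returns the index of the option the model believes is verbatim. The names `build_quiz`, `guess_rate`, and `ask_model` are invented for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One quiz item: (verbatim passage, paraphrases of that passage).
QuizItem = Tuple[str, Sequence[str]]


def build_quiz(verbatim: str, paraphrases: Sequence[str],
               rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the verbatim passage among its paraphrases.

    Returns the shuffled options and the index of the verbatim one.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def guess_rate(items: Sequence[QuizItem],
               ask_model: Callable[[List[str]], int],
               seed: int = 0) -> float:
    """Fraction of quizzes where the model picks the verbatim passage.

    `ask_model` is a hypothetical stand-in for a call to a language model.
    """
    rng = random.Random(seed)
    correct = 0
    for verbatim, paraphrases in items:
        options, answer = build_quiz(verbatim, paraphrases, rng)
        if ask_model(options) == answer:
            correct += 1
    return correct / len(items)
```

With one verbatim passage and three paraphrases, a model that has never seen the text should pick the right option about a quarter of the time. A rate well above that on a given book is the signal a membership inference test looks for.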

What Were the Key Findings?

The study found that GPT-4o achieved an AUROC score of 82% in recognizing paywalled O’Reilly content, substantially higher than GPT-3.5 Turbo, which scored just above 50%, the level expected from random guessing. GPT-4o also recognized non-public material better than publicly accessible samples, indicating deeper engagement with restricted data sources.
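To make the AUROC figure concrete, here is a small Python sketch of how such a score can be computed. It assumes each book receives a single recognition score and that books are split into suspected training members and known non-members. The function name `auroc` and the numbers in the usage note are illustrative, not data from the study.

```python
from typing import Sequence


def auroc(member_scores: Sequence[float],
          nonmember_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen member scores higher
    than a randomly chosen non-member, with ties counted as half a win.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))
```

For example, `auroc([0.9, 0.4], [0.5, 0.1])` returns 0.75, because the member scores beat the non-member scores in three of the four pairings. A value of 0.5 means the scores carry no information about membership, and a value of 1.0 means the two groups are perfectly separated.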

What Are the Implications for AI Companies?

“AI companies must prioritize transparency in their data acquisition processes to ensure ethical standards are upheld,” the AI Disclosures Project emphasized. Unauthorized use of copyrighted data could lead to legal challenges and diminish trust in AI technologies. The study advocates for stronger accountability measures and enhanced disclosure practices to safeguard intellectual property rights.

While previous reports hinted at similar issues, this study provides empirical evidence specifically linking OpenAI’s GPT-4o with the unauthorized use of O’Reilly Media’s content. The findings suggest a broader, systemic issue within the AI sector regarding the sourcing of training data, necessitating comprehensive regulatory frameworks to address these challenges effectively.

Robust data licensing agreements and transparent training methodologies are essential for maintaining the integrity of AI development. Implementing the EU AI Act’s disclosure requirements could significantly improve accountability, ensuring that content creators are fairly compensated and informed about the use of their work in training models.

Balancing technological advancement with ethical data use will be crucial for the sustainable growth of AI. Companies must adopt responsible practices that foster innovation while respecting intellectual property rights, ultimately contributing to a more equitable digital ecosystem.

By Samantha Reed
Samantha Reed is a 40-year-old, New York-based technology and popular science editor with a degree in journalism. After beginning her career at various media outlets, her passion and area of expertise led her to a significant position at Newslinker. Specializing in tracking the latest developments in the world of technology and science, Samantha excels at presenting complex subjects in a clear and understandable manner to her readers. Through her work at Newslinker, she enlightens a knowledge-thirsty audience, highlighting the role of technology and science in our lives.