Six years ago this Valentine's Day, OpenAI released GPT-2, and headlines warned it was 'too dangerous to release.'
Their concern? That it could generate synthetic content at scale.
Fast forward to 2025, and we're seeing the first major AI fair use victory in court: Thomson Reuters just won an early judgment protecting its platform's content from being used to train a competitor's AI, the kind of material courts rarely find copyrightable.
The other side had bought the data from a third party and used it to train its AI, claiming fair use. In this round, that argument lost; more details later in this article.
The fair use claim is the same one OpenAI and most others rely on. While the facts here are quite different, the denial of fair use is an interesting response to turning content into tokens. In this round, at least.
Roulette Wheels or Emerging Intelligence?
Last week we looked at the U.S. Copyright Office releasing its second report, which says AI-generated work isn't copyrightable because it's essentially a 'roulette wheel of probability.'
While everyone's focused on AI outputs, the real Achilles' heel is hiding in plain sight: what do we do with content that was used to train AI without permission, AND doesn't fall under fair use?
It's about the millions of novels, articles, paintings, photos, and lines of code that built these AI systems.
Next year the US Copyright Office will release its third report, this one on how US copyright law treats the content that goes into AI. It's surprising that the first two reports depend on this third one, which also shows how complex and new this issue is for all of us.
How can you grant AI a copyright when you can't define what part of someone else's content went into what came out?
What if they found that all the data in all the AI is not fair use, that it's all copyrighted content whose creators should have been given something: an email, a notice, something? Or not.
There's no way the Copyright Office would risk shutting down ChatGPT and Claude by extending protection that far now. By the time the report comes out, it will be six years since this all began, and the findings will be hard to apply retroactively.
But what happens when they finally tackle the question of all that creative work that went into building them?
Like Thomson Reuters winning copyright protection for its content last week!
Ross, a startup, competed with Westlaw's legal research platform, and having worked with Westlaw previously, its team likely knew the quality of the information for training AI would be high.
They needed legal Q&As to train their AI - think of it like needing textbooks to teach a student.
When Thomson Reuters said 'no' to licensing their content, Ross got creative... maybe too creative.
They bought data from a company called LegalEase to get something called 'Bulk Memos' - basically, lawyers' compilations of legal questions with good and bad answers.
These lawyers were using Westlaw's headnotes (like CliffsNotes for legal cases) to create the answers. Ross ended up with 25,000 of these memos to train their AI.
This is eerily like how OpenAI, Anthropic, and others acquired their training data. They used crawled data from Common Crawl; think of it as a digital library that copied everything on the internet.
Just as Ross tried to sidestep Thomson Reuters by getting content through a third party, AI companies have been using third-party scraped data without creator permission.
Organizations like Common Crawl and even the Internet Archive's Wayback Machine have been collecting this data, but both are nonprofits. Like OpenAI… wait, OpenAI is moving away from that now. It was one back then.
It was never a big deal until AI companies started making billions with that data, at a scale far beyond anything before.
The scale is astronomical. We're not talking about 25,000 memos - we're talking about trillions of data points from millions of creators.
While trillion-dollar tech companies claim they can't pay for content, they're quietly making deals with publishers who have the legal and financial muscle to fight back.
The New York Times, Axel Springer, even Thomson Reuters - they're all getting seats at the table.
But what about the millions of independent creators whose work built these AI systems?
And that brings us back to 1787, when our founding fathers picked up their quill pens...
To know where copyright is going, you need to know where it started… in the US, for our example.
The origin story of AI copyright… really, of all US copyright… begins with that document and a promise to protect inventors and creators.
Smart as he was, I don’t think Thomas Jefferson could wrap his head around our questions….
Constitutional Origins: From Quill Pens to Neural Networks
In 1787, the framers of the Constitution faced a fundamental challenge: protecting creativity in their new nation.
Article I, Section 8 states: "To promote the progress of science and useful arts, by securing for limited times to authors and inventors the exclusive right to their respective writings and discoveries."
The Copyright Office maintains that purely AI-created works don't promote human progress because they lack human creative input.
But in an era where AI systems can solve complex scientific problems and create new art forms, we're redefining what 'progress' means.
"See, when the Constitution talked about respective writings and discoveries, right? It's like back in the day when they didn't have anything but writing, photography was coming to really blow their minds very soon."
The Sui Generis Framework: A New Model for AI Copyright
"Now, the reason I'm proposing the sui generis option the US Copyright office didn’t pursue, is a customized legal view of AI generated work is because right now, there's agencies and people doing work with commissions being told and instructed what to do to create images like cases we've shared with you before."
That basic service of delivering copyright protection for a work for hire, or other arrangement, is left out on its own with AI. If AI generates it, it’s not human, and it’s not copyrightable.
Sure we all bend the rules in the meantime. Until a big mistake happens. And the liability goes to the one who paid for the work, yet doesn’t have a copyright to protect themselves.
The framework needs to address three key areas:
1. Authorship and Ownership
"It's also a focus on substantial human involvement. How much? And this is case by case, but how much was this person involved in creating this or how much was the AI involved?"
2. Duration of Protection
"One thing people complain about copyrights in traditional ways is how long they are... And for AI generated work it's clear we're looking like 5 to 10 years."
3. Fair Use Evolution
"Fair use is really important... There's many different ways that people can use it and be able to use the content without taking it too. Certainly, it doesn't mean take the book. You might take a quote."
The Creator's AI Copyright Playbook
"Document any of your work with generative AI. And by the way, with anything, if you're not using AI do the same thing because there's going to be more requirements."
Strategic Documentation
"Make sure you can prove whether you did something with AI or without it, and show a record of your activity. Photos, manuals, journals, whatever. You have calendars to keep proof of what you've done."
Transform, Don't Just Generate
"When AI generates independently, in other words, I put a couple of prompts in, spits it out. I want a copyright. No, because prompts don't give copyright. Not in China, not in the US as of this time."
Building Your Evidence Trail
"Now when you transform AI output, when you take AI output and turn it into something else, that's possible to get a copyright. Assuming you're not using AI to transform AI, right?"
The last US AI copyright update was so wrong to me: AI as just a 'roulette wheel of probability.'
And in their earlier update on AI replicas, they wouldn't touch the issue of copyright because that was about likeness and image, where no copyright rules apply; not because the work was machine-generated.
Here it's because the work was not human: a word calculator, a spinner of others' tales. Bringing it all down to the lowest common denominator.
Enough….and looking around at news in the past 10 years, what you see is likely what we’ve been creating, because all that “bad” data….is also us.
It's a moving target, not set in stone. New copyright is like playing a game; old copyright is like reading a book.
Both are fun, one moves a little faster than the other.
By this time next year, the US Copyright Office will likely face a new challenge with its findings on how copyright applies to the content put into ChatGPT, and everyone else.
Enter Humanity's Last Exam (HLE), a test so challenging that no single human, not even a Nobel laureate, could ace it all. We're talking 3,000 questions across every field imaginable, from philosophy to quantum physics, developed by nearly 1,000 experts across 500 institutions.
In just one year, AI performance on HLE jumped from 10% to 30% accuracy.
It's not just predicting anymore, it's reasoning through problems that challenge the brightest human minds. It's like watching evolution in fast forward.
And there are many more. The GAIA (General AI Assistants) benchmark, for example, evaluates how well AI systems function as general-purpose assistants.
Is AI ready for the big questions? Right now it depends on how you pose them, and how you train the AI: now, two weeks from now, and every two weeks after that.
Creators, listen: if AI triples its performance again this year (and at this rate, who's betting against that?), we're not talking about a fancy probability engine anymore.
We're talking about something that could redefine human knowledge itself.
Ready and perfect? No way. But is it more interesting than anything in the past 20 years?
Maybe… and it's already in our lives; many simply don't understand how…
So while the U.S. Copyright Office wrestles with digital replicas in one report and dismisses AI as random in another, the reality is racing ahead of our legal frameworks.
That's exactly why we need your voice in this conversation.
Because the future of creative rights isn't just about law - it's about understanding what happens when machines start thinking like humans... or maybe even better.
When we start thinking with and working with AI, maybe it helps us grow as well.
Share your thoughts in the comments.
How do you think we should protect creativity in a world where AI isn't just rolling dice anymore?
RESOURCES
NOTE: Thomson Reuters owns Westlaw. When I refer to Reuters in this video, it means Thomson Reuters.
Thomson Reuters wins AI copyright 'fair use' ruling against one-time competitor, by Blake Brittain, February 11, 2025
THOMSON REUTERS ROSS LAWSUIT fair use.pdf
The Automation Assumption Was Wrong—The Revolution Eats Itself
AI deemed 'too dangerous to release' makes it out into the world, by Andrew Griffin
New York Times sues Microsoft, ChatGPT maker OpenAI over copyright infringement
US Copyright Office