Consent or Pay: AI Scraping of Public Content Demands a New Licensing Model

Playback speed

Share post at current time

Share from 0:00

0:00

Generate transcript

A transcript unlocks clips, previews, and editing.

Consent or Pay: AI Scraping of Public Content Demands a New Licensing Model

AI systems are scraping and profiting from creators' original works without consent or compensation. Let’s create a plan together with iron-clad licensing terms and demand what's yours! EP #44

Declan Dunn

May 17, 2024

How do you tell a longtime friend that his book, the one he spent so long on…

AI just copied it without your permission!

That gut-punch feeling sucks, right? And it’s not just the OpenAI’s of the world, small businesses are scraping, everyone is scraping.

You'll hear mind-blowing stories from painters, photographers, and other creatives whose art is scraped and used to train AI systems - no consent, no compensation. It's like the tech giants are just helping themselves!

No venting here. You'll get a clear breakdown of what this scraping thing is really about.

No confusing tech-speak, just the truth about how YOUR public posts and artwork are basically free game for AI unless you take action.

And that's where it gets good. We're sharing a first-of-its-kind Creator Licensing Framework - a plan to regain control and call the shots on how your original work gets used by AI.

Want to say "no"? Cool.

Want to negotiate fees or royalties? Even better!

The age of just handing over your art for free AI training is over. By the end of this episode, you'll be armed and ready to consent...or get paid!

When I prompted ChatGPT to share an except from a copyrighted book, the quick response took my breath away.

A book entrusted to me many years ago, under strict copyright rules on his web page since it began.

Maybe that’s isolated, who else do I know? I reached out to the artist Geoffrey Laurence, who felt the gut punch of 50 years of painting taken from him without permission, notification, nothing.

It’s nice to talk about ethics, AI needs to be consistent in business as well as data.

Often we say we have nothing to hide, but this AI world of public spaces is crazy; now anything you post on social or a web site or anywhere their bots go, is part of AI.

And we are so not alone:

Kelly “McKernan discovered that Midjourney users included their name more than twelve thousand times in public prompts. The resulting images—of owls, cyborgs, gothic funeral scenes, and alien motorcycles—reminiscent, some near-exact copies of McKernan’s works.” Without their permission. https://www.newyorker.com/culture/infinite-scroll/is-ai-art-stealing-from-artists
Stock photographer Devin Allen’s Baltimore photography series "A Beautiful Ghetto" were being used to train AI image generators like Stable Diffusion without his knowledge or consent. https://www.haymarketbooks.org/books/1063-a-beautiful-ghetto
Concept artist Karla Ortiz found that her character designs and illustrations were being used as inputs for AI image generators, leading to the creation of derivative works without her permission or compensation. https://www.karlaortizart.com/
In episode 44 of The AI Optimist,

Consent or Pay: AI Scraping of Public Content Demands a New Licensing Model.

Where We Are: Scraping by anyone, everything Internet and social is public, basically, domain.

These are considered public spaces, in my mind it’s like implied consent. Being challenged because it’s vague, much easier than actually building this the right way, with permission.

Their jobs aren’t being taken but their work is…

Either I give you permission or you compensate me. This is inspired by Meta’s request this week to the EU about GDPR rules that might turn off all behavioral, ie AI, tracking which would decimate ad revenues.

In the negotiations, Meta proposed a Consent or Pay approach; either the EU users allow all tracking and ads, or they pay for the privilege of not being tracked and not seeing ads.

That’s negative, with a viable business angle but way too BINARY. We’ll see what the EU decides.

Let’s see what binary Consent or Pay looks like from a Creator’s perspective, one who has had their art taken without contact, consent, or communication.

That’s messed up. The solution starts with understanding a little of what’s happening.

What is AI scraping?

AI systems now scrape and repurpose this public content, without compensation and the creators' consent, generating private profits from public expression.

And there’s no law against that, except maybe the GDPR in Europe. Maybe.

Because this scraping doesn’t take the content into a database, it analyzes and distills it into tokens (and much more, let’s keep it light on the how).

Possession is 9/10ths of the law, the old cliché goes. And since it’s public, this is “fair use”.

Introduction to AI scraping by large language models (LLMs)
- Web crawlers browse and download content from websites, social media, and other online sources.
- Popular datasets like Common Crawl, contain data from billions of web pages, are often used for training LLMs (one early dataset was The Pile, and seeing that you’ll know why content is not always the greatest).
The rise of smaller businesses and individuals scraping content, even from those who try to block them, is allowed because it’s a public space.
- Scraping content with captchas or blocking measures in place

A company called Bright Data works with many big companies scraping. Their services is able to outwit software blockers and captchas.

But doesn’t that make a public site, private in a way? Apparently not, both of these businesses are successful and open.

And legit, they aren’t do anything more than taking advantage of an opening that no one is really talking about.

Since there’s no regulations on public spaces, it’s open season.

Web Scraping 101: A Million Dollar Project Idea (youtube.com)

Consent or pay:

Is publicly available content fair game – public domain?

Will AI be allowed to scrape and monetize publicly available content without the creators' consent, or will there be a licensing model that protects the creators' rights and ensures fair compensation?

As AI advances, the call for a "consent or pay" model grows louder, aiming compensate creators whose work fuels the digital economy for their work, their livelihood.

II. Licensing Framework to Protect Content Creators' Rights

Here are the elements to consider if you are a creator, copyright holder, artist, business, or individual whose work is on the public Internet and social media.

An AI licensing agreement protects their intellectual property rights and provides fair compensation.

Scope of License

The agreement clearly defines the scope granted to the AI company, like:

- Which specific works (songs, lyrics, recordings, etc.) are covered

- The duration of the license (perpetual or time-limited)

- Permitted uses (training AI models, generating derivative works, commercial sales, etc.)

- Exclusions or limitations on the use of the artist's work

Compensation and Royalties Options

The agreement outlines fair compensation for the artist, including:

- Upfront licensing fees based on the value and extent of the artist's work used;

- Ongoing royalties from AI-generated works, and training data, derived from the artist's original material

- Revenue sharing from products/services utilizing the AI models trained on the artist's work.

Artist Credit

To protect the artist's moral rights and reputation:

- Require clear attribution whenever the artist's work is used or referenced by the AI system

- Prohibit the AI company from using the artist's name/likeness in a defamatory or derogatory manner (like a bad prompt for nefarious purposes).

- Grant the artist approval rights over AI-generated derivative works bearing their name.

Data Privacy and Security

The agreement should address data privacy and security concerns, such as:

- Restrictions on sharing or selling the artist's data to third parties

- Protect the artist's data from unauthorized access or misuse

- Provisions for data deletion or return upon termination of the agreement

5. Audit and Transparency

To ensure transparency and compliance, the agreement might include:

- Audit rights for the artist to review the AI company's use of their work

- Reporting requirements on the AI company's use and commercial use of the artist's data

- Provisions for third-party audits or oversight

Termination and Remedies

The agreement outlines clear termination clauses and remedies for breaches, such as:

- Grounds for ending the license (e.g., material breach, insolvency, change of control)

- Injunctive relief and damages in case of unauthorized use or infringement

- Dispute resolutions like mediation and arbitration.

SOURCES

https://aidiscjockey.com/ai-liner-notes/

https://cdn.openai.com/papers/gpt-4.pdf

CONSENT OR PAY

A Call to Action

We need:
- An AI comprehensive licensing framework
- Run by a 3^rd party, not individual Big Tech law firms
- Definitions for the role of content creators, platforms, and AI developers
Action plan for content creators
- Review and update Terms of Service to prevent scraping.
- Use OpenAI’s robots.txt like file to block them, and others. Beware of annoying customers and your audience, to protect yourself.
- Start by monitoring and licensing your content, using things like Blaze or Nightshade to protect.
- Get your voice heard with policymakers and industry leaders
- Advocate for a balanced approach to AI scraping
- Help shape the future of content creation and AI development

Protect Your Work From Scraping

There are a few tools and techniques that artists can use to protect their work from being scraped and used to train AI models without permission:

1. **ArtShield Watermarker**: This tool applies an invisible robust watermark to images, helping to camouflage them against AI scrapers and prevent unauthorized training on the artwork.[1]

2. **Kin.art Portfolio Hosting: This platform uses techniques like label fuzzing and image segmentation to disrupt how AI understands and can scrape images, essentially making artists' portfolios invisible to AI scrapers.

3. **Kudurru: A tool that identifies web scraping as it happens and can "poison" the scraped data by sending corrupted images, confusing the AI model being trained on that data.

4. **Nightshade: This tool allows artists to subtly shift the pixels in their artwork, causing AI image generators to misinterpret and "confuse" the subject when attempting to recreate the art.

5. **Glaze: Similar to Nightshade, Glaze adds an invisible "cloak" or watermark to images that makes AI perceive the artwork differently than humans, preventing accurate training on the artist's style.

These tools aim to either camouflage artwork from being scraped in the first place or corrupt the data if it is scraped, forcing AI companies to obtain proper consent and licensing from artists before using their work for training models.

SOURCES

https://www.itsnicethat.com/news/kin-art-ai-safe-portfolio-creative-industry-230124

https://www.wired.com/story/kudurru-ai-scraping-block-poisoning-spawning/

https://hyperallergic.com/853520/nightshade-helps-artists-protect-their-work-from-ai-scraping/

https://www.marketplace.org/2024/02/05/ai-models-art-poison-pill/

While direct compensation has been limited so far, these cases have raised significant awareness and public pressure on AI companies to address intellectual property and consent issues.

Some companies began negotiations with artists over licensing fees, but no comprehensive solution has emerged.

Consent or Pay: AI Scraping of Public Content Demands a New Licensing Model

No venting here. You'll get a clear breakdown of what this scraping thing is really about.

It’s nice to talk about ethics, AI needs to be consistent in business as well as data.

In episode 44 of The AI Optimist,

Consent or Pay: AI Scraping of Public Content Demands a New Licensing Model.

Where We Are: Scraping by anyone, everything Internet and social is public, basically, domain.

What is AI scraping?

A company called Bright Data works with many big companies scraping. Their services is able to outwit software blockers and captchas.

Consent or pay:

Is publicly available content fair game – public domain?

II. Licensing Framework to Protect Content Creators' Rights

CONSENT OR PAY

Protect Your Work From Scraping

Let’s invent one together.

When someone entrusts me with their book of survival, it’s personal.

We can do better.

Discussion about this video

Ready for more?