Meta has illegally used millions of books and scientific papers to train its AI models. We explain how to check whether your work has been pirated and what steps you can take if it has.
In January, court documents revealed that Meta used Library Genesis (LibGen) – a well-known repository of pirated books and academic papers that originated in Russia – to train its generative AI language models. According to the court filing, Meta CEO Mark Zuckerberg approved the company’s use of LibGen, which is often referred to as a “shadow library”.
The Atlantic has published a searchable database (behind a paywall) that lets you check whether your intellectual work was illegally used to train Meta’s AI models.
What can you do if you find your scientific publication in the library Meta used to train its AI?
Several organisations representing writers, translators, and illustrators – including Sanasto, the Finnish literary copyright society – recommend the following actions:
- Document Your Work: If you find your books or research papers in LibGen, take screenshots as a record.
- Send a Formal Notice: If your work is in the pirated dataset, send Meta a formal notice. The Authors Guild provides a helpful template.
- Inform Your Publisher: Ask your publisher to add a “NO AI TRAINING” notice to the copyright page of your work. If your work is published online, you can also update your website’s robots.txt file to ask AI crawlers not to collect it.
- Add Your Voice: Join other creators by signing the Statement on AI Training to oppose the unlicensed use of creative work.
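For the robots.txt step, here is a minimal sketch of what such a file might contain. The sketch also checks it with Python’s standard `urllib.robotparser` module. The crawler names (GPTBot, CCBot, Google-Extended, Meta-ExternalAgent) are assumptions based on names these companies have published, so check each crawler’s current documentation before relying on them. The helper `check_rules` and the URL are hypothetical. Keep in mind that robots.txt is a request, not a lock: it only affects crawlers that choose to honour it, and it does nothing about work that has already been collected.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The user-agent names are assumptions:
# verify them against each company's published crawler documentation.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: *
Allow: /
"""

AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "Meta-ExternalAgent"]


def check_rules(robots_txt: str, url: str) -> dict[str, bool]:
    """Hypothetical helper: report whether each agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "Mozilla/5.0" stands in for an ordinary browser, which should stay allowed.
    agents = AI_CRAWLERS + ["Mozilla/5.0"]
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.org is a placeholder domain.
    results = check_rules(ROBOTS_TXT, "https://example.org/papers/my-paper.pdf")
    for agent, allowed in results.items():
        print(f"{agent:20} {'allowed' if allowed else 'blocked'}")
```

Running the script should report the four AI crawlers as blocked and the ordinary browser as allowed. To use the rules on a real site, you would save only the `ROBOTS_TXT` text as a file named robots.txt at the root of your domain.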
Image: Adobe Stock.