Meta Used Books and Research Papers to Train Its AI – Was Yours One of Them?

Meta has illegally used millions of books and scientific papers to train its AI models. We explain how to check if your work has been pirated, and what steps you can take if it has.

In January, court documents revealed that Meta used Library Genesis (LibGen) – a well-known repository of pirated books and academic papers that originated in Russia – to train its generative AI language model. According to the court filing, Meta CEO Mark Zuckerberg approved the company’s use of LibGen, often referred to as a “shadow library”.

The Atlantic has published a searchable database (behind a paywall) that allows you to check whether your intellectual work was illegally used to train Meta’s AI model.

What can you do if you find your scientific publication in the library Meta used to train its AI?

Several organisations representing writers, translators, and illustrators – including Sanasto, the Finnish literary copyright society – recommend the following actions:

  • Document Your Work: If you discover your books or research papers in LibGen, take screenshots for documentation purposes.
  • Send a Formal Notice to Meta: If your work is in the pirated dataset, notify Meta formally. The Authors Guild provides a helpful template.

Image: Adobe Stock.