According to the Atlantic, Meta ‘pirates’ books for AI training. The Facebook parent company allegedly dodged the ethical issues of training its AI Large Language Models by sourcing a vast amount of data from illegal sources, as uncovered by the US news site.
Current LLMs require vast amounts of training data to support the analysis and reproduction of such material. As it stands, no authors are compensated for the use of their material in training for-profit AI models, by Meta or anyone else.
Meta aims to fast-track its AI development in a fevered tech sector. Emails suggest that Meta rejected the idea of licensing original copyrighted material as too slow, too expensive and too restrictive. There is currently no recognised licensing arrangement for use of copyrighted materials in AI training. Meta doesn’t want to be constrained, limited or delayed by developing one.
So what is Meta’s solution? According to Alex Reisner at the Atlantic, Meta simply went to the largest database of copyrighted material freely available on the Internet, downloaded the whole thing and fed it to the AI.
Taking a Liberty
The trouble is that the database, LibGen, is a well-established and infamous pirate library of some eight million books and research papers. From the Discovery phase in the US court, it seems Meta knowingly fed its AI with illegally sourced books and papers without permission from any of the rights holders.
Meta refuses to disclose which works sourced from LibGen found their way into Llama. Until forced to disclose by the courts, we have no way of knowing. Llama is the box containing Schrödinger’s Books; both pirated and not-pirated until someone opens it.
But this doesn’t matter, says Meta.
Meta argues Fair Use in the US courts: a legal doctrine that allows limited reproduction and quoting of copyrighted material for public comment, without the permission of the copyright holder. It does not permit reproduction of entire works, nor does it allow for Fair Use extraction to produce works in competition with the original.
And yet, with sufficient data, Meta’s Llama 3 AI model could mimic or extrapolate any written material. Llama 3 could write sequels to any novels in any series, in competition with the original author.
Meta argues that its AI produces transformative works of added value under the provisions of Fair Use. Conveniently this negates the need to license any original content.
To Be or Not to Be?
It raises the question: does a near-perfect mimicry of style and content count as transformative? We may never get an answer.
Meta, a trillion-dollar company with trillion-dollar lawyers, may well slog through the courts as a cost of doing business. It hopes to outlast its opponents without reaching a definitive judgement.
As the Atlantic points out, Meta doesn’t want to license even one document. That would drive a wedge through its Fair Use defense. And also set an expensive precedent for past and future use of material.
In the meantime, the court case drags on, while AI development races ahead. The corporate monolith clearly believes it has little to fear from the swirling cloud of mosquitoes trying to sting it with copyright notices. The venerable publishing industry has its own issues in the digital world. It is unlikely to join the costly court battle. If Big Tech offers a few crumbs in exchange for a broad license on the industry’s vast back catalog of IP, few expect the publishers to resist.
All of which leaves artists and authors with no say in what happens to their work. And no compensation for its use in a new medium; one which could potentially destroy their livelihood.
The law is a complete sham. Copyright was invented to protect the rights holders, never mind the creators. It doesn’t even do that any more.
The rights holders in publishing are the publishers, also dominated by large corporations. They only care about revenue and whatever crumbs the richer corporations throw them.
See: https://robincatling.com/everything-is-content-now
All the AI companies do it; Meta’s the first to brazen it out in court.
These giant corporations have so much money, they’ll do whatever they want and send the lawyers to stonewall any comeback.
Cynical as it is, that’s not wrong. Sadly. It seems the law serves those rich enough to wield it as a weapon.