Publishers Fight for Protection from AI Data Mining
By Sarah Frideswide
Academic publishing finds itself at a critical juncture. There are growing calls for open access across the industry, yet at the same time AI poses a threat to authors’ and publishers’ intellectual property, in an environment where tech companies are becoming less than forthcoming about the sources used to train their AI. Human plagiarism is taken very seriously in the academic world, and rightly so: appropriating the ideas, labour and skilled research of others is a serious offence. However, the same rules don’t currently apply to AI, largely because governments across the world are playing catch-up with technology that is outpacing political development and ethical understanding.
As yet, there are no lawsuits from academic publishers seeking to protect their intellectual property from AI data mining, but there have been calls for greater protection and transparency over the way AI is trained. There are concerns that academic misconduct is becoming more widespread, and publishers such as Taylor and Francis have called for better technology to detect papers generated by AI. The European Parliament has also moved towards greater protections against data mining, with a declaration released in December 2022 that sets out a way forward for the "digital transition" and includes a section on ensuring transparency in the use of AI. This declaration is a statement of intent rather than an active plan, and it isn’t yet reflected in legislation, although its publication suggests that it soon may be. It certainly shows that the need for transparency is on the EU’s radar.
Some academic journals, including Nature, have published terms on which AI tools such as large language models (LLMs) can be used or credited in their publications, in a bid to tackle transparency and fairness. Nature says: "First, no LLM tool will be accepted as a credited author on a research paper. That is because any attribution of authorship carries with it accountability for the work, and AI tools cannot take such responsibility.
"Second, researchers using LLM tools should document this use in the methods or acknowledgements sections. If a paper does not include these sections, the introduction or another appropriate section can be used to document the use of the LLM." A number of academic publishers have AI policies of their own, including Taylor and Francis, Cambridge University Press and Edinburgh University Press.
However, commitments like these are down to individual publishers and journals and, in the UK at least, there is as yet no framework for them to adhere to. That means that by the time protection against AI data mining becomes law, the question of who has trained which AI tools, to do what and with whose source material, is likely to be a very thick jungle indeed, not least because AI is developing at an unprecedented rate.
Not all publishers are defensive about AI, however; some are embracing it. Pearson are developing a range of AI tools to help produce material, even though their share price fell when a US rival, Chegg, reported that AI was damaging its business. Some publishers, such as Routledge, are beginning to publish books and research on AI and society, including its ethical dimension.
Meanwhile, the Publishers Association has launched an AI taskforce to help the industry navigate these complex waters. Over in the US, a consortium of publishers whose members include the New York Times and The Washington Post has produced guidelines on the fair use of copyrighted material in AI-generated content. It concluded that "most of the use of publishers’ original content by AI systems for both training and output purposes would likely be found to go far beyond the scope of fair use as set forth in the Copyright Act and established case law" (Marketing Brew).
That being the case, it will not be long before governments have to address the growing need for copyright protection against AI data mining. The UK government website has a page dedicated to a detailed consultation on AI and intellectual property, conducted in 2022. However, it focuses mainly on patents and technology, with only a short section on literary and artistic works. It doesn’t address the issue of AI data mining, presumably because that issue has only just begun to surface. As for false attribution, i.e. where a person falsely claims that a work produced by AI was produced by them, it simply says: "We do not think that false attribution is a substantial issue at present." Midway through 2022, that consultation may have been detailed enough to cover the issues AI then posed. Now that we are in the second half of 2023, it no longer does.