UK government report: The right to read should be the right to mine

AI development requires permissive TDM rules

Last month the British government published an independent report on Growing the artificial intelligence industry in the UK. The review, conducted by Professor Dame Wendy Hall and Jérôme Pesenti, discusses how artificial intelligence (AI) “can bring major social and economic benefits to the UK,” highlighting that AI could contribute an additional £630bn to the UK economy by 2035.

The report makes several recommendations that could be explored to support the continued development and adoption of AI in the UK, including improving access to data, training experts, and increasing demand for AI applications. Of particular interest to us are two specific recommendations:

“To improve the availability of data for developing AI systems, Government should ensure that public funding for research explicitly ensures publication of underlying data in machine-readable formats with clear rights information, and open wherever possible.”


“To support text and data mining as a standard and essential tool for research, the UK should move towards establishing by default that for published research the right to read is also the right to mine data, where that does not result in products that substitute for the original works. Government should include potential uses of data for AI when assessing how to support for text and data mining.”

It is clearly beneficial for governments to require that the outputs of publicly funded research and data be made widely available in open technical formats that are consumable by computers. If the data is not made available in machine-readable formats, it will be impossible to efficiently conduct text and data mining across a large corpus of works. It’s also good that the report recommends that the UK push for an environment where “the right to read is the right to mine”, meaning that legal access to the underlying text or data should be sufficient for the user to apply any further research techniques (such as TDM), and that no additional legal permissions or licenses should be required in order to do so.

Text and Data Mining should not be dependent on (open) licenses.

But even though the recommendations mentioned above are on the whole encouraging, we should consider a few details that could make them even stronger in order to support a permissive legal environment with regard to artificial intelligence applications. For example, the first recommendation advocates that “clear rights information, and open wherever possible” should be attached to publicly funded research and data. This point is reasonable enough: if public sector bodies want to maximize the impact of the research they fund, it is wise to require that clear rights statements (such as permissive Creative Commons licenses or public domain dedications) are appended to these works.

This way, other scientists, AI researchers, and anyone else will know exactly how they may legally reuse the work for purposes related to artificial intelligence research. At the same time, when we view this recommendation in light of the following one, might they be somewhat at odds with each other?

Let me explain. The second recommendation calls for a liberal legal environment where no additional permissions should be required in order to use a work for research techniques related to artificial intelligence. If the right to read is the right to mine, a researcher wouldn’t need the underlying text or dataset to be made available under an open license, because by definition they would be granted those rights above and beyond whatever a CC (or similar) license says, typically through the adoption of a permissive limitation or exception to copyright.

This is exactly what TDM advocates are pushing for in the current review of the EU copyright rules. In the proposal for a Directive on Copyright in the Digital Single Market, the Commission proposed a TDM exception that would be available only to research organisations that operate on a not-for-profit basis or pursuant to a public interest mission as recognised by a Member State. The practical effect of this limitation is that the private sector will be excluded from the benefits of the exception. [Sidenote: this is essentially similar to the existing situation in the UK, where the national-level copyright exception for TDM applies only to noncommercial use.]

EU copyright reform proposal would limit Text and Data Mining in the EU

The Commission also limited the purposes for which the TDM exception would apply. Their original proposal limited the scope of the TDM activity to “purposes of scientific research.” We noted that this constraint would decrease the potential impact of novel TDM uses, such as for journalism-related investigations, market research, or other types of activities not strictly considered “scientific research”.

We recommended that the Directive be amended to ensure that it achieves the goal of facilitating research and innovation across all parts of society by permitting anyone to engage in text and data mining. This means removing the limitation on research organisations as the sole beneficiaries of the proposed exception. We also urged that the exception should allow text and data mining for any purpose. This means removing the limitation on scientific research as the only purpose allowed for under the proposed exception.

So where does this leave us? We agree with the report that publicly funded research and data should be shared as open data under permissive open licenses (such as CC BY, or even put into the worldwide public domain using a tool like the CC0 Public Domain Dedication). The public sector should do this not because it is legally required in order to conduct text and data mining or other techniques related to artificial intelligence, but more generally in order to ensure an open, communicative, and generative environment where the public gets the access they deserve and need in order to be informed on current scientific research, learn about promising medical innovations, and collaborate to solve problems. And at the same time, we need to continue to advocate for a permissive legal system that protects and expands fundamental user rights, such as a broad copyright exception for text and data mining that applies to any user, for any purpose.

The UK independent report is a step in the right direction because it surfaces important issues and recommendations that could foster a sensible yet progressive environment for artificial intelligence research. But as explained above, there are some details that should be worked out in order to truly support a legal environment that, from a copyright perspective, best enables these interesting and innovative research methods and technologies.

