More licenses are not the solution for text and data mining


Earlier this week LIBER released a response to the STM Association’s statement about text and data mining (TDM). The STM Association asserts that publishers’ licenses already provide legal certainty for TDM, and that creating copyright exceptions for text and data mining activities would undermine the investment incentives for ensuring that high-quality content is available.

LIBER disputes these claims. First, it says that publishers’ licenses for TDM are neither straightforward nor easy to understand.

Licenses could never be described as simple; they are highly complex and can take months or even years to negotiate. They often refer to laws in other jurisdictions, and in most European countries they can override the flexibilities that exceptions are intended to provide. Many licenses explicitly forbid TDM-associated activities such as crawling content and depositing data in institutional repositories.

Second, LIBER argues that forcing researchers to acquire licenses to engage in text and data mining would divert money away from important research and toward license compliance and monitoring. A copyright exception for TDM, it says, would actually promote investment rather than inhibit it.

An exception for TDM can act as an investment incentive. By implementing the TDM exception proposed by the Hargreaves Review of the UK copyright framework, the UK government has made a clear statement that legal clarity around activities such as TDM will spur innovation and growth. In the wake of this exception, tools to support TDM and improve the quality of content have already begun to emerge. Researchers in the UK have developed their own openly available tools for converting text files into structured, standardised formats.

COMMUNIA strongly supports the notion that “the right to read is the right to mine.” We have encouraged the development of clear rules so that researchers can read and analyse, through text and data mining, all information that is available to them. We are an original signatory to the Hague Declaration on Knowledge Discovery in the Digital Age. And we have criticized the development of bespoke licenses, which would create confusion and claim to grant permission to do many things that re-users do not need permission to do.

Hague Declaration calls for IP reform to support access to knowledge in the digital age

Today COMMUNIA joins over 50 organizations in releasing the Hague Declaration on Knowledge Discovery in the Digital Age. The declaration is a collaboratively created set of principles that outlines the core legal and technical freedoms researchers need. These principles would allow researchers to take advantage of new technologies and practices in the pursuit of scholarly research, including activities such as text and data mining. The drafting of the declaration was led by LIBER, the Association of European Research Libraries, and drew on contributions from dozens of organizations and individuals. COMMUNIA is an original signatory to the declaration.

One of the key principles recognized in the declaration is that intellectual property law does not regulate the flow of facts, data, and ideas, and that licenses and contract terms should not regulate or restrict how an individual may analyze or use data. To realize the massive, positive potential of data and content analysis to help solve major scientific, medical, and environmental challenges, it’s important that neither intellectual property laws nor private contracts restrict practices such as text and data mining.

The Limits of Copyright: Text and Data Mining

This post was originally published on the Creative Commons blog under CC BY 4.0.

This week is Copyright Week, a series of actions and discussions supporting key principles that should guide copyright policy. Every day this week, various groups are taking on different elements of the law, and addressing what’s at stake, and what we need to do to make sure that copyright promotes creativity and innovation.

Today’s topic is about supporting fair use, a legal doctrine in the United States and a few other countries that permits some uses of copyrighted works without the author’s permission for purposes such as parody, criticism, teaching, and news reporting. Fair use is an important check on the exclusive bundle of rights granted to authors under copyright law. Fair use is considered a “limitation and exception” to copyright.

One area of particular importance within limitations and exceptions to copyright is the practice of text and data mining. Text and data mining typically consists of computers analyzing large volumes of text or data, and it can reveal a wide range of connections within and between textual and other types of content. Understanding these new connections can enable new research capabilities that lead to novel scholarly discoveries and critical scientific breakthroughs. Because of this, text and data mining is increasingly important for scholarly research.

Recently, the United Kingdom enacted legislation specifically excepting noncommercial text and data mining from copyright. And as the European Commission conducts its review of EU copyright rules, some groups have called for the addition of a specific text and data mining exception. Copyright for Creativity’s manifesto, released Monday, urges the European Commission to add a new exception for text and data mining in order to support new uses of technology and user needs.

Another view holds that text and data mining activities should be considered outside the purview of copyright altogether. Our response to the EU copyright consultation takes this approach, saying “if text and data mining would be authorized by a copyright exception, it would constitute a de facto recognition that text and data mining are not legitimate usages. We believe that mining texts and data for facts is an activity that is not and should not be protected by copyright and therefore introducing a legislative solution that takes the form of an exception should be avoided.” Similarly, there have been several actions advocating that “The right to read should be the right to mine.”

Whether text and data mining falls under a copyright exception or outside the scope of copyright, it is clearly an activity that copyright owners should not be able to control. Unfortunately, that is exactly what some incumbent publishing gatekeepers are trying to do by setting up restrictive contractual agreements. One example of this practice is the deployment of a set of “open access” licenses from the International Association of Scientific, Technical & Medical Publishers (STM), many of which attempt to restrict text and data mining of the licensed publications. In jurisdictions such as the United States, users do not need to ask permission (or be granted permission through a license) to conduct text and data mining, because the activity either falls outside the scope of copyright or is squarely covered by fair use.

Ensuring that licenses give copyright owners no more control over their content than they have under copyright law is a fundamental principle of Creative Commons licensing. That’s why the CC licenses explicitly state that they in no way restrict uses that are under a limitation or exception to copyright. This means that users do not have to comply with the license for uses of the material permitted by an applicable limitation or exception (such as fair use) or uses that are otherwise unrestricted by copyright law, such as text and data mining in many jurisdictions.

Today’s topic of fair use rights reminds us that “for copyright to achieve its purpose of encouraging creativity and innovation, it must preserve and promote ample breathing space for unexpected and innovative uses.” To liberate the massive potential for innovation made possible by existing and future types of text and data mining, we need user-focused copyright policy that enables these new activities.


Communia response to Science 2.0 consultation

Today the European Commission concluded a consultation on ‘Science 2.0’: Science in Transition. The objective of the consultation is “to better understand the full societal potential of ‘Science 2.0’ as well as the desirability of any possible policy action.” Science 2.0 is defined as the “on-going evolution in the modus operandi of doing research and organising science.” COMMUNIA responded to the questionnaire because it raised issues relevant to how scientific research and data could be made available under open licenses or as part of the public domain. One question asked respondents to rank the specific areas in which they feel a need for policy intervention. We identified open access to publications and research data, along with increased attention to policies that support text and data mining, as opportunities for policy development. From our submission:

Open access to publication and research data as either in the public domain or under an open license aligned with the Open Definition would help work towards the goals of Science 2.0. Such a policy would be especially important when public funds are expended for scientific research and publications. COMMUNIA policy recommendation #12 states, “all publicly funded research output and educational resources must be made available as open access materials.” Interest in text and data mining is increasing, and traditional gatekeepers of science scholarship (namely commercial publishers) are attempting to restrict this activity through the adoption of custom licenses and/or contractual terms. We think that text and data mining should be considered as outside of the scope of copyright protection, and instead should be considered as an extension of the right to read (see “Right to Read is the Right to Mine”). Text and data mining should not be treated with a contractual approach which would try to license for a fee this usage in addition to the right of access. Terms of use prohibiting the lawful right to perform data mining on a content accessed legitimately should be considered an abuse of exclusive rights.

Here are our responses to the questionnaire. The Commission’s background paper on the Science 2.0 consultation is here.

Open Letter regarding the Commission’s stakeholder dialogue on text and data mining

In January, Communia was invited to participate in the European Commission’s ‘Licenses for Europe’ stakeholder dialogue. This stakeholder dialogue is one part of the Commission’s agenda to ‘modernise copyright in the digital economy’. Communia participated in Working Group 4 on Text and Data Mining for Scientific Research Purposes.

Unfortunately, the first meeting of this working group, which took place on 4 February in Brussels, did not live up to the expectations raised by the Commission’s earlier announcement. It quickly became evident that the stakeholder dialogue is based on a flawed assumption (‘more licensing will bring copyright in line with the requirements of the digital economy’) and that the process was designed to prevent a serious discussion about how to unlock the potential of scientific text and data mining.

Given this, the participating organisations representing academia, the research community and civil society (including Communia) have decided to make these concerns public in the form of an open letter to Commissioners Barnier, Geoghegan-Quinn, Kroes and Vassiliou (re-published at the end of this post). The letter, published today, raises a number of concerns that need to be addressed before the stakeholder dialogue on text and data mining can continue.

Chief among these concerns is the belief that, in order to have an open discussion about reform, possible solutions cannot be limited to licensing. From our perspective, the challenges of text and data mining cannot be solved by re-licensing texts to libraries, researchers or the public. What Europe needs is clarity that text and data mining of lawfully available works does not require permission from rights holders. A stakeholder dialogue that simply declares this position off limits can hardly be called a dialogue at all. In the case of public domain content, there is a risk that a focus on licensing will lead to unlawful re-licensing of content that is out of copyright.

In addition, the whole process needs to become more transparent and needs to include all stakeholders, including academics and the Commission’s own Research and Innovation Directorate General, which is currently limited to attending as an observer.

The open letter has been published in the hope of getting the Commission to change the terms under which the stakeholder dialogue is being conducted. Should this not happen, Communia and the other organisations that have signed the letter are very likely to step away from the dialogue. As the list of signatories shows, this position is supported by a growing number of academics who are rightfully concerned about the prospects for conducting data-driven research in Europe.