UK government report: The right to read should be the right to mine

AI development requires permissive TDM rules

Last month the British government published an independent report on Growing the artificial intelligence industry in the UK. The review, conducted by Professor Dame Wendy Hall and Jérôme Pesenti, discusses how artificial intelligence (AI) “can bring major social and economic benefits to the UK,” highlighting that AI could contribute an additional £630bn to the UK economy by 2035.

The report makes several recommendations that could be explored to support the continued development and adoption of AI in the UK, including improving access to data, training experts, and increasing demand for AI applications. Of particular interest to us are two specific recommendations:

“To improve the availability of data for developing AI systems, Government should ensure that public funding for research explicitly ensures publication of underlying data in machine-readable formats with clear rights information, and open wherever possible.”

[and]

“To support text and data mining as a standard and essential tool for research, the UK should move towards establishing by default that for published research the right to read is also the right to mine data, where that does not result in products that substitute for the original works. Government should include potential uses of data for AI when assessing how to support for text and data mining.”

It is clearly beneficial for governments to require that the outputs of publicly funded research and data be made widely available in open technical formats that are consumable by computers. If the data is not made available in machine-readable formats, it will be impossible to efficiently conduct text and data mining across a large corpus of works. It’s also good that the report recommends that the UK push for an environment where “the right to read is the right to mine,” meaning that legal access to the underlying text or data should be sufficient for the user to apply further research techniques (such as TDM), and that no additional legal permissions or licenses should be required in order to do so.

Open Definition 2.0 released

This post initially appeared on the Creative Commons blog, republished here under CC BY 4.0

Today Open Knowledge and the Open Definition Advisory Council announced the release of version 2.0 of the Open Definition. The Definition “sets out principles that define openness in relation to data and content,” and is the baseline against which various public licenses are measured. Releasing content under an Open Definition-conformant license means that anyone can “freely access, use, modify, and share that content, for any purpose, subject, at most, to requirements that preserve provenance and openness.” The CC BY and CC BY-SA 4.0 licenses are conformant with the Open Definition, as are all previous versions of these licenses (1.0 – 3.0, including jurisdiction ports). The CC0 Public Domain Dedication is also aligned with the Open Definition.

The Open Definition is an important standard that communicates the fundamental legal conditions that make content and data open. One of the most notable updates to version 2.0 is that it separates and clarifies the requirements under which an individual work will be considered open from the conditions under which a license will be considered conformant with the Definition.

Public sector bodies, GLAM institutions, and open data initiatives around the world are looking for recommendations and advice on the best licenses for their policies and projects. It’s helpful to be able to point policymakers and data publishers to a neutral, community-supported definition with a list of approved licenses for sharing content and data (and of course, we think that CC BY, CC BY-SA, and CC0 are some of the best, especially for publicly funded materials). And while we still see some governments and other institutions attempting to create their own custom licenses, hopefully the Open Definition 2.0 will help these groups understand the benefits of using an existing OD-compliant license. The more that content and data providers use one of these licenses, the more they’ll add to a huge pool of legally reusable and interoperable content for anyone to use and repurpose.

To the extent that new licenses continue to be developed, the Open Definition Advisory Council has been honing a process to assist in evaluating whether licenses meet the Open Definition. Version 2.0 continues to urge potential license stewards to think carefully before attempting to develop their own license, and requires that they understand the common conditions and restrictions that should (or should not) be contained in a new license in order to promote interoperability with existing licenses.

Open Definition version 2.0 was collaboratively and transparently developed with input from experts involved in open access, open culture, open data, open education, open government, open source and wiki communities. Congratulations to Open Knowledge and the Open Definition Advisory Council on this important improvement.

Communia response to Science 2.0 consultation

Today the European Commission concluded a consultation on ‘Science 2.0’: Science in Transition. The objective of the consultation is “to better understand the full societal potential of ‘Science 2.0’ as well as the desirability of any possible policy action.” Science 2.0 is defined as the “on-going evolution in the modus operandi of doing research and organising science.” COMMUNIA responded to the questionnaire because it raised issues relevant to how scientific research and data could be made available under open licenses or as part of the public domain. One question asks respondents to rank the specific areas in which they feel a need for policy intervention. We noted that opportunities for policy development include open access to publications and research data, and increased attention to policies that support text and data mining. From our submission:

Open access to publication and research data as either in the public domain or under an open license aligned with the Open Definition would help work towards the goals of Science 2.0. Such a policy would be especially important when public funds are expended for scientific research and publications. COMMUNIA policy recommendation #12 states, “all publicly funded research output and educational resources must be made available as open access materials.” Interest in text and data mining is increasing, and traditional gatekeepers of science scholarship (namely commercial publishers) are attempting to restrict this activity through the adoption of custom licenses and/or contractual terms. We think that text and data mining should be considered as outside of the scope of copyright protection, and instead should be considered as an extension of the right to read (see “Right to Read is the Right to Mine”). Text and data mining should not be treated with a contractual approach which would try to license for a fee this usage in addition to the right of access. Terms of use prohibiting the lawful right to perform data mining on a content accessed legitimately should be considered an abuse of exclusive rights.

Here are our responses to the questionnaire. The Commission’s background paper on the Science 2.0 consultation is here.

European Parliament Approves Updated PSI Directive

Yesterday, the European Parliament formally adopted the updated directive on the reuse of public sector information. The announcement confirms the draft changes made to the directive in April of this year. Some notable changes (see here for a more comprehensive breakdown of the changes):

  • libraries, museums, and archives are now covered under the directive

  • all legally public documents are subject to reuse under the directive

  • any charges are limited to the marginal costs of reproduction, provision and dissemination

  • documents and metadata are to be made available for reuse under open standards and using machine readable formats

European Commission Vice-President Neelie Kroes praised the adoption of the new rules on open data:

[T]o make a real difference you need a few things. You need prices for the data to be reasonable if not free – given that the marginal cost of your using the data is pretty low. You need to be able to not just use the data: but re-use it, without dealing with complex conditions […] We are giving you new rights for how you can access their public data for re-use, but also extending rules to include museums and galleries. That could open up whole new areas of cultural content, with applications from education to tourism. Indeed, Europeana already has over 25 million cultural items digitised and available for all to see – with metadata under an open, CC0 licence.

The Communia Association has been keenly interested and involved in seeing public sector data freed for widespread use by making it broadly available in the public domain. In January 2012 we released a policy paper with suggested changes to the PSI directive. Communia is pleased to see that cultural heritage institutions are included under the scope of the amended directive. Another positive aspect of the new reuse directive is the narrowing of the language around acceptable licensing for public sector information through the removal of text encouraging the development of additional open government licenses. At the same time, the Commission has not clarified what should be considered a “standard license,” so there is ongoing concern about the potential for Member states to create diverging and potentially incompatible license implementations. In addition, EU lawmakers chose not to address the Communia recommendation of explicitly including public domain content held by libraries, museums and archives under the reuse obligation of the amended directive. But all in all, the updated directive is a step in the right direction.

The new directive will be implemented by Member states over the next two years. In the interim, the Commission will be looking for guidance on licensing issues (among other things) from EU-funded projects such as LAPSI 2.0. Communia is an active member in the LAPSI group. LAPSI will be developing PSI licensing guidelines and good practices as a deliverable to the Commission.

Petition in support of a single European Data License

In line with an issue raised in our policy paper on the proposed amendments to the PSI Directive, there is now a Spanish petition that asks the European Commission to propose a single open data license to be used for Public Sector Information across all EU member states:

Dear Neelie Kroes,

We sincerely admire the courage and innovacion [sic] spirit shown by the European Commission in the revision of the ReUse of Public Sector Information Directive. However, as a member of the Opendata community I think the new Directive will be incomplete without the definition of an Opendata Licence shared by all the Member States Public Administration.

We encourage the European Commission to propose the Member States an Opendata Licence, badly needed to create a ReUse of PSI single market. The alternative to a shared opendata licence in the European Union would be a fragmented market similar to the current intellectual property rights landscape in Europe.

Let’s build a single opendata market with a single opendata licence.

Of course, an open data space with fragmented licensing conditions can never be as bad as the overall intellectual property rights landscape in Europe, but the overall argument is very solid. If the Commission wants to unlock the potential of open data for all of Europe, then the best instrument to do so is a single, standardized open data license for all of Europe.