The remarkable development and democratization of Generative Artificial Intelligence (GAI) present significant opportunities across numerous sectors, while also raising concerns about potential risks, especially in intellectual property matters. The question of copyright protection arises at two stages: upstream, when data is used to train the GAI, and downstream, when the GAI generates content based on this data.
Generative Artificial Intelligence ("GAI") is a type of AI system capable of autonomously creating new data, images1, texts2, music, and videos using machine learning models and instructions from a human user3.
It currently has several applications, particularly in business, for professional content creation, graphic design, operations optimization through predictive models, and customer support via chatbots.
The generation of this sought-after content is made possible through the parameterized use of massive archive data, especially using data indexing4 and extraction techniques5 carried out online.
The enthusiasm for GAI matches the concerns it raises:
- Not only regarding manipulation risks and threats to liberties, which Regulation (EU) 2024/1689 on artificial intelligence, in force since August 1, 2024, aims to address6,
- But more specifically, regarding the risks of copyright infringement, relating both to the extensive data used during GAI training and the use of the new content generated.
Among the numerous challenges raised by the emergence of GAI, two are particularly prominent: the legality of, and control over, the copyright-protected "input" data used during GAI training7 (I), and the possible copyright protection of the "output" content generated by GAI (II).
I. The Legality of Using Training Data: The Fragile Balance Between Copyright Compliance and Support for GAI Development
Among the massive data used during GAI training to generate new content based on user "prompts" or requests, some are protected by copyright, particularly images, texts, sounds, and music that display an "original" character, reflecting the unique, free, and creative choices of their authors.
In principle, the mere reproduction, even partial, of copyright-protected "input" data for the generation of "output" content by GAI would require the prior authorization of the author of the input data. Without this authorization, the author could file an infringement claim against the GAI provider or user.
In practice, identifying GAI's use of training data protected by copyright is challenging, given the opacity of most of these systems for the public. This is even more difficult when the content generated by GAI, visible only to the user, does not reproduce the characteristics of the training data under copyright protection.
The “Text and Data Mining” Exception:
To maintain the competitiveness of innovative European companies operating in the GAI sector and to strike a balance with authors’ rights, the AI Regulation legitimizes the application of the “text and data mining” exception8 to GAI. This allows, without financial compensation, the collection and reproduction of training data accessible online and protected by copyright9.
This exception enables GAI providers to bypass the need for authorization, as long as the author or their rightsholders have not exercised their right to withdraw10.
This exception can, however, be overridden—outside the scientific research domain—by a copyright holder’s "opt-out" right 11. This allows them to oppose any use of their works by GAI without prior authorization.
In practice, exercising the opt-out or defending copyright proves challenging since it is difficult for authors to verify the use of their works.
The application of the “text and data mining” exception is also strongly criticized by authors and rightsholders, as GAI was not specifically anticipated when the exception was introduced by EU Directive 2019/790 of April 17, 2019. Critics argue that applying this exception to GAI may not comply with the “Three-Step Test”12 set by international treaties and European law, under which an exception may apply only in "certain special cases" that do not conflict with the "normal exploitation of the work" and do not "unreasonably prejudice the legitimate interests" of the rights holders.
Mass-produced GAI content, offered at low cost, may indeed compete with authors’ works, disrupting their regular exploitation and depriving them of potential income while causing them unjustified harm without a compensatory mechanism.
The Transparency Obligation for GAI Developers and Providers
In response to these concerns, the AI Regulation imposes a transparency obligation on GAI developers and providers to inform users about the origin and nature of the data used13 and to enable authors to identify the use of their works.
Under this requirement, GAI providers must disclose a sufficiently detailed summary of the training data used by their system, though the specifics of this requirement are yet to be defined.
In France, the Higher Council for Literary and Artistic Property (CSPLA) was tasked in April 2024 with establishing a list of information that GAI providers must disclose, depending on the cultural sectors involved, to allow authors and neighboring rights holders to exercise their rights14.
The details of the information obligation for AI model providers are expected to be clarified soon, along with the timeline for implementing such an obligation, considering that many GAIs have already been trained on massive online datasets.
The CSPLA has also been tasked with proposing legal mechanisms ensuring fair remuneration for rights holders by sector.
In the United States, the use of pre-existing works by GAI has led to at least 20 ongoing lawsuits against GAI providers, in which the application of “fair use,” a copyright exception, is also being debated. In Germany, a decision issued by the Hamburg District Court on September 27, 2024 confirmed the application of the “text and data mining” exception to training data15, and further emphasized the need for transparency toward authors regarding the use of such data.
Alternative means of ensuring respect for copyright on training data are also being considered and proposed at the European level, including technical measures, the establishment of a pre-certification mechanism for GAI providers targeting the European market, and the marking of AI-generated content to make it identifiable, such as via a “tag.”
II. Copyright Protection for GAI-Generated Content
The generation of content by GAI based on processing input data during training raises the question of whether it can be protected by copyright.
According to the personalist conception of French copyright law, a creation entirely generated by GAI—by nature devoid of personality—without the "free and creative" choice of a human individual, could not benefit from copyright protection. This view is shared by other legal cultures, as evidenced by some—albeit rare—decisions in the United States, despite evident disparities in copyright approaches across different jurisdictions.
As a result, neither GAI itself nor the GAI provider—although potentially holding rights related to the GAI software—could be eligible for copyright protection in France on the productions generated through this system.
However, if GAI is used as a tool to assist in creating a work that reflects the personal choices of the human author, and their respective contributions are identifiable, the recognition of copyright for the GAI user on this work is theoretically possible. A parallel can be drawn with a camera, a technical tool enabling the creation of works that can be protected by copyright.
Nevertheless, to claim such protection in France, the personal contribution of the GAI user would, in theory, have to go beyond the mere formulation of a request ("prompt"), no matter how detailed, and involve "downstream" control and an original contribution to the final generated content, which must be the result of "free and creative choices."
At this stage, GAI-generated content remains the product of random choices and uncontrollable algorithmic calculations, with the user’s role often limited to providing an idea that guides the GAI system. An idea is not protectable in itself.
No court has yet ruled on this question in France, but this approach tends to be upheld in the United States, where the personal contribution of the human author is assessed at all stages of content production by GAI, including at the moment of "output" of generated data (texts, images, videos, sounds). In contrast, China has been more open to protecting GAI-generated content as long as significant human input is observed, even if it is only at the level of input data and the request.
In conclusion, the evolution of case law and legislation on these matters should clarify the legal solutions to adopt and provide a more secure environment for GAI providers, users, and authors of intellectual works.
Our teams remain informed of developments on this subject and are available to advise you on intellectual property issues related to the use of generative artificial intelligence.
Notes:
- Examples of GAI image generation from textual requests, or variations of existing images: DALL-E by OpenAI, Midjourney, Stable Diffusion.
- Examples: ChatGPT, a chatbot developed by OpenAI, capable of generating high-quality text and responding to questions; Bard, a competing tool developed by Google.
- GAI definition by the CNIL: "a system capable of creating text, images, or other content (music, video, voice, etc.) based on a human user's instruction. These systems can produce new content based on training data."
- Technique also known as “web crawling.”
- Also called "web scraping."
- Exception introduced by the European Directive 2019/790 of April 17, 2019 on copyright and related rights in the digital single market. Directive 2019/790 defines text and data mining as "any automated analytical technique aimed at analyzing texts and data in digital form to extract information, including, but not limited to, patterns, trends, and correlations."
- Recitals 105-106 of the AI Regulation 2024/1689; Article 53 Section II of the AI Regulation requiring AI model providers to: "(c) establish a policy to comply with Union law on copyright and related rights, and in particular to identify and respect, including by means of advanced technologies, a reservation of rights expressed in accordance with Article 4(3) of Directive (EU) 2019/790."
- Recital 6 of Directive EU 2019/790 of April 17, 2019.
- Recitals 107-108 of the AI Regulation 2024/1689.
- Article 53 Section II of the AI Regulation 2024/1689 requiring AI model providers to: "(d) draft and make available to the public a sufficiently detailed summary of the content used to train the general-purpose AI model, according to a template provided by the AI Office."
- CSPLA – Letter of Mission – AI Rules – April 2024.
- Hamburg District Court decision of September 27, 2024, No. 310 O 227/23.
- Report No. 2207 by the French National Assembly on the challenges of generative artificial intelligence, dated February 14, 2024, pages 40-41.
- In the US, the US Copyright Office has refused to register works generated by GAI, such as Jason Allen's "Théâtre D'opéra Spatial" ("Space Opera Theater") on September 5, 2023, and the GAI-generated images of the comic book "Zarya of the Dawn" on February 21, 2023.
- In China, the Beijing Internet Court ruled on November 27, 2023, that a user’s input choices may be sufficient to grant copyright for GAI-generated content if those choices reflect personal creative decisions.