AI-Generated Code and Open-Source Licences
Steven | TrustYourWebsite · 15 May 2026 · Last updated: May 2026
A Series-A due-diligence lawyer flags two short JavaScript functions in your front-end bundle as substantially similar to a GPL-3.0 file on GitHub. Your developer used Cursor to write the form-validation logic and didn't think about it again. This article walks through whether that exposure is real, who carries it and what to do about it.
How Dutch copyright law treats AI-generated code
Dutch copyright sits under the Auteurswet of 1912, the country's primary copyright statute, supplemented by the Wet op de naburige rechten for neighbouring rights and by the Implementatiewet richtlijn auteursrecht in de digitale eengemaakte markt published in Stb. 2021, 313, which transposed Directive (EU) 2019/790 (the DSM Directive) into Dutch law. Computer programs are explicitly works under Article 10(1)(12) Auteurswet and are governed by Articles 45h through 45n, which transpose the Software Directive 2009/24/EC. That is the framework a Dutch operator is held to when a maintainer or a due-diligence advocaat raises a question about AI-suggested code in a public bundle.
The Dutch originality test is the controlling concept and it sits above the statutory provisions. The Hoge Raad set the standard in its Endstra-tapes judgment of 30 May 2008, requiring that a work has een eigen oorspronkelijk karakter en persoonlijk stempel van de maker, an own original character and the personal stamp of its maker. The CJEU has reinforced this in Infopaq International A/S v Danske Dagblades Forening (Case C-5/08), Painer v Standard Verlags GmbH (Case C-145/10) and Cofemel v G-Star Raw (Case C-683/17), which together require an intellectual creation reflecting the author's free and creative choices. Applied to code produced by Copilot or Cursor from a developer's prompt, the analysis turns on whether the developer's contribution constitutes such free and creative choices. The realistic Dutch position is that short, prompt-driven snippets of validation logic typically fail the test, which means the operator distributing them gains no exclusive Dutch copyright over them and a competitor can reuse them without infringing.
The author-attribution question runs through Articles 4, 7 and 8 of the Auteurswet. Article 4 creates a presumption that the person named on a work is its author. Article 7 vests authorship in the employer when a work is created by an employee in the course of employment, which captures most in-house developer cases. Article 8 vests authorship in a legal person when a work is made public under that person's name and direction without the natural author being named, the typical pattern for an agency that delivers code to a client without naming individual developers. The AI vendor sits outside all three provisions, which means the developer or the agency or the operator carries the authorship attribution, never the model and never the model vendor.
Upstream training is governed by Articles 15n and 15o Auteurswet, which transpose Article 4 and Article 3 of the DSM Directive respectively. Article 15n permits text and data mining, including for commercial purposes, on lawfully accessible content, provided the rightsholder has not reserved its rights in a machine-readable form. Article 15o covers mining for scientific research. Public GitHub repositories do not usually carry a machine-readable opt-out, and the open-source licence terms attached to them are licence grants rather than reservations against TDM. That is the framework AI vendors rely on to train on the open-source corpus in the first place, and it sits underneath everything that flows downstream to the operator. A Dutch operator's exposure is on the redistribution side, not the training side.
Copyright disputes are heard at first instance by the district courts (rechtbanken), and on appeal by a Gerechtshof and then the Hoge Raad. The Rechtbank Den Haag holds exclusive national jurisdiction over patents and EU trade marks rather than over copyright, but specialised IP disputes, copyright included, tend to congregate at Den Haag. Stichting BREIN operates as a private-sector anti-piracy organisation primarily for music and film rights but it does pursue source-code matters when funded by member rightsholders. The Autoriteit Persoonsgegevens sits to one side, because copyright over AI-generated code is not in the AP's remit, but adjacent breaches around the same site often are. The Autoriteit Consument & Markt is similarly adjacent for misleading-practice angles under the Wet handhaving consumentenbescherming.
The Doe v. GitHub litigation in the Northern District of California is the most-cited live case in this area worldwide, but a Californian District Court ruling does not bind a Dutch judge. The interpretive weight in the Netherlands would come from the Hoge Raad's reading of the Software Directive 2009/24/EC, the Information Society Directive 2001/29/EC and the DSM Directive against the facts, with the Endstra-tapes originality threshold as the starting point. The CJEU's adjacent originality precedents (Infopaq, Painer, Cofemel) carry direct weight because the Auteurswet implements those directives. The practical implication for a Dutch operator is that public-facing risk is governed by Auteurswet remedies, with the open-source licence treated as a verbintenis (obligation) under Boek 6 Burgerlijk Wetboek, even when the headline case originates in the United States.
GitHub Netherlands B.V. is registered with the Kamer van Koophandel in Amsterdam, and a substantial Dutch small-business customer base uses Copilot under terms governed by Irish law for European subscribers via the Dublin EMEA entity. The IP indemnification clause in Copilot Business and Enterprise contracts is governed by those same European terms. That is a useful jurisdictional fact: when a Dutch operator relies on the indemnification, the enforcing forum and the substantive law applied to the indemnification can be Dutch or Irish under the contract's choice-of-law clause, not Californian. The clause does not change who is sued first by an upset maintainer, but it does shape where any indemnification dispute lands.
What the AI actually did
Coding assistants like GitHub Copilot, Cursor, Claude and Cody were trained on huge volumes of public source code, including repositories under GPL, MIT, Apache and BSD licences. The training process did not preserve attribution metadata, and the models learned patterns rather than entire files. When a developer prompts the assistant, the model produces an output that is sometimes a novel construction and sometimes a near-identical reproduction of a specific training-data file. The assistant does not warn the developer which is which, and it does not emit an SPDX header or a copyright notice.
That is the technical fact at the bottom of the legal question. The model is not licensed to redistribute training-data code, and the developer is not warned when the output is structurally close to a specific source.
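Because the assistant gives no warning, a developer who already suspects a particular source file can at least compare the two by hand. The sketch below is one rough way to do that: it strips comments and collapses whitespace, then scores the two snippets with Python's standard `difflib`. The function names, the sample snippets and the idea of reading the score against a cut-off are illustrative assumptions; this is not how any vendor's filter works and it is not a legal test for substantial similarity.

```python
import difflib
import re


def normalise(source: str) -> str:
    """Strip JS-style comments and collapse whitespace.

    Deliberately naive: the line-comment pattern would also eat a
    '//' inside a string literal such as a URL.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                    # line comments
    return re.sub(r"\s+", " ", source).strip()


def similarity(candidate: str, reference: str) -> float:
    """Return a 0..1 ratio of how closely two normalised snippets match."""
    matcher = difflib.SequenceMatcher(
        None, normalise(candidate), normalise(reference), autojunk=False
    )
    return matcher.ratio()


# Hypothetical snippets: the reference stands in for a file from an
# open-source repository, the candidate for an AI suggestion.
reference = """
function isEmpty(value) {
  // treat null and undefined as empty
  return value == null || value.length === 0;
}
"""
candidate = "function isEmpty(value) {\n    return value == null || value.length === 0;\n}\n"
unrelated = "function add(a, b) { return a + b; }"

if __name__ == "__main__":
    print(f"candidate vs reference: {similarity(candidate, reference):.2f}")
    print(f"unrelated vs reference: {similarity(unrelated, reference):.2f}")
```

A high score only says the text is close after cosmetic differences are removed. Renamed variables or reordered statements will lower the score without making the code any more original, so treat the number as a prompt to look closer, not as an answer.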
Who is exposed
The site operator distributes the code that ships to visitors. A browser loading your homepage receives the JavaScript bundle. Under GPL and similar copyleft licences, that is distribution to the end user. The operator is the entity making it available, regardless of whether the operator wrote the line of code or the agency did or an AI suggested it.
This is the same liability chain that applies to web-designer-introduced copyright issues. The pre-AI version of the problem is a designer who dropped an unlicensed Getty photo into the carousel. The post-AI version is a developer who accepted a Copilot suggestion that reproduced a GPL source file. The structure is the same. The public-facing party is the operator. The internal cost allocation between operator and agency is a matter of contract.
Sitting next to this is the broader question of who pays when AI-built sites break compliance. GDPR and accessibility liability flow to the operator on the same principle. Copyright on AI-generated code is the copyright corner of that same map.
What the courts have actually said
The leading case is Doe v. GitHub, Inc., filed November 2022 in the Northern District of California. Anonymous developer plaintiffs sued GitHub, Microsoft and OpenAI over Copilot's training on public open-source code. The procedural posture moves, and the table below is a snapshot as of May 2026. Re-verify before relying on it.
Doe v. GitHub claim-by-claim status, May 2026.
| Claim | Status as of May 2026 | What it means for your site |
|---|---|---|
| DMCA § 1202(b) on removing copyright management information | Dismissed with prejudice, June 2024, for "near-identical" outputs | Plaintiffs would need verbatim reproduction to revive. Risk for SMBs: low on this specific theory. |
| Breach of open-source licence terms (MIT, GPL, Apache and others) | Allowed to proceed | Open-source licences are treated as enforceable contracts. Risk for SMBs: moderate where client-side code distributes the output. |
| Tortious interference and unfair competition | Mixed dispositions, some claims survived | Not directly SMB-relevant. The dispute is between the plaintiffs and the AI provider. |
| Unjust enrichment | Dismissed | Not SMB-relevant. |
A live procedural posture. Re-verify before relying on it.
The headline takeaway is narrow. The court has not yet ruled on the central substantive question of whether AI-generated output substantially similar to training code violates the original licence. What it has done is sorted the claim theories. The technical "removal of copyright management information" route under DMCA § 1202(b) is closed where the output is "near-identical with semantically insignificant variations." The contract route, treating an open-source licence as a binding agreement that the AI provider's use violated, is still live. Procedural updates appear on the BakerHostetler tracker and on the plaintiffs' counsel's case page. The plaintiffs' page is one side's framing and should be treated as such.
GPL distribution and your website
The legal question turns on a technical one. What counts as distributing the code?
GPL-style copyleft licences attach attribution and source-availability duties to anyone who distributes a covered work. Distribution to an end user is the trigger. For a website, this maps to two cases.
Client-side JavaScript that ships to the visitor's browser is distribution. Every page load delivers the bundle to a third party, which is the GPL distribution case. If the bundle contains code that is substantially similar to a GPL-licensed file, attribution and source-availability duties apply.
Server-side code that never leaves your server is generally not GPL distribution. The exception is AGPL, where Section 13 requires anyone who runs a modified version as a network service to offer its source to the users interacting with it, so network use alone can trigger source-availability duties. Most SMB sites do not run AGPL-licensed backend code, so the practical exposure is concentrated in the client-side bundle: form validation, animations, modals, helper utilities, the kind of small functions a developer asks an AI to write.
This is why the AI-code question matters more for the front end than for the back end of your site. A WordPress plugin that uses Copilot-suggested PHP on the server runs at lower exposure for non-AGPL code than a React component the assistant wrote that ships to every visitor.
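The front-end versus back-end rule of thumb in this section can be written down as a small triage function. This is a sketch of the article's reasoning only, under stated assumptions: the `Component` type, the three labels and the choice of SPDX identifiers are all hypothetical, and the output is a prompt for review, not a legal conclusion.

```python
from dataclasses import dataclass

# SPDX identifiers for the licence families discussed in this section.
GPL_FAMILY = {"GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later"}
AGPL_FAMILY = {"AGPL-3.0-only", "AGPL-3.0-or-later"}


@dataclass(frozen=True)
class Component:
    """A piece of code on the site (hypothetical model for illustration)."""
    name: str
    spdx_licence: str
    ships_to_browser: bool


def triage(component: Component) -> str:
    """Map a component to a rough review priority."""
    if component.spdx_licence in AGPL_FAMILY:
        # AGPL section 13 can reach server-side code used over a network.
        return "review"
    if component.spdx_licence in GPL_FAMILY:
        # Client-side code is delivered to every visitor; server-side is not.
        return "review" if component.ships_to_browser else "lower-priority"
    return "not-flagged"


if __name__ == "__main__":
    examples = [
        Component("form-validation.js", "GPL-3.0-only", ships_to_browser=True),
        Component("report-export.php", "GPL-3.0-only", ships_to_browser=False),
        Component("search-service", "AGPL-3.0-only", ships_to_browser=False),
        Component("date-helpers.js", "MIT", ships_to_browser=True),
    ]
    for item in examples:
        print(item.name, "->", triage(item))
```

The function only works when the licence of a piece of code is known. The harder problem described in this article is AI-suggested code whose origin nobody recorded, which no lookup table can classify.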
How realistic is the risk
Honest probability hierarchy, in order from most to least likely.
The first realistic scenario is an investor or acquirer running due diligence on your codebase before a funding round or an exit. Their lawyers run a licence scanner like FOSSA, ScanCode or licensee. If the scanner flags GPL-licensed code in a proprietary product, the deal-team asks questions. The outcome is usually a remediation budget and a delay, not a killed deal. This is the most common way SMBs find out they have a problem.
The second is an open-source maintainer noticing their code in your public bundle. Larger projects have community members who watch for unattributed reuse. The first contact is a polite email asking for attribution. Escalation looks like a DMCA takedown sent to your host, which interrupts service until you respond. Lawsuits at this level are rare for SMBs because the cost of bringing one outweighs the recovery against a small business.
The third is enforcement by a copyleft-licence steward organisation such as the Software Freedom Conservancy. These groups do bring enforcement actions, but their pattern is to engage in long correspondence first and to target hardware vendors or larger software companies. The threshold for an SMB website is high.
In practice, the realistic week-to-week risk for a small business site is close to zero. The risk concentrates around three moments: a funding round, an acquisition and a maintainer searching the internet for their distinctive function. None of these is likely in a given month, but all are predictable and avoidable. If you want a quick read on what your live site already exposes, run a free compliance scan. It covers cookies, accessibility and image rights. It does not check open-source licence compliance, which needs the developer tooling covered in the next section.
Practical mitigation if you or your developer use AI tools
Five things to do. None of these is a legal defence and none should be sold to you as one. They are engineering hygiene that reduces the chance the problem ever surfaces.
First, turn on the duplication filter in Copilot, Cursor or any other coding assistant that offers one. The filter blocks suggestions that match training-data code above a similarity threshold. It does not eliminate near-identical output, but it does reduce the worst case. Confirm the setting is on in the developer's actual editor configuration, not just on the team account.
Second, run a licence scanner before deployment. Free tools include licensee, scancode-toolkit and ort (the OSS Review Toolkit). Commercial options include FOSSA, Snyk Licence and Black Duck. The scanner reads your package manifests and your source tree and flags licences that conflict with your distribution model. Running this once on the production bundle is more useful than running it never.
Third, if your developer is on paid Copilot Business or Enterprise, GitHub offers an IP indemnification commitment against third-party claims arising from Copilot output, conditional on the duplication filter being enabled. This is a meaningful contractual backstop, but it is conditional on the filter setting and limited to the named plans, and you should verify it against the current terms before relying on it. Free Copilot, Cursor, Claude and Cody do not, as of May 2026, offer equivalent commitments.
Fourth, update your agency contract. Add a clause that the agency will not use AI-assisted code that incorporates GPL or AGPL output without explicit written notice to you, and that the agency warrants the delivered site does not infringe third-party licences. This does not protect you from the maintainer who notices. It does give you a route to push the cost back to the agency if a claim arises.
Fifth, keep a software bill of materials for your client-side bundle. Tools like cyclonedx-bom or the SBOM exports built into modern bundlers list every dependency and its licence. If a question arises in a year, having an SBOM from the release in question saves a week of work.
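To make the SBOM idea concrete, the sketch below builds a minimal CycloneDX-style JSON document from a list of dependencies. It is an illustration under assumptions: the dependency names are invented, the field layout follows the CycloneDX JSON format as commonly documented but has not been validated against the official schema, and a real project would generate this with its bundler or a dedicated tool instead of by hand.

```python
import json
from typing import Iterable, Tuple

Dependency = Tuple[str, str, str]  # (name, version, SPDX licence id)


def build_sbom(dependencies: Iterable[Dependency]) -> dict:
    """Build a minimal CycloneDX-style SBOM as a plain dictionary."""
    components = [
        {
            "type": "library",
            "name": name,
            "version": version,
            "licenses": [{"license": {"id": licence}}],
        }
        for name, version, licence in dependencies
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",  # assumed spec version for this sketch
        "version": 1,
        "components": components,
    }


if __name__ == "__main__":
    # Hypothetical client-side dependencies for one release.
    release_dependencies = [
        ("example-validator", "2.1.0", "MIT"),
        ("example-modal", "0.9.3", "Apache-2.0"),
    ]
    sbom = build_sbom(release_dependencies)
    print(json.dumps(sbom, indent=2))
```

Storing one such file per release, next to the build artefact, is what makes it possible to answer a question about a year-old bundle without reconstructing the build.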
Our free compliance scan covers GDPR, cookies, accessibility and image rights on the live site. It does not check open-source licence compliance, which is a separate developer-tooling job. Treat the two as parallel tracks on the same site.
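As a rough picture of what that developer-tooling job involves, the sketch below walks a project tree, reads the `license` field of every `package.json` it finds and lists the packages that declare a GPL or AGPL licence. It is a minimal sketch, not a replacement for the scanners named above: the function name and prefix list are assumptions, compound SPDX expressions such as `(MIT OR GPL-3.0-only)` are not parsed, and it only sees declared licences, so it cannot detect an AI suggestion pasted into your own source files.

```python
import json
from pathlib import Path
from typing import List, Tuple

# Licence id prefixes treated as copyleft for this sketch (assumption).
COPYLEFT_PREFIXES = ("GPL-", "AGPL-")


def find_declared_copyleft(root: Path) -> List[Tuple[str, str]]:
    """Return (package name, licence) pairs for copyleft-declared packages."""
    findings = []
    for manifest in sorted(root.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: skip, do not guess
        if not isinstance(data, dict):
            continue
        licence = data.get("license")
        if isinstance(licence, str) and licence.upper().startswith(COPYLEFT_PREFIXES):
            findings.append((str(data.get("name", manifest.parent.name)), licence))
    return findings


if __name__ == "__main__":
    for name, licence in find_declared_copyleft(Path(".")):
        print(f"{name}: {licence}")
```

Run from the project root before deployment, this gives a first list to check by hand. Anything it reports still needs a human decision about whether that package actually ships in the client-side bundle.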
What changes on 9 December 2026
Directive (EU) 2024/2853, the new Product Liability Directive, treats software including AI systems as products from 9 December 2026. Article 4 brings AI tools into scope. Article 2(2) excludes open-source software developed outside a commercial activity, so the public open-source maintainer is not the defendant in a PLD claim. The commercial AI vendor is.
The relevance to AI-generated code is narrow. A small business harmed by a defective AI tool, for example where the AI emits code with a security flaw that leads to a data breach with downstream harm to a natural person, may have a new no-fault claim path against the AI vendor under the directive. The claim is for damage to natural persons, and it applies only to products placed on the market after 9 December 2026 in Member States that have transposed the directive. The PLD is not a route for the operator to recover a regulatory fine and it does not retroactively reach pre-existing tools. The Product Liability Directive in depth covers the scope and exclusions.
What this is not
This article is about open-source licence exposure when an AI writes code on your site. Three adjacent topics share words with this one.
Chatbot disclosure and AI-generated marketing copy labelling sit under Article 50 of the AI Act and AI-generated content, which is a separate regime from source-code copyright. The image side, where AI-generated illustrations or photographs may infringe, lives in AI-generated images on your site. The cookie-banner and accessibility version of the same liability chain is the broader GDPR and accessibility question. For the pre-AI image-letter version, see the Getty Images letter guide or our take on Copytrack and PicRights letters.
Common Questions
Does Copilot's duplication filter eliminate the risk?
No. The filter reduces the chance of verbatim reproduction of training-data code, which is the worst case. It does not address near-identical output that still resembles a specific open-source file. Treat the filter as risk reduction, not as a legal shield.
Am I liable if my freelancer used Cursor without telling me?
The site operator is the party distributing the code to visitors. An open-source maintainer who notices their code in your bundle writes to the domain owner. Your freelancer may owe you a fix under contract, but the public-facing exposure sits with you.
Does this apply to server-side code or just client-side?
Mostly client-side. Code that ships to the browser is distribution under GPL and triggers attribution and source-availability duties. Server-side code that never leaves your server is generally not GPL distribution, except under AGPL, where section 13 extends the source-availability duty to users who interact with a modified version over a network.
Is there any AI coding tool that is safer than others?
Paid Copilot Business and Enterprise plans include an IP indemnity from GitHub when the duplication filter is enabled. No equivalent commitment is standard on Cursor, Claude, Cody or free Copilot tiers as of May 2026. Verify current terms before relying on any vendor promise.
Related reading
Cluster pieces that pair with this one:
- The full liability picture for AI-built sites. The hub article on GDPR, EAA and cookie-law liability for AI-assisted sites.
- AI-generated images on your website. The image side of the AI-output copyright question.
- Product Liability Directive 2024/2853. Strict-liability claims for damage from defective AI tools, applicable from 9 December 2026.
- Web designer copyright liability. The pre-AI parent article on agency-client copyright chains.
This article is technical analysis, not legal advice. The author is not your lawyer. For a binding view on a live licence question, talk to one.