AI-Generated Code and Open-Source Licences

Steven | TrustYourWebsite · 15 May 2026 · Last updated: May 2026

A Series-A due-diligence lawyer flags two short JavaScript functions in your front-end bundle as substantially similar to a GPL-3.0 file on GitHub. Your developer used Cursor to write the form-validation logic and didn't think about it again. This article walks through whether that exposure is real, who carries it and what to do about it.

Dutch copyright sits under the Auteurswet of 1912, the country's primary copyright statute, supplemented by the Wet op de naburige rechten for neighbouring rights and by the Implementatiewet richtlijn auteursrecht in de digitale eengemaakte markt published in Stb. 2021, 313, which transposed Directive (EU) 2019/790 (the DSM Directive) into Dutch law. Computer programs are explicitly works under Article 10(1)(12) Auteurswet and are governed by Articles 45h through 45n, which transpose the Software Directive 2009/24/EC. That is the framework a Dutch operator is held to when a maintainer or a due-diligence advocaat raises a question about AI-suggested code in a public bundle.

The Dutch originality test is the controlling concept and it sits above the statutory provisions. The Hoge Raad set the standard in its Endstra-tapes judgment of 30 May 2008, requiring that a work has een eigen oorspronkelijk karakter en persoonlijk stempel van de maker, an own original character and the personal stamp of its maker. The CJEU has reinforced this in Infopaq International A/S v Danske Dagblades Forening (Case C-5/08), Painer v Standard Verlags GmbH (Case C-145/10) and Cofemel v G-Star Raw (Case C-683/17), which together require an intellectual creation reflecting the author's free and creative choices. Applied to code produced by Copilot or Cursor from a developer's prompt, the analysis turns on whether the developer's contribution constitutes such free and creative choices. The realistic Dutch position is that short, prompt-driven snippets of validation logic typically fail the test, which means the operator distributing them gains no exclusive Dutch copyright over them and a competitor can reuse them without infringing.

The author-attribution question runs through Articles 4, 7 and 8 of the Auteurswet. Article 4 creates a presumption that the person named on a work is its author. Article 7 vests authorship in the employer when a work is created by an employee in the course of employment, which captures most in-house developer cases. Article 8 vests authorship in a legal person when a work is made public under that person's name and direction without the natural author being named, the typical pattern for an agency that delivers code to a client without naming individual developers. The AI vendor sits outside all three provisions, which means the developer or the agency or the operator carries the authorship attribution, never the model and never the model vendor.

Upstream training is governed by Articles 15n and 15o Auteurswet, which transpose Articles 4 and 3 of the DSM Directive respectively. Article 15n permits commercial text and data mining on lawfully accessible content, provided the rightsholder has not reserved its rights in a machine-readable form. Public GitHub repositories do not usually carry a machine-readable opt-out, and the open-source licence terms attached to them are licence grants rather than reservations against TDM. That is the framework AI vendors rely on to train on the open-source corpus in the first place, and it sits underneath everything that flows downstream to the operator. A Dutch operator's exposure is on the redistribution side, not the training side.

Enforcement of copyright disputes runs through the Rechtbank Den Haag, the first-instance IP court of the Netherlands with national jurisdiction over copyright and trade-mark matters, and on appeal through the Gerechtshof Den Haag and then the Hoge Raad. Smaller-value or non-specialist matters can be brought before the local Rechtbank, but specialised IP disputes congregate at Den Haag. The Stichting Brein operates as a private-sector anti-piracy organisation primarily for music and film rights but it does pursue source-code matters when funded by member rightsholders. The Autoriteit Persoonsgegevens sits to one side, because copyright over AI-generated code is not in the AP's remit, but adjacent breaches around the same site often are. The Autoriteit Consument & Markt is similarly adjacent for misleading-practice angles under the Wet handhaving consumentenbescherming.

The Doe v. GitHub litigation in the Northern District of California is the most-cited live case in this area worldwide, but a Californian District Court ruling does not bind a Dutch judge. The interpretive weight in the Netherlands would come from the Hoge Raad's reading of the Software Directive 2009/24/EC, the Information Society Directive 2001/29/EC and the DSM Directive against the facts, with the Endstra-tapes originality threshold as the starting point. The CJEU's adjacent originality precedents (Infopaq, Painer, Cofemel) carry direct weight because the relevant Auteurswet provisions transpose EU directives. The practical implication for a Dutch operator is that public-facing risk is governed by Auteurswet remedies, with the open-source licence treated as a verbintenis (obligation) under Boek 6 Burgerlijk Wetboek, even when the headline case originates in the United States.

GitHub Netherlands B.V. is registered with the Kamer van Koophandel in Amsterdam, and a substantial Dutch small-business customer base uses Copilot under terms governed by Irish law for European subscribers via the Dublin EMEA entity. The IP indemnification clause in Copilot Business and Enterprise contracts is governed by those same European terms. That is a useful jurisdictional fact: when a Dutch operator relies on the indemnification, the enforcing forum and the substantive law applied to the indemnification can be Dutch or Irish under the contract's choice-of-law clause, not Californian. The clause does not change who is sued first by an upset maintainer, but it does shape where any indemnification dispute lands.

Figure: How AI-suggested code becomes a licence exposure on your website (from prompt to public distribution). Five stages: the developer prompts Cursor, Copilot or Claude; the AI suggestion may reproduce training-data patterns; the code lands in the agency repository with no licence metadata preserved; the bundle is served to browser visitors; public distribution triggers GPL, MIT or Apache obligations. Exposure by code location: server-side code carries lower exposure (AGPL excepted), while client-side JavaScript is fully distributed to end users.
The legal weight sits at the last stage. The further down the chain you sit, the more you carry.

What the AI actually did

Coding assistants like GitHub Copilot, Cursor, Claude and Cody were trained on huge volumes of public source code, including repositories under GPL, MIT, Apache and BSD licences. The training process did not preserve attribution metadata, and the models learned patterns rather than entire files. When a developer prompts the assistant, the model produces an output that is sometimes a novel construction and sometimes a near-identical reproduction of a specific training-data file. The assistant does not tell the developer which is which, and it does not emit an SPDX header or a copyright notice.

That is the technical fact at the bottom of the legal question. The model is not licensed to redistribute training-data code, and the developer is not warned when the output is structurally close to a specific source.
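To make the missing-metadata point concrete, here is a minimal sketch of a check for an SPDX licence header near the top of a source file. The function name `find_spdx_identifier` and the 20-line window are illustrative assumptions, not part of any tool named in this article. A file copied from an open-source project often carries such a header; an AI suggestion pasted into your own file will not.

```python
import re
from typing import Optional

# Matches the conventional short-form tag, e.g.
# "// SPDX-License-Identifier: GPL-3.0-or-later"
SPDX_LINE = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def find_spdx_identifier(source: str, max_lines: int = 20) -> Optional[str]:
    """Return the SPDX expression declared near the top of a file, or None.

    Hypothetical helper: it only reads a declared header and says nothing
    about where the code below the header actually came from.
    """
    for line in source.splitlines()[:max_lines]:
        match = SPDX_LINE.search(line)
        if match:
            # Drop trailing comment closers such as "*/" or "-->".
            return match.group(1).replace("*/", "").replace("-->", "").strip()
    return None


upstream_file = "// SPDX-License-Identifier: GPL-3.0-or-later\nfunction validate() {}\n"
ai_suggestion = "function validate() {}\n"

print(find_spdx_identifier(upstream_file))  # the declared licence
print(find_spdx_identifier(ai_suggestion))  # None: no metadata survives
```

The absence of a header is not evidence that code is unencumbered. It only shows that nothing in the file records where the code came from.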

Who is exposed

The site operator distributes the code that ships to visitors. A browser loading your homepage receives the JavaScript bundle. Under GPL and similar copyleft licences, that is distribution to the end user. The operator is the entity making it available, regardless of whether the operator wrote the line of code or the agency did or an AI suggested it.

This is the same liability chain that applies to web-designer-introduced copyright issues. The pre-AI version of the problem is a designer who dropped an unlicensed Getty photo into the carousel. The post-AI version is a developer who accepted a Copilot suggestion that reproduced a GPL source file. The structure is the same. The public-facing party is the operator. How the cost is split between operator and agency is a matter of contract.

Sitting next to this is the broader question of who pays when AI-built sites break compliance. GDPR and accessibility liability flow to the operator on the same principle. Copyright on AI-generated code is the copyright corner of that same map.

What the courts have actually said

The leading case is Doe v. GitHub, Inc., filed November 2022 in the Northern District of California. Anonymous developer plaintiffs sued GitHub, Microsoft and OpenAI over Copilot's training on public open-source code. The procedural posture moves, and the table below is a snapshot as of May 2026. Re-verify before relying on it.

<!-- LAST VERIFIED: 2026-05-15 -->

Doe v. GitHub claim-by-claim status, May 2026.

| Claim | Status as of May 2026 | What it means for your site |
| --- | --- | --- |
| DMCA § 1202(b) on removing copyright management information | Dismissed with prejudice, January 2024, for "near-identical" outputs | Plaintiffs would need verbatim reproduction to revive. Risk for SMBs: low on this specific theory. |
| Breach of open-source licence terms (MIT, GPL, Apache and others) | Allowed to proceed | Open-source licences are treated as enforceable contracts. Risk for SMBs: moderate where client-side code distributes the output. |
| Tortious interference and unfair competition | Mixed dispositions, some claims survived | Not directly SMB-relevant. The dispute is between the plaintiffs and the AI provider. |
| Unjust enrichment | Dismissed | Not SMB-relevant. |

A live procedural posture. Re-verify before relying on it.

Figure: Doe v. GitHub, claim-by-claim status, May 2026 (N.D. Cal. 4:22-cv-06823-JST). A US case that a Dutch court would treat as comparative context, not binding precedent. Trackers: bakerlaw.com/the-copilot-litigation and githubcopilotlitigation.com. Re-verify with the docket.
Four claim theories from a US case. One dismissed with prejudice; one still live as a contract theory; the other two not directly SMB-relevant.

The headline takeaway is narrow. The court has not yet ruled on the central substantive question of whether AI-generated output substantially similar to training code violates the original licence. What it has done is sorted the claim theories. The technical "removal of copyright management information" route under DMCA § 1202(b) is closed where the output is "near-identical with semantically insignificant variations." The contract route, treating an open-source licence as a binding agreement that the AI provider's use violated, is still live. Procedural updates appear on the BakerHostetler tracker and on the plaintiffs' counsel's case page. The plaintiffs' page is one side's framing and should be treated as such.

GPL distribution and your website

The legal question turns on a technical one. What counts as distributing the code?

GPL-style copyleft licences attach attribution and source-availability duties to anyone who distributes a covered work. Distribution to an end user is the trigger. For a website, this maps to two cases.

Client-side JavaScript that ships to the visitor's browser is distribution. Every page load delivers the bundle to a third party, which is the GPL distribution case. If the bundle contains code that is substantially similar to a GPL-licensed file, attribution and source-availability duties apply.

Server-side code that never leaves your server is generally not GPL distribution. The exception is AGPL, where Section 13 treats network use as distribution. Most SMB sites do not run AGPL-licensed backend code, so the practical exposure is concentrated in the client-side bundle: form validation, animations, modals, helper utilities, the kind of small functions a developer asks an AI to write.

This is why the AI-code question matters more for the front end than for the back end of your site. A WordPress plugin that uses Copilot-suggested PHP on the server runs at lower exposure for non-AGPL code than a React component the assistant wrote that ships to every visitor.
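The client-side versus server-side distinction can be written down as a small decision rule. The sketch below encodes this article's simplified reading, nothing more: the function name, the licence sets and the boolean flag are all illustrative assumptions, and the attribution duties attached to permissive licences such as MIT or Apache are deliberately left out.

```python
# SPDX identifiers grouped by how this article describes their trigger.
# These sets are illustrative and not exhaustive.
COPYLEFT_ON_DISTRIBUTION = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
}
COPYLEFT_ON_NETWORK_USE = {"AGPL-3.0-only", "AGPL-3.0-or-later"}


def copyleft_duties_triggered(spdx_id: str, ships_to_browser: bool) -> bool:
    """Rough rule of thumb, not a legal test.

    - AGPL: treated as triggered even when the code stays on the server,
      following the article's summary of section 13.
    - GPL: triggered only when the code is delivered to the visitor.
    - Anything else: no copyleft duty modelled here.
    """
    if spdx_id in COPYLEFT_ON_NETWORK_USE:
        return True
    if spdx_id in COPYLEFT_ON_DISTRIBUTION:
        return ships_to_browser
    return False


# A GPL-derived helper in a React bundle versus the same code in server-side PHP.
print(copyleft_duties_triggered("GPL-3.0-only", ships_to_browser=True))
print(copyleft_duties_triggered("GPL-3.0-only", ships_to_browser=False))
print(copyleft_duties_triggered("AGPL-3.0-only", ships_to_browser=False))
```

The rule makes the asymmetry visible: the same GPL-derived function changes status purely by moving from the server into the bundle.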

How realistic is the risk

Here are the three realistic scenarios, ordered from most to least likely.

The first realistic scenario is an investor or acquirer running due diligence on your codebase before a funding round or an exit. Their lawyers run a licence scanner like FOSSA, ScanCode or licensee. If the scanner flags GPL-licensed code in a proprietary product, the deal-team asks questions. The outcome is usually a remediation budget and a delay, not a killed deal. This is the most common way SMBs find out they have a problem.

The second is an open-source maintainer noticing their code in your public bundle. Larger projects have community members who watch for unattributed reuse. The first contact is a polite email asking for attribution. Escalation looks like a DMCA takedown sent to your host, which interrupts service until you respond. Lawsuits at this level are rare for SMBs because the cost of bringing one outweighs the recovery against a small business.

The third is enforcement by a copyleft-licence steward organisation such as the Software Freedom Conservancy. These groups do bring enforcement actions, but their pattern is to engage in long correspondence first and to target hardware vendors or larger software companies. The threshold for an SMB website is high.

In practice, the realistic week-to-week risk for a small business site is close to zero. The risk concentrates around three moments: a funding round, an acquisition or a maintainer searching the internet for their distinctive function. None of these is likely in a given month, but all are predictable and avoidable. If you want a quick read on what else your site exposes, run a free compliance scan. It covers GDPR, cookies, accessibility and image rights; open-source licence checks need the developer tooling described in the next section.

Practical mitigation if you or your developer use AI tools

Five things to do. None of these is a legal defence and none should be sold to you as one. They are engineering hygiene that reduces the chance the problem ever surfaces.

First, turn on the duplication filter in Copilot, Cursor or any other coding assistant that offers one. The filter blocks suggestions that match training-data code above a similarity threshold. It does not eliminate near-identical output, but it does reduce the worst case. Confirm the setting is on in the developer's actual editor configuration, not just on the team account.

Second, run a licence scanner before deployment. Free tools include licensee, scancode-toolkit and ort (the OSS Review Toolkit). Commercial options include FOSSA, Snyk Licence and Black Duck. The scanner reads your package manifests and your source tree and flags licences that conflict with your distribution model. Running this once on the production bundle is more useful than running it never.

Third, if your developer is on paid Copilot Business or Enterprise, GitHub offers an IP indemnification commitment against third-party claims arising from Copilot output, conditional on the duplication filter being enabled. This is a meaningful contractual backstop, but it depends on the filter setting and is limited to the named plans. Check the current terms before relying on it. Free Copilot, Cursor, Claude and Cody do not, as of May 2026, offer equivalent commitments.

Fourth, update your agency contract. Add a clause that the agency will not use AI-assisted code that incorporates GPL or AGPL output without explicit written notice to you, and that the agency warrants the delivered site does not infringe third-party licences. This does not protect you from the maintainer who notices. It does give you a route to push the cost back to the agency if a claim arises.

Fifth, keep a software bill of materials for your client-side bundle. Tools like cyclonedx-bom or the SBOM exports built into modern bundlers list every dependency and its licence. If a question arises in a year, having an SBOM from the release in question saves a week of work.
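As a rough illustration of what an SBOM records, the sketch below builds a small JSON document in the general shape of a CycloneDX bill of materials. It is a simplified subset written by hand and not validated against the CycloneDX schema. The `build_sbom` function and the two package names are hypothetical. In a real project you would generate the file with a maintained tool rather than by hand.

```python
import json


def build_sbom(dependencies):
    """Build a minimal CycloneDX-style document from a list of dicts.

    Each dependency dict is assumed to have "name", "version" and
    "license" (an SPDX identifier) keys. Simplified sketch only.
    """
    components = [
        {
            "type": "library",
            "name": dep["name"],
            "version": dep["version"],
            "licenses": [{"license": {"id": dep["license"]}}],
        }
        # Sort by name so two releases can be compared line by line.
        for dep in sorted(dependencies, key=lambda d: d["name"])
    ]
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.5",
        "version": 1,
        "components": components,
    }


# Hypothetical client-side dependencies for one release.
release_deps = [
    {"name": "example-validator", "version": "2.1.0", "license": "MIT"},
    {"name": "example-modal", "version": "0.9.3", "license": "GPL-3.0-only"},
]

print(json.dumps(build_sbom(release_deps), indent=2))
```

Storing one such file per release, next to the build artefact, is what lets you answer a question about last year's bundle without rebuilding it.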

Our free compliance scan covers GDPR, cookies, accessibility and image rights on the live site. It does not check open-source licence compliance, which is a separate developer-tooling job. Treat the two as parallel tracks on the same site.
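For a sense of what the developer-tooling track involves, here is a minimal sketch of the licence-scanner step. It reads the declared `license` field from npm-style `package.json` manifests and flags GPL-family or missing licences. The function names and the marker list are assumptions made for illustration. The real scanners named above do considerably more, and this sketch would not notice an unattributed snippet pasted into your own source files.

```python
import json
from pathlib import Path

# Substring that matches GPL, LGPL and AGPL identifiers alike.
COPYLEFT_MARKERS = ("GPL",)


def load_manifests(root):
    """Parse every package.json under root, skipping unreadable files."""
    manifests = []
    for path in Path(root).rglob("package.json"):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue
        if isinstance(data, dict):
            manifests.append(data)
    return manifests


def flag_copyleft(manifests):
    """Return sorted (name, licence) pairs that need a human look.

    A package is flagged when its declared licence contains a copyleft
    marker, or when no usable licence string is declared at all.
    """
    flagged = []
    for manifest in manifests:
        licence = manifest.get("license")
        if not isinstance(licence, str):
            licence = "UNKNOWN"
        if licence == "UNKNOWN" or any(m in licence.upper() for m in COPYLEFT_MARKERS):
            flagged.append((manifest.get("name", "<unnamed>"), licence))
    return sorted(flagged)


# Usage against a real project tree (the path is an assumption):
# for name, licence in flag_copyleft(load_manifests("node_modules")):
#     print(f"{name}: {licence}")
```

Flagging undeclared licences alongside copyleft ones is a deliberate choice in this sketch: a dependency with no stated licence is a question for a person, not a pass.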

What changes on 9 December 2026

Directive (EU) 2024/2853, the new Product Liability Directive, treats software including AI systems as products from 9 December 2026. Article 4 brings AI tools into scope. Article 2(2) excludes open-source software developed outside a commercial activity, so the public open-source maintainer is not the defendant in a PLD claim. The commercial AI vendor is.

The relevance to AI-generated code is narrow. A small business harmed by a defective AI tool, for example where the AI emits code with a security flaw that leads to a data breach with downstream harm to a natural person, may have a new no-fault claim path against the AI vendor under the directive. The claim is for damage to natural persons, and it applies only to products placed on the market after 9 December 2026 in Member States that have transposed the directive. The PLD is not a route for the operator to recover a regulatory fine and it does not retroactively reach pre-existing tools. The Product Liability Directive in depth covers the scope and exclusions.

What this is not

This article is about open-source licence exposure when an AI writes code on your site. Three adjacent topics share words with this one.

Chatbot disclosure and AI-generated marketing copy labelling sit under Article 50 of the AI Act and AI-generated content, which is a separate regime from source-code copyright. The image side, where AI-generated illustrations or photographs may infringe, lives in AI-generated images on your site. The cookie-banner and accessibility version of the same liability chain is the broader GDPR and accessibility question. For the pre-AI image-letter version, see the Getty Images letter guide or our take on Copytrack and PicRights letters.

Common Questions

Does Copilot's duplication filter eliminate the risk?

No. The filter reduces the chance of verbatim reproduction of training-data code, which is the worst case. It does not address near-identical output that still resembles a specific open-source file. Treat the filter as risk reduction, not as a legal shield.
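The gap between verbatim and near-identical output is easy to demonstrate. The sketch below is not how Copilot's filter works, since that implementation is not described here. It is an illustrative comparison that replaces every identifier with a placeholder before measuring similarity, so a function whose variables were merely renamed scores as identical even though a plain string comparison sees two different texts. The keyword list and function names are assumptions.

```python
import re
from difflib import SequenceMatcher

# A small, incomplete set of JavaScript keywords kept as-is during normalisation.
JS_KEYWORDS = {
    "function", "return", "if", "else", "const", "let", "var",
    "for", "while", "true", "false", "null",
}
# Identifiers, integer literals, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_$][\w$]*|\d+|\S")


def normalise(code):
    """Tokenise code and replace every non-keyword identifier with "ID"."""
    tokens = []
    for tok in TOKEN.findall(code):
        if re.match(r"[A-Za-z_$]", tok) and tok not in JS_KEYWORDS:
            tokens.append("ID")
        else:
            tokens.append(tok)
    return tokens


def similarity(a, b):
    """Similarity ratio between 0 and 1 over the normalised token streams."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


original = "function isEmail(value) { return value.includes('@'); }"
renamed = "function checkMail(input) {\n  return input.includes('@');\n}"

print(original == renamed)            # False: not a verbatim match
print(similarity(original, renamed))  # identical once names and spacing are ignored
```

A filter keyed on exact text would pass the renamed version. A reader comparing the two functions side by side would not.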

Am I liable if my freelancer used Cursor without telling me?

The site operator is the party distributing the code to visitors. An open-source maintainer who notices their code in your bundle writes to the domain owner. Your freelancer may owe you a fix under contract, but the public-facing exposure sits with you.

Does this apply to server-side code or just client-side?

Mostly client-side. Code that ships to the browser is distribution under GPL and triggers attribution and source-availability duties. Server-side code that never leaves your server is generally not GPL-distribution, except for AGPL, where network use counts as distribution under section 13.

Is there any AI coding tool that is safer than others?

Paid Copilot Business and Enterprise plans include an IP indemnity from GitHub when the duplication filter is enabled. No equivalent commitment is standard on Cursor, Claude, Cody or free Copilot tiers as of May 2026. Verify current terms before relying on any vendor promise.


This article is technical analysis, not legal advice. The author is not your lawyer. For a binding view on a live licence question, talk to one.
