Can Coding Agents Enable Open Source Relicensing via AI 'Clean Room' Rewrites?
LONDON — A long-running dispute over chardet, the popular Python character-encoding detection library, has thrust into the spotlight the question of whether AI coding agents can produce legally defensible "clean room" implementations that permit the relicensing of open source software.
The controversy centers on chardet 7.0.0, released two days ago by longtime maintainer Dan Blanchard. The new version was described as a "ground-up, MIT-licensed rewrite" that maintains the same package name and public API as a drop-in replacement for previous versions, while being significantly faster and more accurate. Original creator Mark Pilgrim, who licensed the project under the LGPL in 2006, strongly disputes the maintainers' right to change the license.
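The drop-in-replacement claim rests on chardet's small public surface: callers pass raw bytes to `chardet.detect()` and receive a dict with `encoding`, `confidence`, and `language` keys. As a rough illustration of what such a detector does, here is a toy trial-decode sketch; it is not chardet's actual statistical approach, and `guess_encoding` is a hypothetical helper, not part of the library:

```python
def guess_encoding(data: bytes,
                   candidates=("ascii", "utf-8", "utf-16", "latin-1")):
    """Return the first candidate encoding that decodes cleanly.

    A toy stand-in for charset detection: real detectors such as
    chardet score byte-frequency statistics rather than trial-decoding.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None  # latin-1 accepts any byte string, so this is unreachable


print(guess_encoding(b"plain text"))                  # ascii
print(guess_encoding("naïve café".encode("utf-8")))   # utf-8
```

The ordering of `candidates` matters: stricter encodings are tried first, so ASCII input is not misreported as Latin-1.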
The debate raises novel legal and ethical questions about the intersection of large language models and open source licensing that are likely to affect developers and the broader AI industry.
Background and the Relicensing Dispute
Chardet was created by Mark Pilgrim in 2006 under the GNU Lesser General Public License (LGPL). Pilgrim retired from public internet life in 2009, after which maintenance was taken over by Dan Blanchard, who has overseen every release since version 1.1 in July 2012.
In the release notes for version 7.0.0, Blanchard announced it as a complete rewrite under the more permissive MIT license. Pilgrim responded by opening GitHub issue #327, titled "No right to relicense this project."
Pilgrim acknowledged the contributions of current maintainers but argued that the maintainers "have no such right" to relicense, calling it "an explicit violation of the LGPL." He contended that licensed code, when modified, must remain under the same license, and dismissed the "complete rewrite" claim because the team had "ample exposure to the originally licensed code," meaning it could not qualify as a true clean room implementation. He specifically noted that "adding a fancy code generator into the mix does not somehow grant them any additional rights."
Blanchard's Defense and AI-Assisted Process
In a detailed reply, Blanchard acknowledged his extensive prior exposure to the codebase — having maintained it for over a decade — and admitted that a traditional clean room approach, which requires strict separation between those familiar with the original code and those implementing the new version, was not followed.
However, he argued that the purpose of clean room methodology is to ensure the resulting code is not a derivative work. "It is a means to an end, not the end itself," Blanchard wrote. "In this case, I can demonstrate that the end result is the same — the new code is structurally independent of the old code — through direct measurement rather than process guarantees alone."
To support his position, Blanchard presented results from the JPlag source code plagiarism detection tool. The analysis showed the new 7.0.0 release has a maximum similarity of just 1.29% with the previous release and 0.64% with version 1.1. By comparison, earlier releases showed similarities in the 80-93% range.
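Blanchard's "measure the end result" argument can be made concrete with any source-similarity metric. The sketch below is not JPlag, which matches token sequences with greedy string tiling; it is a minimal standard-library approximation that compares Python token streams with `difflib`, just to show the kind of structural comparison involved:

```python
import difflib
import io
import tokenize


def token_similarity(src_a: str, src_b: str) -> float:
    """Similarity ratio between two Python sources' token streams.

    Comments and line breaks are skipped, so the score reflects code
    structure rather than formatting. (JPlag itself uses greedy string
    tiling over tokens; this difflib ratio is only a rough stand-in.)
    """
    skip = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT}

    def tokens(src: str):
        return [t.string
                for t in tokenize.generate_tokens(io.StringIO(src).readline)
                if t.type not in skip]

    return difflib.SequenceMatcher(None, tokens(src_a), tokens(src_b)).ratio()


# Identical code that differs only in a comment scores 1.0.
print(round(token_similarity("x = 1  # answer\n", "x = 1\n"), 2))  # 1.0
```

On this scale, the 80-93% figures between earlier chardet releases would indicate shared structure, while 7.0.0's roughly 1% scores would indicate near-total independence.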
Blanchard provided transparency into the AI-assisted development process, which he carried out with the AI coding assistant Claude. He first used the "superpowers" brainstorming skill to create a detailed design document outlining the desired architecture. Critically, he "started in an empty repository with no access to the old source tree, and explicitly instructed Claude not to base anything on LGPL/GPL-licensed code." He then reviewed, tested, and iterated on every piece of the result.
Broader Implications and Expert Reactions
The dispute highlights how AI coding tools are dramatically lowering the barrier to creating what resemble clean room implementations. A process that traditionally required multiple teams of engineers working for weeks or months — as in Compaq's famous 1982 clean-room clone of the IBM BIOS — can now be approximated by a coding agent in hours.
Simon Willison, in his analysis on simonwillison.net, notes that this pattern has become increasingly evident in recent months, pointing to his own experiments with the technique in December 2025.
The case also raises additional complexities around training data. As Willison and others have observed, Claude itself was very likely trained on chardet as part of its enormous training corpus, though there is no way to confirm this definitively. This creates questions about whether a model trained on a codebase can produce a morally or legally defensible clean room implementation.
The Free Software Foundation has been cautious in its response. Zoë Kooyman, executive director of the FSF, told The Register that the organization "can't comment on the specifics or legality of this particular project without doing additional research or consulting lawyers." However, she added that "there is nothing 'clean' about a Large Language Model (LLM) which has ingested the code it is being asked to reimplement."
Impact on Developers and the Industry
For developers, this case represents both opportunity and uncertainty. AI tools offer unprecedented ability to refactor, optimize, and modernize legacy open source projects. The ability to potentially relicense under more permissive terms could encourage broader adoption and commercial integration.
However, the legal ambiguity creates significant risk. Projects considering similar AI-driven rewrites must now weigh the potential for challenges from original authors or license stewards. Companies building products on top of open source dependencies may need to reconsider their reliance on projects that undergo such transformations.
The incident underscores growing tension in the open source community about AI's role in code generation. While some view these tools as accelerators for innovation, others see them as potential vehicles for circumventing established licensing protections.
What's Next
The chardet dispute is unlikely to be resolved quickly. Legal clarity around AI-assisted clean room implementations will likely require either community consensus, updated licensing guidelines from organizations like the FSF, or court precedents.
As AI coding agents continue to improve, similar relicensing attempts are expected to increase. The outcome of this specific disagreement — and any potential legal action — could set important precedents for how open source licenses interact with AI-generated code.
The Python packaging ecosystem and the wider open source community will be watching closely as this case develops.
Sources
- Simon Willison: Can coding agents relicense open source through a “clean room” implementation of code?
- Phoronix: LLM-Driven Large Code Rewrites With Relicensing Are The Latest AI Concern
- The Register: Chardet dispute shows how AI will kill software licensing
- GitHub: chardet issue #327
- chardet 7.0.0 Release

