On Sat, 06 Apr 2024 at 15:54:51 +0200, kpcyrd wrote:
> On 4/6/24 1:42 PM, Adrian Bunk wrote:
> > You cannot simply proclaim that some git tree is the preferred form of
> > modification without shipping said git tree in our ftp archive.
> >
> > If your claim was true, then Debian and downstreams would be violating
> > licences like the GPL by not providing the preferred form of modification
> > in the archive.
>
> I'm obviously not a lawyer, but I do think this is the case. Quoting from
> GPL-3.0:
>
> > The “source code” for a work means the preferred form of the work for
> > making modifications to it. “Object code” means any non-source form of a
> > work.
>
> autotools pre-processed source code is clearly not "the preferred form of
> the work for making modifications", which is specifically what I'm saying
> Debian shouldn't consider a "source code input" either, to eliminate this
> vector for underhanded tampering that Jia Tan has used.
>
> If we can force a future Jia Tan to commit their backdoor into git (for
> everybody to see) I consider this a win.
I think maybe different people in this thread are talking about different things, and talking past each other as a result. There are two questions about what is the preferred form for modification, and I think perhaps not everyone agrees on which question they think they're answering.

Which files are part of the source tree?
----------------------------------------

One question is: say you hand-write a file of one format (Autotools configure.ac and *.m4) and preprocess it into another format that, while technically editable, is not what you would genuinely edit unless you had no alternative (the Autotools ./configure script). What is acceptable source code for this file?

Obviously if you don't have configure.ac, then you don't have the complete corresponding source code in the form you would want to use to make changes; so I think the answer has to include at least configure.ac, and there is an (IMO valid) argument that if configure.ac is missing, then what you have does not constitute source code.

But, it is conventional for Autotools projects to ship the generated ./configure script *as well* (for example this is what `make dist` outputs), to allow the project to be compiled on systems that do not have the complete Autotools system installed. What we have traditionally said is that it's legitimate for the source code of a Debian package to include ./configure, as long as it *also* includes configure.ac.
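To make that rule concrete, here is a minimal sketch of a check for generated files that arrive without their hand-written input. The mapping and the function name are my own illustrative assumptions, not part of any Debian tool, and the Makefile.in entry is only a heuristic (projects that use Autoconf without Automake hand-write Makefile.in).

```python
from pathlib import PurePosixPath

# Illustrative and deliberately incomplete (assumption): generated file name
# -> names of the hand-written inputs it is normally produced from.
GENERATED_FROM = {
    "configure": ("configure.ac", "configure.in"),
    # Heuristic only: Makefile.in is hand-written in Autoconf-only projects.
    "Makefile.in": ("Makefile.am",),
}


def generated_without_source(paths):
    """Return generated files with no known input in the same directory."""
    present = set(paths)
    missing = []
    for path in sorted(present):
        p = PurePosixPath(path)
        sources = GENERATED_FROM.get(p.name)
        if sources is None:
            continue  # not a file this sketch knows to be generated
        # Look for any candidate input next to the generated file.
        if not any(str(p.with_name(s)) in present for s in sources):
            missing.append(path)
    return missing
```

A source tree listing that contains both configure and configure.ac passes this check; one that contains only configure is flagged.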
Indeed, if upstream does ship generated files in addition to the actual source code, we have traditionally said that Debian package maintainers "should, except where impossible for legal reasons, preserve the entire building and portability infrastructure provided by the upstream author" (<https://www.debian.org/doc/manuals/developers-reference/best-pkging-practices.en.html#repackaged-upstream-source>). It is legitimate to ask whether that rule's value exceeds its cost, or whether the value of deleting generated files and forcing them to be regenerated, as a "nothing up my sleeve" mechanism to make it harder for a future Jia Tan to sneak malicious things in via the `make dist` tarball, would be higher - but right now, we normally do ship both the source and the generated file, and I'm not aware of anyone claiming that that makes the result non-GPL-compliant.

It's also relatively common for Autotools projects' `make dist` tarballs to omit some files that are part of the upstream git tree, such as VCS files like .gitignore, and ancillary/non-essential files like the configuration for GitHub Actions, GitLab CI or equivalent. I think that's a valid thing to do (as long as they are not the source code for something in the dist tarball!) - and in fact omitting them reduces the number of files that a packager needs to review, therefore improving our chances of detecting the next backdoored module.

So I think you're both partly right: we should insist on having the source code for every file we distribute as source, and in some ways it would make review easier if we deleted all files that are not source code (or even all files that are not required for our distro), but I don't agree that our source code archive *necessarily* has to be identical to the upstream git tree.

Note that I'm using "tree" as the git jargon term here: approximately "something that you could pack into a `git archive` tarball, losslessly".
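As a rough illustration of how a dist tarball and a git tree can differ in both directions, the sketch below compares two path listings. The function names and the example paths are hypothetical; in practice the listings might come from something like `git ls-files` and `tar -tf`, and I am assuming the tarball wraps everything in a single top-level directory.

```python
def strip_top_dir(path):
    """Drop the leading 'name-version/' component of a tarball member path."""
    _, _, rest = path.partition("/")
    return rest


def compare_trees(git_paths, tarball_paths):
    """Classify paths that appear in only one of the two listings."""
    git_set = set(git_paths)
    # Discard empty strings left over from the top-level directory entry.
    tar_set = {p for p in tarball_paths if p}
    return {
        # Generated (or injected) files: these deserve a reviewer's attention.
        "tarball_only": sorted(tar_set - git_set),
        # Files upstream chose not to distribute, such as VCS metadata.
        "git_only": sorted(git_set - tar_set),
    }


# Hypothetical listings for a package called pkg-1.0.
git_listing = [".gitignore", "configure.ac", "src/main.c"]
tar_listing = ["pkg-1.0/", "pkg-1.0/configure.ac", "pkg-1.0/configure",
               "pkg-1.0/src/main.c"]
report = compare_trees(git_listing, [strip_top_dir(p) for p in tar_listing])
```

The "tarball_only" list is where something like a tampered build script would show up, although this only compares file names: a file present in both places with different contents would need a separate content comparison.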
To go beyond that, we move on to the other question I can see here:

Which commits are part of the source code?
------------------------------------------

Another question about the source code is whether it is sufficient to take a snapshot of the current state of the git tree (again, tree as jargon term) and say that it is the preferred form for modification, or whether complete corresponding source code should be understood to mean its complete git history going back to the beginning of the project (in git jargon, a series of commits going back to one without a parent, rather than a tree).

I think that Guillem, and maybe Adrian too, whether rightly or wrongly, understood you to be claiming that a single snapshot (git tree or `git archive` output) is not enough, and the history is also required - and it's that assertion, which you might not have intended to be making, that they are pushing back most strongly against? (Or perhaps I'm misunderstanding.)

If that's what is happening, then I agree with them. Requiring the full history surely can't be what the GPL was intended to mean, because at the time it was written, public VCSs were rare, and the GNU system was developed via a "cathedral" approach with a small number of authors writing software privately and releasing it to the world as a series of tarballs. It seems obvious to me that they wouldn't have written the license to require a more comprehensive version of "what is source?" than what they themselves were releasing.
Demanding the full history is also not really practical for a Free Software distribution, because a non-trivial project's history is inconveniently large, and over a long enough timescale it's relatively likely that someone has committed (and perhaps subsequently deleted) something that does not qualify as Free Software - either accidentally, or because they were assuming that it's OK to include non-Free documentation, artwork, test data or whatever, as long as it isn't executable code (which, rightly or wrongly, is not the position taken by Debian).

Another practical concern is that Debian already has a legal review bottleneck: the time and effort needed for maintainers and the archive administrators to check that the entire source release contains only Free Software under an acceptable license is significant, and it's a major limiting factor on how much software we can ship. If we expanded the source release from "the source code as of today" to "all versions of the source code up to and including today", in projects with a non-trivial history that would dramatically increase the amount of time and effort that needs to be spent on review.

As a result of this concern, the archive administrators have specifically disallowed the use of source package formats that contain history: only a moment-in-time snapshot (the equivalent of a git tree, not a series of git commits) is allowed.

smcv