Antonio Russo <antonio.e.ru...@gmail.com> writes:

> The way I see it, there are two options in handling a buildable package:
> 1. That file would have been considered a build artifact, consequently
>    removed and then regenerated. No backdoor.
> 2. The file would not have been scrubbed, and a difference between the
>    git version and the released tar version would have been noticed.
>    Backdoor found.

> Either of these is, in my mind, dramatically better than what happened.

I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.

This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream
writes normally is. (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.) I think it's clearly at least *sometimes* correct,
though, so I'm sympathetic, particularly given that it's already Debian
practice to regenerate the build system files anyway.

In other words, we should make sure that breaking the specific tactics
*this* attacker used truly makes the attacker's life harder, as opposed to
making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics. I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm
not sure this is obviously true.

> Ok, so am I understanding you correctly in that you are saying: we do
> actually want *some* build artifacts in the source archives?

> If that's the case, could make those files at packaging time, analogous
> to the DFSG-exclude stripping process?

If I have followed this all correctly, I believe that in this case the
exploit is not in a build artifact. It's in a very opaque source artifact
that is different in the release tarball from the Git archive.
Assuming that I have that right, stripping build artifacts wouldn't have
done anything about this exploit, but comparing Git and release tarballs
would have. I think you're here anticipating a *different* exploit that
would be carried in build artifacts that Debian didn't remove and
reconstruct, and that we want to remove those from our upstream source
archives in order to ensure that we can't accidentally do that.

> On 2024-03-29 22:41, Guillem Jover wrote:

>> (For dpkg at least I'm pondering whether to play with switching to
>> doing something equivalent to «git archive» though, but see above, or
>> maybe generate two tarballs, a plain «git archive» and a portable one.)

Yeah, with my upstream hat on, I'm considering something similar, but I
still believe I have users who want to compile from source on systems
without current autotools, so I still need separate release tarballs.
Having to generate multiple release artifacts (and document them, and
explain to people which ones they want, etc.) is certainly doable, but I
can't say that I'm all that thrilled about it.

I think with my upstream hat on I'd rather ship a clear manifest (checked
into Git) that tells distributions which files in the distribution tarball
are build artifacts, and guarantee that if you delete all of those files,
the remaining tree should be byte-for-byte identical with the
corresponding signed Git tag. (In other words, Guillem's suggestion.)
Then I can continue to ship only one release artifact.

> I take a look at these every year or so to keep me terrified of C! If
> it's a single upstream developer, I absolutely agree, but if there's an
> upstream community reviewing the git commits, I really do believe there
> is hope (of them!) identifying bad(tm) things.

A single upstream developer is the most common case, though. Perhaps less
so for core libraries, but, well, there are plenty of examples. (To pick
another one that comes readily to mind, zlib appears to only have one
active maintainer.)
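
(Returning briefly to the manifest idea above: the check a distribution
could run against such a manifest might look roughly like the following.
This is only a sketch under assumptions of mine, not an existing tool. The
manifest format (one relative path per line, "#" for comments) and all of
the function names are hypothetical, and it assumes that the signed tag
has already been verified and exported to a plain directory, for example
by unpacking «git archive» output, so there is no .git directory to skip.
It compares file contents only, not modes or symlink targets.)

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each regular file under root (POSIX relative path) to its SHA-256."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def read_manifest(path):
    """Read the hypothetical manifest: one relative path per line, '#' comments."""
    entries = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.add(line)
    return entries


def compare_release(tarball_root, git_root, artifacts):
    """Compare an unpacked tarball with a Git export, ignoring declared artifacts.

    Returns three sorted lists; if all are empty, the trees match.
    A manifest entry that also exists in Git is reported under
    'only_in_git', since a declared build artifact should not be there.
    """
    tar = {k: v for k, v in tree_digests(tarball_root).items() if k not in artifacts}
    git = tree_digests(git_root)
    return {
        "only_in_tarball": sorted(set(tar) - set(git)),
        "only_in_git": sorted(set(git) - set(tar)),
        "differs": sorted(k for k in set(tar) & set(git) if tar[k] != git[k]),
    }
```

(A source file that was modified only in the release tarball would show
up under "differs", and an extra file that the manifest does not declare
would show up under "only_in_tarball". Whether anyone then reads the
difference carefully is, of course, a separate problem.)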

The reality that we are struggling with is that the free software
infrastructure on which much of computing runs is massively and painfully
underfunded by society as a whole, and is almost entirely dependent on
random people maintaining things in their free time because they find it
fun, many of whom are close to burnout. This is, in many ways, the true
root cause of this entire event.

The sad irony here is that the xz maintainer tried to do exactly what we
advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail. This is *exactly* why maintainers often don't
want to do that, and thus force people to fork packages rather than join
in maintaining the existing package.

This is an aside, but this is why my personal policy for my own projects
that I no longer have time to maintain is to orphan them and require that
someone fork them, not add additional contributors to my repository or
release infrastructure. I do not have the resources to vet new
maintainers -- if I had that time to spend on the projects, I wouldn't
have orphaned them -- and therefore I want to explicitly disclaim any
responsibility for what the new maintainer may do. Someone else will have
to judge whether they are trustworthy.

But I'm not sure that distributions are in a good position to do that
*either*.

> But, I will definitely concede that, had I seen a commit that changed
> that line in the m4, there's a good chance my eyes would have glazed
> over it.

This is why I am somewhat skeptical that forcing everything into Git
commits is as much of a benefit as people are hoping. This particular
attacker thought it was better to avoid the Git repository, so that is
evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git
tools to figure out exactly what happened.
But I'm not sure we're fully accounting for the fact that tags can be
moved, branches can be force-pushed, and if the Git repository is
somewhere other than GitHub, the malicious possibilities are even broader.

We could narrow those possibilities somewhat by maintaining
Debian-controlled mirrors of upstream Git repositories so that we could
detect rewritten history. (There are a whole lot of reasons why I think
dgit is a superior model for archive management. One of them is that it
captures the full Git history of upstream at the point of the upload on
Debian-controlled infrastructure if the maintainer of the package bases it
on upstream's Git tree.)

-- 
Russ Allbery (r...@debian.org)              <https://www.eyrie.org/~eagle/>