Hi,

On Fri, 2024-06-28 at 13:00 +0200, Didier 'OdyX' Raboud wrote:
> Le vendredi, 28 juin 2024, 08.32:43 h CEST Ansgar a écrit :
> > I'll expand on the here slightly for your benefit:
> >
> >   $ git clone https://salsa.debian.org/rra/tf5.git
> >   [...]
> >   $ apt-get source tf5
> >   [...]
> >   $ rm -rf tf5/.git tf5-5.0beta8/.pc
> >   $ diff -Nur tf5 tf5-5.0beta8; echo $?
> >   0
> >
> > If one is really bored:
> >   $ (cd tf5; sha256sum $(find . -type f | sort) | sha256sum -)
> >   8d7820471fb44382a0c752319906064a1276ff18873fb4730dec1319aaf7b459  -
> >   $ (cd tf5-5.0beta8; sha256sum $(find . -type f | sort) | sha256sum -)
> >   8d7820471fb44382a0c752319906064a1276ff18873fb4730dec1319aaf7b459  -
> >
> > I will leave it as an exercise to you to compare the output and to
> > reason about results of different ways to compare both trees.
>
> It looks to me that you have taken (by choice, or by chance) an example that
> too conveniently fits what you want to demonstrate: in which the git
> repository and the .dsc are treesame.
Russ gave me this very specific example as a case where it would not work, and Sean kept insisting that I had rejected it as invalid by merely asserting, without any evidence, that it doesn't demonstrate the problem. I've tried to address Sean's concern by expanding on my earlier reasoning.

> If I understand your position correctly (please correct me if needed): you
> (with a ftpmaster hat) would like all uploads to come with a signed artefact
> of hashes corresponding to the set of files as represented by the current
> Debian source package format, as accepted by the archive today. And you would
> like this artefact's signature be a signature by the human uploader. Did I get
> this right?

Ideally yes, though this might not work for all cases. The next-best thing would be some reasonable representation of the source package contents that the submitter could sign and the archive could verify independently. (A signed artifact covering the exact set of files/metadata would obviously satisfy this, but it can be less than that.)

> If I understand dgit and tag2upload correctly, in the cases where the git
> repository is treesame to the source package (patches-applied, with debian/
> patches file stored in git, as pointed by a tag), this artefact has the exact
> same cryptographic value as the git tag, pointing to the git tree, pointing to
> the git objects (modulo the SHA-1 vs SHA-256 hash functions choice, which was
> clarified elsewhere). One such example is the tf5 source that you used as
> example.

It is the example Russ used; it doesn't come from me. I claim it is not an interesting example.

> In that case, would you still want a outside-of-git hash, signed by
> the human uploader?

That would be desirable so that the archive (or third parties using data from the archive) can verify that the contents are what the human uploader signed, as the archive does not see the Git tree.
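For illustration, a minimal Python sketch of a content hash that does not depend on Git: it reproduces the `sha256sum $(find . -type f | sort) | sha256sum -` comparison quoted above, so an uploader could sign the resulting digest and the archive could recompute it from the unpacked source package. The function names are hypothetical, and it assumes file names without whitespace, newlines or backslashes (the shell version word-splits names, and `sha256sum` escapes some characters), byte-wise path order as with `LC_ALL=C sort`, and that `.git` and `.pc` were already removed as in the quoted commands.

```python
import hashlib
import os
import sys


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def collect_files(root: str) -> dict[str, bytes]:
    """Map "./relative/path" -> contents for every regular file under root.

    Like `find . -type f`: symlinks are skipped and symlinked directories are
    not followed. Reads whole files into memory, which is fine for a sketch.
    """
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = "./" + os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                files[rel] = f.read()
    return files


def manifest(files: dict[str, bytes]) -> str:
    """sha256sum-style "<hex>  <path>" lines in code-point (= LC_ALL=C) path order."""
    return "".join(f"{sha256_hex(data)}  {path}\n" for path, data in sorted(files.items()))


def tree_digest(files: dict[str, bytes]) -> str:
    """Equivalent of piping the manifest through `sha256sum -`."""
    return sha256_hex(manifest(files).encode("utf-8", "surrogateescape"))


if __name__ == "__main__":
    # Usage: python3 tree_digest.py DIR [DIR ...]
    for d in sys.argv[1:]:
        print(tree_digest(collect_files(d)), d)
```

Two trees with identical file sets and contents yield identical digests; any added, removed, renamed or modified file changes the digest.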
> In the cases where the git repository is _not_ treesame to the source package
> (patches-applied, but debian/patches not stored in git), uploads are already
> possible via dgit push-source (and the human upload signature covers the
> source package as it goes in the archive, not the git tree). In that other
> case, would you still want a signed artifact of hashes, signed by the human
> uploader?

Ideally yes, and this is where the "reasonable representation of the source package contents" part becomes relevant.

> And do we both understand that this means that some git repository
> layouts would hence not be possible to be uploaded via tag2upload (because it
> needs a much heavier git tag client, that builds the final source package,
> hashes its contents, and creates the git tag)?

That is not obvious and probably not true. It holds for a single trivial reasonable representation of the source package contents, but not necessarily for the set of all possible reasonable representations (and there is no requirement to use a single fixed one either). (It certainly isn't true if one leaves out the "reasonable" part.)

To understand whether a representation[1] exists that is also reasonably easy to implement, one would need to look at the question with an open mind and, in particular, without specific implementation details in mind.

[1]: Or a small set of representations the upload mechanism/human uploader can choose from.

Finding such a representation might be easier if one also changes some of the possible outputs that tag2upload generates. For example, if one wanted a reasonable representation to include patch names and contents in d/patches, these would need to be known at the time the content hash is computed on the uploader's system, for which limiting the generated build artifacts might be useful. (But it could also be valid not to include all details about d/patches in the reasonable representation.)
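The d/patches point can be made concrete with a sketch of a digest over patch names and contents in the order listed in debian/patches/series, something a thin client could compute without running dpkg. The canonical form (one `<sha256>  <name>` line per patch, hashed again) and the function name are assumptions; series parsing is simplified (blank lines, `#` comments and options such as `-p1` are ignored), and patches not listed in the series are deliberately left out, as one example of not including all details about d/patches.

```python
import hashlib
import re


def patch_series_digest(series_text: str, patches: dict[str, bytes]) -> str:
    """Hash patch names and contents in debian/patches/series order.

    Hypothetical canonical form: one "<sha256 of patch>  <name>\n" line per
    listed patch, hashed again with SHA-256. Patches not named in the series
    are ignored; a listed patch missing from `patches` raises KeyError.
    """
    outer = hashlib.sha256()
    for raw in series_text.splitlines():
        # Simplified series parsing: drop comments, blank lines and options like -p1.
        line = re.sub(r"(?:^|\s+)#.*$", "", raw.strip())
        if not line:
            continue
        name = line.split()[0]
        inner = hashlib.sha256(patches[name]).hexdigest()
        outer.update(f"{inner}  {name}\n".encode("utf-8"))
    return outer.hexdigest()
```

The uploader's digest only matches the archive's if both see the same patch files, which is why such patches would have to exist on the uploader's system when the hash is computed.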
It is probably hard to find such a representation if source package generation includes arbitrary steps, including common ones such as generating ./configure, building changelog files from fragments, or vendoring third-party sources (whose exact contents the human uploader does not know directly).

Neither Russ Allbery's nor Matthias Urlichs' examples convinced me that such a representation does not exist, as I hope I explained sufficiently in my earlier reply. The tag2upload developers blocking any discussion for ~4 years also did not help answer this question.

As for some examples of things that are pretty likely reasonable representations:

1. Some hash over all files (upstream + debian/, including d/patches) with patches applied.
2. Some hash over all files (upstream + debian/, including d/patches) with patches unapplied.
3. Two separate hashes: unpatched upstream, debian/ (including d/patches).
4. Two separate hashes: upstream tarball, debian/ (including d/patches).
5. Hashes over unpatched upstream + normalized diff.
6. Hashes over upstream tarball + normalized diff.
7. Hashes over the source (and no d/patches generated as an artificial build artifact that no human uploader would look at anyway).

Possible reasonable representations might also include:

8. Some hash over all files, excluding d/patches. (This obviously allows an entire equivalence class of patches in d/patches.)
9. Some hash over all files, excluding a single file in d/patches. (This limits the equivalence class from (8.); the allowed file should probably be required to be the last one in d/patches.)
10. Some hash over all files in the tree covered by the tag, with the tag2upload process indicating which files it added in addition to those. (Possibly with limits on which files can be added.)
11. Weaker versions of (10.) that also allow modified or removed files.
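A few of the listed representations, (3.), (8.) and (10.), can be sketched in Python to show how little a thin client and the archive would need. This is a minimal sketch under stated assumptions, not a proposal: a tree is a dict mapping paths relative to the source package root (e.g. `debian/control`) to file contents, the manifest format mirrors the `sha256sum` pipeline quoted earlier, and all function names as well as the `debian/patches/` limit used for (10.) are hypothetical examples.

```python
from __future__ import annotations

import hashlib

Tree = dict[str, bytes]  # path relative to the package root -> file contents
PATCHES_DIR = "debian/patches/"


def digest(files: Tree) -> str:
    """SHA-256 over a sorted "<sha256>  <path>\n" manifest of the given files."""
    lines = "".join(
        f"{hashlib.sha256(data).hexdigest()}  {path}\n"
        for path, data in sorted(files.items())
    )
    return hashlib.sha256(lines.encode("utf-8", "surrogateescape")).hexdigest()


def rep3_two_hashes(files: Tree) -> tuple[str, str]:
    """(3.) Separate hashes: everything outside debian/, and debian/ itself."""
    upstream = {p: d for p, d in files.items() if not p.startswith("debian/")}
    debian = {p: d for p, d in files.items() if p.startswith("debian/")}
    return digest(upstream), digest(debian)


def rep8_without_patches(files: Tree) -> str:
    """(8.) Hash over all files except those under debian/patches/."""
    return digest({p: d for p, d in files.items() if not p.startswith(PATCHES_DIR)})


def rep10_verify(final: Tree, signed_digest: str, added: list[str],
                 allowed_prefix: str | None = PATCHES_DIR) -> bool:
    """(10.) Archive-side check of the final source package.

    `signed_digest` is digest() of the tagged tree as signed by the uploader;
    `added` lists the files the upload service says it added. If
    `allowed_prefix` is set, added files must live below it.
    """
    added_set = set(added)
    if any(p not in final for p in added_set):
        return False
    if allowed_prefix is not None and not all(p.startswith(allowed_prefix) for p in added_set):
        return False
    remaining = {p: d for p, d in final.items() if p not in added_set}
    return digest(remaining) == signed_digest
```

For (10.), the uploader would sign `digest()` of the tagged tree, the upload service would report which files it added, and the archive would run `rep10_verify()` on the final source package; passing `allowed_prefix=None` drops the limit on which files may be added.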
These also seem easy to implement in a thin client and technically already cover all possible workflows, including any not supported by tag2upload (via (11.) without limits on which files can be added or modified, though that probably leaves the "reasonable representation" space). No variant seems to require a complete rewrite and/or full redesign of how tag2upload works (a claim that sometimes comes up). Nor do they require building a source package locally, replicating the full functionality of dgit locally, or having dpkg available locally. Several variants would also cover generated d/patches if one feels the need for a wider variety of build artifacts.

Finding out whether there is a reasonable set of representations (such as a subset of the ones above with agreed parameters) that covers a reasonable set of workflows would of course require at least a willingness to communicate and some willingness to compromise (about what is still a reasonable representation and what a thin client can do). And yes, that might also mean that not all workflows will work. But (a) tag2upload limits that too, and (b) ftp-master limits that as well, for (automatic) binary uploads and for (manual) source uploads via other paths, by imposing constraints such as no build-time modifications of d/control, compliance with various policies (file locations, package content, lintian checks, package naming, ...), requirements for buildd configuration, hashes included in indices, ...; for each of these I'm sure a workflow could be found that is "blocked" by the current policy.

But claiming no such solution exists and demanding that current requirements be dropped while refusing to communicate about it[2], then complaining about being "blocked" and about everything getting blocked because of "conservatism"[3], feels a bit dishonest and not like a good-faith approach.

[2]: Claiming that no solution exists would require an understanding of what ftp-master considers a "reasonable representation".
I find it very optimistic to assume that the tag2upload developers know this without communication.

[3]: Rejecting any suggestions of possible changes to deploy it might qualify as that, but I wonder who the extremely conservative side is then...

Ansgar