Hi,

On Fri, 2024-06-28 at 13:00 +0200, Didier 'OdyX' Raboud wrote:
> On Friday, 28 June 2024, 08:32:43 CEST, Ansgar 🙀 wrote:
> > I'll expand on this here slightly for your benefit:
> > 
> > $ git clone https://salsa.debian.org/rra/tf5.git
> > [...]
> > $ apt-get source tf5
> > [...]
> > $ rm -rf tf5/.git tf5-5.0beta8/.pc
> > $ diff -Nur tf5 tf5-5.0beta8; echo $?
> > 0
> > 
> > If one is really bored:
> > $ (cd tf5; sha256sum $(find . -type f | sort) | sha256sum -)
> > 8d7820471fb44382a0c752319906064a1276ff18873fb4730dec1319aaf7b459  -
> > $ (cd tf5-5.0beta8; sha256sum $(find . -type f | sort) | sha256sum -)
> > 8d7820471fb44382a0c752319906064a1276ff18873fb4730dec1319aaf7b459  -
> > 
> > I will leave it as an exercise to you to compare the output and to
> > reason about results of different ways to compare both trees.
> 
> It looks to me that you have taken (by choice, or by chance) an example that 
> too conveniently fits what you want to demonstrate: in which the git 
> repository and the .dsc are treesame.

Russ gave me this very specific example as one where it would not
work, and Sean continued to insist that I had rejected it as invalid
merely by asserting, without any evidence, that it doesn't demonstrate
the problem.  I've tried to address Sean's concern by expanding on my
earlier reasoning.

> If I understand your position correctly (please correct me if needed): you 
> (with a ftpmaster hat) would like all uploads to come with a signed artefact 
> of hashes corresponding to the set of files as represented by the current 
> Debian source package format, as accepted by the archive today. And you would 
> like this artefact's signature be a signature by the human uploader. Did I
> get this right?

Ideally yes, though this might not work for all cases.

The next-best thing would be some reasonable representation of the
source package contents that could be signed by the submitter and
verified by the archive independently. (A signed artifact covering the
exact set of files/metadata would obviously satisfy this, but it can be
less than that.)

> If I understand dgit and tag2upload correctly, in the cases where the git 
> repository is treesame to the source package (patches-applied, with debian/
> patches file stored in git, as pointed by a tag), this artefact has the exact 
> same cryptographic value as the git tag, pointing to the git tree, pointing
> to the git objects (modulo the SHA-1 vs SHA-256 hash functions choice, which
> was clarified elsewhere). One such example is the tf5 source that you used as
> example.

That Russ used as an example; it doesn't come from me. I claim it is
not an interesting example.

> In that case, would you still want an outside-of-git hash, signed by
> the human uploader?

That would be desirable: the archive does not see the Git tree, so this
would let the archive (or third parties using data from the archive)
verify that the contents are what the human uploader signed.

> In the cases where the git repository is _not_ treesame to the source package 
> (patches-applied, but debian/patches not stored in git), uploads are already 
> possible via dgit push-source (and the human upload signature covers the 
> source package as it goes in the archive, not the git tree). In that other 
> case, would you still want a signed artifact of hashes, signed by the human 
> uploader?

Ideally yes, and this is where the "reasonable representation of the
source package contents" part gets relevant.

> And do we both understand that this means that some git repository 
> layouts would hence not be possible to be uploaded via tag2upload (because it 
> needs a much heavier git tag client, that builds the final source package, 
> hashes its contents, and creates the git tag)?

That is not obvious and probably not true. It is false for a trivial
reasonable representation of the source package contents, but not
necessarily for the set of all possible reasonable representations (and
there is no requirement to use a single fixed one either). (It
certainly isn't true if one leaves out the "reasonable" part.)

To understand whether a representation[1] exists that is also
reasonably easy to implement, one would need to look at it with an open
mind and, in particular, not with specific implementation details in
mind.

  [1]: Or small set of representations the upload mechanism/human
uploader can choose from.

Finding such a representation might be easier if one also changes some
possible outputs that tag2upload generates. For example, if one wanted
a reasonable representation to include patch names and contents in
d/patches, these would need to be known at the time the content hash
gets computed on the uploader's system, and limitations on generated
build artifacts might be useful to achieve that. (But it could also be
valid to not include all details about d/patches in the reasonable
representation.)

It is probably hard to find such a representation if the source package
generation includes arbitrary common things, like generating
./configure, building changelog files from fragments, vendoring third
party sources (for which the exact contents are not known to the human
uploader directly).

Neither Russ Allbery's nor Matthias Urlichs' examples could convince me
that such a representation does not exist, as I hope I explained
sufficiently in my earlier reply.  The tag2upload developers blocking
any discussions for ~4 years also did not help to answer this question.

As for some examples of things that are pretty likely reasonable
representations:

1. Some hash over all files (upstream + debian/, including d/patches)
with patches applied.

2. Some hash over all files (upstream + debian/, including d/patches)
with patches unapplied.

3. Two separate hashes: unpatched upstream, debian/ (including
d/patches).

4. Two separate hashes: upstream tarball, debian/ (including
d/patches).

5. Hashes over unpatched upstream + normalized diff.

6. Hashes over upstream tarball + normalized diff.

7. Hashes over the source (with no d/patches generated; that would be
an artificial build artifact no human uploader would look at anyway).
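
To make (1.) to (3.) a bit more concrete, here is a minimal sketch of
what "some hash over all files" and "two separate hashes" could look
like. The function names tree_digest and split_digests, and the exact
line format fed into the outer hash, are assumptions made up for this
illustration; they are not what tag2upload, dgit or dak implement. The
sketch also ignores symlinks, file modes and empty directories, which a
real representation would need to decide on.

```python
import hashlib
from pathlib import Path


def tree_digest(root, include=lambda rel: True):
    """Return a SHA-256 hex digest over the regular files below root.

    Hypothetical format: one "<content hash>  <relative path>" line per
    file, in sorted path order, fed into an outer SHA-256.
    """
    root = Path(root)
    paths = sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()  # simplification: skip symlinks
    )
    outer = hashlib.sha256()
    for rel in paths:
        if not include(rel):
            continue
        inner = hashlib.sha256((root / rel).read_bytes()).hexdigest()
        outer.update(f"{inner}  {rel}\n".encode("utf-8"))
    return outer.hexdigest()


def is_debian(rel):
    """True for paths inside the debian/ directory."""
    return rel.startswith("debian/")


def split_digests(root):
    """Variant (3.): separate digests for upstream files and debian/."""
    return {
        "upstream": tree_digest(root, lambda rel: not is_debian(rel)),
        "debian": tree_digest(root, is_debian),
    }
```

Running tree_digest on a patches-applied or a patches-unapplied tree
gives (1.) or (2.) respectively; the code does not care which. With
split_digests, a change under debian/ alters only the "debian" value,
so the two parts can be checked independently.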

Possible reasonable representations might also include:

8. Some hash over all files, excluding d/patches. (This obviously
allows an entire equivalence class of patches in d/patches.)

9. Some hash over all files, excluding a single file in d/patches.
(Limits the equivalence class from (8.); the allowed file should
probably be required to be the last one in d/patches.)

10. Some hash over all files in the tree covered by the tag with the
tag2upload process indicating which files it added in addition to
those. (Possibly with limits over which files can be added.)

11. Weaker versions of (10.) that also allow modified or removed files.
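
As an illustration of (10.), the check on the receiving side could be
as small as the following sketch. It compares a per-file manifest of
the tree covered by the tag with a manifest of the generated source
package and reports anything that is not a declared addition. The names
manifest and check_generated and the default pattern
"debian/patches/*" are assumptions for this example only, not an
existing interface of tag2upload or the archive software.

```python
import fnmatch
import hashlib
from pathlib import Path


def manifest(root):
    """Map each regular file below root to the SHA-256 of its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()  # simplification: skip symlinks
    }


def check_generated(signed, generated, allowed_added=("debian/patches/*",)):
    """Return a sorted list of deviations from the signed manifest.

    signed and generated map relative paths to content hashes. Files
    present only in generated are accepted if they match one of the
    allowed_added patterns (note that fnmatch's "*" also matches "/").
    """
    problems = []
    for path, digest in signed.items():
        if path not in generated:
            problems.append(f"removed: {path}")
        elif generated[path] != digest:
            problems.append(f"modified: {path}")
    for path in generated.keys() - signed.keys():
        if not any(fnmatch.fnmatchcase(path, pat) for pat in allowed_added):
            problems.append(f"unexpected: {path}")
    return sorted(problems)
```

An empty result means the generated tree equals the signed tree plus
permitted additions. Variant (11.) would correspond to tolerating some
of the "modified" or "removed" entries as well, and (8.) to leaving
d/patches out of both manifests.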

These also seem easy to implement in a thin client and technically
already cover all possible workflows, including any not supported by
tag2upload (by (11.) without limits over which files can be added or
modified; though that probably leaves the "reasonable representation"
space). No variant seems to require a complete rewrite and/or full
redesign of how tag2upload works (a claim that sometimes comes up).
Nor do they require building a source package locally, replicating the
full functionality of dgit locally, or having dpkg locally.

Note that various variants would also cover generated d/patches if one
feels a need for a higher variety of build artifacts.

Finding out whether there is a reasonable set of representations (such
as a subset of the ones above with agreed parameters) that covers a
reasonable set of workflows would of course require at least
willingness to communicate and some willingness to compromise (about
what is still a reasonable representation and what a thin client can
do).

And yes, that might also mean that not all workflows will work. But
(a) tag2upload limits that too and (b) ftp-master limits that for
(automatic) binary uploads and (manual) source uploads via other paths
as well by imposing constraints such as no build-time modifications of
d/control, compliance with various policies (file locations, package
content, lintian checks, package naming, ...), requirements for buildd
configuration, hashes included in indices, ... for each of which I'm
sure a workflow could be found that is "blocked" by the current policy.

But claiming no such solution exists and demanding that current
requirements must be dropped while refusing to communicate about it[2],
then complaining about being "blocked" and everything getting blocked
because of "conservatism"[3] feels a bit dishonest and not like a good
faith approach.

  [2]: Claiming no solution exists would require an understanding of
what ftp-master thinks a "reasonable representation" is. I find it very
optimistic that tag2upload developers know this without communication.

  [3]: Rejecting any suggestions of possible changes to deploy it might
qualify as that, but I wonder who the extremely conservative side is
then...

Ansgar
