Re: t2u in the archive

Ian Jackson Mon, 01 Jul 2024 02:40:47 -0700

Hi again.  Thanks for the clarifications.  Speaking personally I've
found your replies encouraging, and I'm cautiously optimistic that
this might be a workable approach.  We'll keep working on a proper
response.



In the meantime, I have a couple of questions.

Joerg Jaspert writes ("Re: t2u in the archive"):
> The intention is that enough gets uploaded and stored somewhere that dak
> (or whoever later) can reconstruct what t2u did. And, obviously, if you
> then follow the steps t2u does and use as input the shallow clone
> (verified against the maintainers sig), it really should get identical
> output. (Maybe minus timestamps, but for the important part).

Firstly, you say a "shallow clone".

It is not straightforward to include *precisely* the set of commits
that are required to reproduce the output.  The conversion might, in
principle, go arbitrarily far into the maintainer's packaging branch;
and, if the conversion involves an external tool such as
git-debcherry, that tool probably won't currently report what
commit(s) it used - so would need to be modified.

I'm hoping the reason you say "shallow clone" is simply to avoid
bloat.

In that case, it's fairly simple: I find it difficult to imagine a
future workflow that includes the history *of the upstream branch*.
So the t2u server could exclude commits which are in the history of
the nominated upstream tag.  That would generally do the right thing,
but it wouldn't *guarantee* not to include unwanted history.  Would
that be OK ?
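
To make the proposed exclusion concrete, here is a minimal Python sketch
of selecting the commits reachable from the maintainer's tag but not from
the upstream tag.  It is only an illustration, not what the t2u server
does.  The function names, the example tag names, and the assumption that
both tags live under refs/tags/ are hypothetical.

```python
import subprocess


def rev_list_args(tag, upstream_tag):
    """Build a `git rev-list` command line listing the commits that are
    reachable from `tag` but not from `upstream_tag`."""
    # Assumption: both names are tags, so they are spelled out as refs.
    return ["git", "rev-list", f"refs/tags/{tag}",
            "--not", f"refs/tags/{upstream_tag}"]


def commits_to_include(repo, tag, upstream_tag):
    """Run the command in `repo` and return the commit ids, one per entry."""
    result = subprocess.run(rev_list_args(tag, upstream_tag), cwd=repo,
                            check=True, capture_output=True, text=True)
    return result.stdout.split()
```

For example, commits_to_include(".", "debian/1.0-1", "upstream/1.0")
would list the packaging commits made on top of upstream/1.0.  As noted
above, this limits bloat but does not identify the exact set of commits
a conversion tool reads.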


Secondly, the file listing.  Thanks for the explanation.  I'm still
not quite sure we understand why you want it.

Even so, I think I have a possible way to eliminate it, while still
giving you the property that dak (or a future audit) can know the file
list of the tree signed by the maintainer, without needing to actually
run git.

(I'm guessing that having dak not run git is why you don't think it's
good enough that one can verify the contents directly from the git tag
by running the git-ls-files rune.)

The git tag is itself the root of a Merkle tree, containing the
information you need.  So the hashes of all these things, and the
filenames, are already signed by the maintainer - that's the git tag.
The reason it's not readily verifiable without running git itself, is
mostly because getting the actual object texts out of git is very
complicated.

How about we (the tag2upload team):

 * Have the git clone tarball contain the following
    - the tag itself
    - the tagged commit
    - the tagged tree objects (recursively)
    - the blobs
   as loose objects.

 * Provide a program, that given this information, recursively
   verifies the git hashes, and prints the list of files.

   That is, it does this:

     1. find the commitid in the tag (textual parse)
     2. find the commit object, as a file
     3. calculate the git objectid of that file, by re-hashing
        it with sha1sum and the appropriate prefix,
        and checking that it matches
     4. find the tree objectid in the commit object (textual parse)
     5. repeat step 3 for the tree objectid
     6. parse the tree object into a list of filenames, modes,
        and objectids
     7. for each referenced objectid, find it, and rehash it
     8. referenced objects must be trees (go to step 6)
        or blobs (now we know the path, print it).

 * When doing whatever verification this file list is for (I'm not
   sure if this is dak?), run that program to generate the file list
   directly from the maintainer's signed git tag, rather than using a
   separate copy plumbed through from the maintainer's system.
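
As a rough illustration of the numbered steps above, here is a minimal
Python sketch.  It is not the proposed program, only one way such a
program might look.  Assumptions: the loose objects are unpacked in git's
usual objects/xx/yyyy... layout and are zlib-compressed as git stores
them; object ids are SHA-1; the tag's object id is supplied by the
caller; checking the maintainer's signature is out of scope.  All
function names are made up for this sketch.

```python
import hashlib
import os
import zlib

HEX = set("0123456789abcdef")


def read_object(objdir, oid):
    """Load loose object `oid`, verify its hash, return (type, body)."""
    if len(oid) != 40 or not set(oid) <= HEX:
        raise ValueError(f"malformed object id {oid!r}")
    with open(os.path.join(objdir, oid[:2], oid[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    # The object id is the SHA-1 of "<type> <size>\0" followed by the body.
    if hashlib.sha1(raw).hexdigest() != oid:
        raise ValueError(f"hash mismatch for {oid}")
    header, _, body = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError(f"bad length in {oid}")
    return kind.decode(), body


def header_field(body, name):
    """Textual parse: value of header `name` in a tag or commit."""
    for line in body.split(b"\n"):
        if not line:
            break  # a blank line ends the headers
        key, _, value = line.partition(b" ")
        if key == name:
            return value.decode()
    raise ValueError(f"no {name!r} header")


def parse_tree(body):
    """Yield (mode, name, object id) for each entry of a tree object."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        # 20 raw bytes of SHA-1 follow the NUL that ends the filename.
        yield (body[pos:space].decode(), body[space + 1:nul],
               body[nul + 1:nul + 21].hex())
        pos = nul + 21


def walk(objdir, tree_body, prefix=b""):
    """Steps 6-8: rehash every referenced object and yield blob paths."""
    for _mode, name, oid in parse_tree(tree_body):
        kind, body = read_object(objdir, oid)
        if kind == "tree":
            yield from walk(objdir, body, prefix + name + b"/")
        elif kind == "blob":
            yield prefix + name
        else:
            raise ValueError(f"unexpected {kind} object {oid}")


def expect(objdir, oid, wanted):
    """Read and verify object `oid`, insisting that it has type `wanted`."""
    kind, body = read_object(objdir, oid)
    if kind != wanted:
        raise ValueError(f"{oid} is a {kind}, expected a {wanted}")
    return body


def list_files(objdir, tag_oid):
    """Steps 1-8: go from the tag to the verified list of file paths."""
    tag = expect(objdir, tag_oid, "tag")
    commit = expect(objdir, header_field(tag, b"object"), "commit")
    tree = expect(objdir, header_field(commit, b"tree"), "tree")
    return list(walk(objdir, tree))
```

Paths are returned as bytes, since git filenames need not be valid
UTF-8.  Any hash mismatch, missing object, or object of an unexpected
type raises an exception rather than producing a partial list.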

The new listing program could be written in the language of your
choice.  (I'm volunteering to write it.)  I think this program would
be quite simple.  It would get a bit more complicated when we want to
support longer git hashes, but not by very much.  It does *not* need
to parse git pack files.  The only git thing it needs to parse is the
tree object format, which is binary, but really quite simple, and the
textual metadata to find the commit and the tree.
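
To show how small the binary part is, and what changes with a longer
hash, here is a hedged Python sketch of a tree parser with the hash
algorithm as a parameter.  Each entry is an ASCII octal mode, a space,
the filename, a NUL byte, and then the raw object id; only the length of
that last field depends on the hash.  The function name and its `algo`
parameter are assumptions made for this sketch.

```python
import hashlib


def parse_tree_entries(body, algo="sha1"):
    """Split a raw tree object body into (mode, name, hex object id)."""
    id_len = hashlib.new(algo).digest_size  # 20 for sha1, 32 for sha256
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)   # end of the mode
        nul = body.index(b"\0", space)  # end of the filename
        raw_id = body[nul + 1:nul + 1 + id_len]
        if len(raw_id) != id_len:
            raise ValueError("truncated tree entry")
        entries.append((body[pos:space].decode(),
                        body[space + 1:nul],
                        raw_id.hex()))
        pos = nul + 1 + id_len
    return entries
```

With algo="sha256" the same loop reads 32-byte ids, which is most of
what a longer hash would change in the parser itself.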

(It's possible that such a program exists already.  I don't know, but
let's assume for the sake of argument that we'll have to write it, for
one reason or another.)

What do you think of this idea ?


> > The case of a repository that contains only the debian/* files
> > poses another set of complications, but I don't think we have to
> > get into that immediately.  The above examples are probably enough
> > to work through to understand what the intended semantics of this
> > manifest is.
> 
> I'm not entirely sure on what is best to require here. I mean, the orig
> source has to be somewhere, including on the maints machine, so should
> be possible to be included in this without any extra large magic.

I think we could include it as a different ref in the git tarball.


Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.  

Pronouns: they/he.  If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.
