Hi again. Thanks for the clarifications.

Speaking personally, I've found your replies encouraging, and I'm cautiously optimistic that this might be a workable approach. We'll keep working on a proper response.
In the meantime, I have a couple of questions.

Joerg Jaspert writes ("Re: t2u in the archive"):
> The intention is that enough gets uploaded and stored somewhere that dak
> (or whoever later) can reconstruct what t2u did. And, obviously, if you
> then follow the steps t2u does and use as input the shallow clone
> (verified against the maintainers sig), it really should get identical
> output. (Maybe minus timestamps, but for the important part).

Firstly, you say a "shallow clone". It is not straightforward to include *precisely* the set of commits that are required to reproduce the output. The conversion might, in principle, go arbitrarily far back into the maintainer's packaging branch; and if the conversion involves an external tool such as git-debcherry, that tool probably won't currently report which commit(s) it used, so it would need to be modified.

I'm hoping the reason you say "shallow clone" is simply to avoid bloat. In that case, it's fairly simple: I find it difficult to imagine a future workflow that needs the history *of the upstream branch*. So the t2u server could exclude commits which are in the history of the nominated upstream tag. That would generally do the right thing, but it wouldn't *guarantee* to exclude all unwanted history. Would that be OK?

Secondly, the file listing. Thanks for the explanation; I'm still not quite sure we understand why you want it. Even so, I think I have a possible way to eliminate it, while still giving you the property that dak (or a future audit) can know the file list of the tree signed by the maintainer, without needing to actually run git. (I'm guessing that wanting dak not to run git is why you don't think it's good enough that one can verify the contents directly from the git tag by running the git-ls-files rune.)

The git tag is itself the root of a Merkle tree containing the information you need. So the hashes of all these things, and the filenames, are already signed by the maintainer: that's what the git tag is.
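To illustrate the Merkle-tree point (a sketch, not part of the proposal itself): every git object id is the SHA-1 of a short header (the object type, a space, the body length in decimal, and a NUL byte) followed by the object's body. A tag's body names its commit's id, the commit's body names its root tree's id, and each tree names the ids of its entries, so a signature over the tag pins everything beneath it. The helper `git_object_id` below is hypothetical and assumes a SHA-1 repository, not one using git's newer SHA-256 object format.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 git object id for an object of the given type.

    git hashes the header "<type> <decimal length>\\0" followed by the raw body.
    """
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

These two values are git's well-known ids for the empty blob and the empty tree, which is a quick way to check that the header format is right.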
The reason it's not readily verifiable without running git itself is mostly that getting the actual object texts out of git is very complicated.

How about we (the tag2upload team):

 * Have the git clone tarball contain the following
     - the tag itself
     - the tagged commit
     - the tagged tree objects (recursively)
     - the blobs
   as loose objects.

 * Provide a program that, given this information, recursively verifies the git hashes and prints the list of files. That is, it does this:
     1. find the commitid in the tag (textual parse)
     2. find the commit object, as a file
     3. calculate the git objectid of that file, by re-hashing it with sha1sum and the appropriate prefix, and check that it matches
     4. find the tree objectid in the commit object (textual parse)
     5. repeat step 3 for the tree objectid
     6. parse the tree object into a list of filenames, modes, and objectids
     7. for each referenced objectid, find it, and rehash it
     8. referenced objects must be trees (go to step 6) or blobs (now we know the path, print it).

 * When doing whatever verification this file list is for (I'm not sure if this is dak?), run that program to generate the file list directly from the maintainer's signed git tag, rather than using a separate copy plumbed through from the maintainer's system.

The new listing program could be written in the language of your choice. (I'm volunteering to write it.) I think this program would be quite simple. It would get a bit more complicated when we want to support longer git hashes, but not by very much.

It does *not* need to parse git pack files. The only git formats it needs to parse are the tree object format, which is binary but really quite simple, and the textual metadata needed to find the commit and the tree.

(It's possible that such a program exists already. I don't know, but let's assume for the sake of argument that we'll have to write it, for one reason or another.)

What do you think of this idea?
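For concreteness, the eight steps of the proposed listing program might look roughly like the Python sketch below. It is an illustration, not the program being volunteered, and it rests on assumptions: objects are fetched through a hypothetical `load(objectid)` callable returning each object's uncompressed body without git's header (the real tarball layout is still open); checking the maintainer's OpenPGP signature on the tag happens separately and is not shown; and only SHA-1 object ids are handled. All function names here are invented for the example.

```python
import hashlib
from typing import Callable, Iterator

# Hypothetical storage interface (an assumption): given a hex object id, return
# the object's raw body *without* git's "<type> <len>\0" header. Loose objects in
# a real .git directory are zlib-compressed and include the header.
Loader = Callable[[str], bytes]

BLOB_MODES = {"100644", "100755", "120000"}  # regular, executable, symlink
TREE_MODE = "40000"  # git writes tree modes without a leading zero


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 object id: hash of "<type> <decimal length>\\0" + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def load_verified(load: Loader, obj_type: str, oid: str) -> bytes:
    """Steps 3, 5 and 7: fetch an object and check that it rehashes to its id."""
    body = load(oid)
    actual = git_object_id(obj_type, body)
    if actual != oid:
        raise ValueError(f"{obj_type} {oid} rehashes to {actual}")
    return body


def header_field(body: bytes, name: bytes) -> str:
    """Steps 1 and 4: textual parse of a header line such as b"tree <hex>"."""
    for line in body.split(b"\n"):
        if not line:  # a blank line ends the headers; the message follows
            break
        key, _, value = line.partition(b" ")
        if key == name:
            return value.decode("ascii")
    raise ValueError(f"no {name.decode()} header")


def parse_tree(body: bytes) -> Iterator[tuple[str, str, str]]:
    """Step 6: yield (mode, name, hex id); entries are "<mode> <name>\\0<20-byte id>"."""
    pos = 0
    while pos < len(body):
        nul = body.index(b"\0", pos)
        raw_id = body[nul + 1 : nul + 21]
        if len(raw_id) != 20:
            raise ValueError("truncated tree entry")
        mode, _, name = body[pos:nul].partition(b" ")
        yield mode.decode("ascii"), name.decode("utf-8", "surrogateescape"), raw_id.hex()
        pos = nul + 21


def walk_tree(load: Loader, tree_id: str, prefix: str) -> Iterator[str]:
    """Steps 6-8 for one tree: verify it, recurse into subtrees, emit blob paths."""
    for mode, name, oid in parse_tree(load_verified(load, "tree", tree_id)):
        path = prefix + name
        if mode == TREE_MODE:
            yield from walk_tree(load, oid, path + "/")
        elif mode in BLOB_MODES:
            load_verified(load, "blob", oid)
            yield path
        else:  # e.g. 160000 gitlinks (submodules): neither tree nor blob
            raise ValueError(f"unsupported mode {mode} for {path}")


def list_files(load: Loader, tag_body: bytes) -> Iterator[str]:
    """Steps 1-8, starting from the tag object's body (signature checked elsewhere)."""
    if header_field(tag_body, b"type") != "commit":
        raise ValueError("tag does not point at a commit")
    commit = load_verified(load, "commit", header_field(tag_body, b"object"))
    yield from walk_tree(load, header_field(commit, b"tree"), "")
```

Because every object is rehashed before it is parsed, a tampered blob, tree or commit raises `ValueError` rather than producing a listing. Gitlinks are rejected, matching the rule that referenced objects must be trees or blobs; this sketch does not check that tree entries are in git's canonical sort order.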
> > The case of a repository that contains only the debian/* files
> > poses another set of complications, but I don't think we have to
> > get into that immediately. The above examples are probably enough
> > to work through to understand what the intended semantics of this
> > manifest is.
>
> I'm not entirely sure on what is best to require here. I mean, the orig
> source has to be somewhere, including on the maints machine, so should
> be possible to be included in this without any extra large magic.

I think we could include the orig source as a different ref in the git tarball.

Thanks,
Ian.

-- 
Ian Jackson <ijack...@chiark.greenend.org.uk>   These opinions are my own.
Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
that is a private address which bypasses my fierce spamfilter.