Hi,

Ludovic Courtès <l...@gnu.org> writes:

> zimoun <zimon.touto...@gmail.com> writes:
>
>> Giving a look at Disarchive, I found how to compute Git-based
>> serialization hash and somehow serialization methods of "guix hash"
>> needs some cleaning; considering '--recursive' is 'nar' serialization
>> which is a better name.  Anyway, see [1]. :-)
>
> Neat!
>
>> I would like to add SWH-based serialization hash but I do not find if
>> a function already does the hard work.  Any pointer?
>
> I think it’s ‘git-hash-directory’ in (disarchive git-hash).

That’s the one.  I only know what SWH does for a few cases:

  • directory: Use their version of ‘git-hash-directory’.

  • file: Use their version of ‘git-hash-file’ (resulting in an ID like
    “swh:1:cnt:...”).  I don’t know if they ingest regular files like
    this, but if they ingested the file through another means, it will
    have that ID.

  • git: Read the directory ID from the Git database.  This is
    essentially ‘git rev-parse HEAD:’, where the colon at the end tells
    Git to get the “tree” (directory) ID rather than the commit ID.
    (I’m not sure if guile-git supports this; so far I’ve just been
    shelling out to Git.)

  • hg: Use their version of ‘git-hash-directory’ excluding the “.hg”
    directory.

In my work, I’ve been strict about keeping the Git directory IDs based
on the Git database (“.git”) rather than computing them using
‘git-hash-directory’.  Since Guix deletes the Git database before
putting a checkout in the store, that option may not be available to you
(unless you download the repository again).  I’m not sure how much of a
problem this would be in practice.

There may be a few edge cases with submodules and “.gitattributes” to
watch out for.  My guess is that as it stands, if a repo has a
“.gitattributes” file, running ‘git-hash-directory’ on the checkout will
produce a directory ID that SWH doesn’t have (they will ignore it, but
we will include it).
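
As a rough illustration of the “file” and “git” cases in the list above,
here is a shell sketch.  It is not taken from Disarchive or SWH code: the
function names are made up, and it assumes that the “cnt” ID is the plain
Git blob hash (SHA-1 over “blob <size>\0” plus the file contents) and
that the “dir” ID is the Git tree ID as printed by ‘git rev-parse HEAD:’.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; neither is part of Guix,
# Disarchive, or the SWH tooling.

# Print an ID of the form "swh:1:cnt:<hash>" for a regular file.
# Assumption: the hash is the Git blob hash of the file.
swh_content_id () {
    file=$1
    # Size in bytes; strip the padding that some 'wc' versions add.
    size=$(wc -c < "$file" | tr -d ' ')
    # Git hashes the header "blob <size>\0" followed by the raw contents.
    hash=$( { printf 'blob %s\0' "$size"; cat "$file"; } \
            | sha1sum | cut -d ' ' -f 1)
    printf 'swh:1:cnt:%s\n' "$hash"
}

# Print an ID of the form "swh:1:dir:<hash>" for a Git checkout that
# still has its ".git" database.  The trailing colon in 'HEAD:' asks
# Git for the tree (directory) ID rather than the commit ID.
swh_git_directory_id () {
    repo=$1
    printf 'swh:1:dir:%s\n' "$(git -C "$repo" rev-parse 'HEAD:')"
}
```

For example, ‘swh_content_id README’ prints an ID of the form
“swh:1:cnt:...”.  The second helper only works while “.git” is still
present, so it would not apply to a checkout that is already in the
store.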

A corollary of this guess is that our SWH fallback code for Git will
fail for a repo that has a “.gitattributes” file, since we include it in
the nar hash, but SWH will not provide it.  (I say “guess” because this
is based on some stuff I observed when writing that procedure several
months ago – I haven’t verified any of this.  See also
<https://issues.guix.gnu.org/48540>, which is the same problem but with
submodules instead of “.gitattributes”.)

Sorry but all I have to offer is doom and gloom on this one.  :(  You
might be able to get ‘git-hash-directory’ to work well enough on the Git
checkouts that Guix puts in the store, but you’ll have to be careful!

> The other day I learned that the Git CLI ignores empty directories, but
> the Git format itself has nothing against empty directories.  Thus SWH
> serializes in exactly the same way as Git.
>
> (Can you confirm, Timothy?)

I can confirm that a Git tree node of the form

    40000 empty-directory 4b825dc642cb6eb9a060e54bf8d69288fbee4904

theoretically represents an empty directory named “empty-directory”.
The hash is computed like this:

    $ printf 'tree 0\0' | sha1sum
    4b825dc642cb6eb9a060e54bf8d69288fbee4904  -

I don’t know anything about where Git excludes this or what would happen
if you manually constructed a Git repo with empty directories, though!

-- 
Tim