In <20090716151953.ge4...@wks0082.feds.uwaterloo.ca>, Eric Gerlach wrote:
>On Wed, Jul 15, 2009 at 07:36:24AM -0700, Todd A. Jacobs wrote:
>> On Mon, Jul 06, 2009 at 07:30:19PM -0500, Ron Johnson wrote:
>> > How would one go about computing a *single* hash value for a complete
>> > directory tree?
>>
>> You might want to look at how git does this. As I understand it, git
>> stores hashes of trees, so the implementation may help you.
>
>Not really... the hash git indexes with is that of the compressed object
>(which is either a blob, tree, or commit).
Actually, I'm fairly sure it hashes the uncompressed object (now[1]), but I'd
have to dig into the source code to be sure.

> Tree and commit objects point
> at other objects (which are also stored by hash). Blobs are the files
> themselves.

That is one way of calculating a single hash for a complete directory tree.
The tree is identified by its hash, which verifies the contents. The contents
identify the "pointed to" objects by hash, which verifies their contents. Etc.

The hash/sum calculated has the same verification properties as a
single-file, data-only hash. It *might* not be as cryptographically strong,
but that would be a bit surprising, and I've seen no papers/pages verifying or
refuting its strength.[2]
-- 
Boyd Stephen Smith Jr.                   ,= ,-_-. =.
b...@iguanasuicide.net                  ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy         `-'(. .)`-'
http://iguanasuicide.net/                    \_/

[1] There was a short period during Linus's maintainership of git when it
hashed differently than it does now. I can't recall why or when it was
changed.

[2] Other than the fact that it uses SHA-1, a 160-bit hash, which *may* be
getting too weak to be considered cryptographically secure in the near
future. Using SHA-2 is probably better, and you shouldn't lose much strength
by truncating to 160 bits if you need that size specifically, but git doesn't
support that. Hopefully SHA-3 will be out before it matters, so git can
switch to it.[3]

[3] If they ever decide to switch, it will probably be painful. They might
never switch, since I don't think resistance against attackers was the
intent, just "identification" and resistance to random corruption. (CVS and
SVN could be silently corrupted for years and it was virtually impossible to
tell; that doesn't happen to git repositories.)
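To make the "tree of hashes" idea concrete for the original question, here is
a minimal Python sketch. It assumes git's object format: a header of
"<type> <size>\0" followed by the raw, uncompressed body, hashed with SHA-1,
with a tree body made of "<mode> <name>\0<20-byte binary id>" entries. The
function names `git_object_id` and `tree_id` are hypothetical, and the sketch
deliberately skips symlinks, ignore rules, and git's refusal to track empty
directories, so treat it as an illustration, not a reimplementation of git.

```python
import hashlib
import os
import stat


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash a git-style object: the header "<type> <size>\\0" followed by the
    uncompressed body (zlib compression only matters when the object is stored)."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def tree_id(path: str) -> str:
    """Return a single git-style hash for the directory tree at `path`.

    Each file becomes a blob hash, each subdirectory becomes a tree hash
    (computed recursively), and this tree hashes the list of those IDs.
    Simplifications (assumptions): symlinks and special files are skipped,
    empty subdirectories are included, and nothing is ignored (not even .git).
    """
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        raw_name = os.fsencode(name)
        st = os.lstat(full)
        if stat.S_ISDIR(st.st_mode):
            # git sorts directories as if their name ended in "/".
            entries.append((raw_name + b"/", b"40000", raw_name, tree_id(full)))
        elif stat.S_ISREG(st.st_mode):
            with open(full, "rb") as f:
                sha = git_object_id("blob", f.read())
            # Like git, only the owner-execute bit decides 755 vs 644.
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            entries.append((raw_name, mode, raw_name, sha))
        # Anything else (symlinks, sockets, ...) is skipped in this sketch.
    body = b"".join(
        mode + b" " + raw_name + b"\0" + bytes.fromhex(sha)
        for _, mode, raw_name, sha in sorted(entries)
    )
    return git_object_id("tree", body)
```

Usage is just `print(tree_id("/path/to/dir"))`. Because every entry's hash
feeds into its parent's hash, changing any byte of any file changes the
top-level ID. Comparing the result against git's own tree IDs would also
require skipping `.git/` and handling symlinks, which this sketch leaves out.
Swapping `hashlib.sha1` for `hashlib.sha256` gives a stronger (but no longer
git-compatible) directory hash, in the spirit of footnote [2].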