On Fri, Mar 24, 2017 at 11:37:54PM -0700, Junio C Hamano wrote:
> The hash that names a packfile is constructed by sorting all the
> names of the objects contained in the packfile and running SHA-1
> hash over it. I think this MUST be hashed with collision-attack
> detection. A malicious site can feed you a packfile that contains
> objects the site crafts so that the sorted object names would result
> in a collision-attack, ending up with one pack that contains a sets
> of objects different from another pack that happens to have the same
> packname, causing Git to say "Ah, this new pack must have the same
> set of objects as the pack we already have" and discard it,
> resulting in lost objects and a corrupt repository with missing
> objects.
I don't think this case really matters for collision detection. What's
important is what Git does when it receives a brand-new packfile that
would overwrite an existing one. It _should_ keep the old one, under the
usual "existing data wins" rule.
It should be easy to test, though:
$ git init tmp && cd tmp
$ git commit --allow-empty -m foo
$ git gc
$ touch -d yesterday .git/objects/pack/*
$ ls -l .git/objects/pack
-r--r--r-- 1 peff peff 1128 Mar 25 02:10
pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.idx
-r--r--r-- 1 peff peff 153 Mar 25 02:10
pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.pack
$ git index-pack --stdin <.git/objects/pack/*.pack
pack 7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b
$ ls -l .git/objects/pack
-r--r--r-- 1 peff peff 1128 Mar 25 02:10
pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.idx
-r--r--r-- 1 peff peff 153 Mar 25 02:10
pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.pack
Looks like the timestamps were retained. <phew> And if we use strace, we
can see what happens:
$ strace git index-pac k--stdin <.git/objects/pack/*.pack
link(".git/objects/pack/tmp_pack_YSrdWU",
".git/objects/pack/pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.pack") = -1
EEXIST (File exists)
unlink(".git/objects/pack/tmp_pack_YSrdWU") = 0
link(".git/objects/pack/tmp_idx_O94NNU",
".git/objects/pack/pack-7e9d64ac27adc9ce1b12774dd287ff9bd8a9345b.idx") = -1
EEXIST (File exists)
unlink(".git/objects/pack/tmp_idx_O94NNU") = 0
This is due to the link()/EEXIST handling in finalize_object_file. It
has a FIXME for a collision check, so we could actually detect at that
point whether we have a real collision, or if the other side just
happened to send us the same pack.
I wouldn't be surprised if the dumb-http walker is not so careful,
though (but the solution is to make it careful, not to worry about
a weak hash algorithm).
The rest of your email all made sense to me.
-Peff