Garance A Drosehn wrote on Wed, Mar 01, 2017 at 14:48:07 -0500:
> On 1 Mar 2017, at 7:18, Daniel Shahaf wrote:
>
> > Stefan Sperling wrote on Wed, Mar 01, 2017 at 11:01:40 +0100:
> >> On Tue, Feb 28, 2017 at 10:17:34PM -0600, Greg Stein wrote:
> >>> I really like this idea.
> >>>
> >>> And we could take a copy of APR's sha1 code, and rejigger
> >>> it to perform *both* hashes during the same scan of the
> >>> raw bytes. I would expect the time taken to extend by
> >>> (say) 1.1X rather than a full 2X. The inner loop might
> >>> cost a bit more, but we'd only scan the bytes once. Very
> >>> handy, when you're talking about megabytes in a stream-y
> >>> environment.
> >>>
> >>> (and medium-term, push this dual-sha1 computation back
> >>> into APR)
> >>
> >> The idea is nice and I would support its application.
> >
> > I would not.
> >
> > Using two variants of sha1 is fine for supporting webkit's
> > use-case, or to protect admins from disgruntled employees
> > who commit the known collision to DoS their employer.
> >
> > However, when the attacker model is a competent/resourceful
> > attacker, using two variants of sha1 only increases the
> > attack's complexity by a factor of about 2⁷ ([1, §4.1]),
> > and might in fact cause collisions to be _easier_ to find
> > (since additional state is output).
> >
> > So let's not add a variant of sha1.
>
> Note that this is not a variant of sha1, which implies that it
> is a *different* algorithm. It's the same algorithm, run
> with two different inputs.
>
> The paper you reference:
> https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf
>
> is interesting, and some of it might be going over my head.
> However, I'll note that it seems to be about:
>
> "... unwanted dependencies between two slightly
> different instances of the same hash function"
>
> That is not what I am suggesting. I am suggesting to use
> the exact same hash function, but giving the function two
> vastly different collections of data.
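For concreteness, the construction being discussed amounts to
something like the sketch below: one pass over the data, feeding
every byte to one sha1 context and every other byte to a second one.
(Python's hashlib is used purely for illustration; the function name
dual_sha1, the buffer size, and the byte-level split are my own
choices here. The real thing would be a reworked copy of APR's C
sha1 code, per Greg's suggestion, and whether the split is per byte
or per word doesn't matter for the argument.)

    import hashlib

    def dual_sha1(stream, bufsize=8192):
        """Single-pass sketch: sha1 of the whole stream, plus sha1
        of every other byte of it (the construction under
        discussion)."""
        full = hashlib.sha1()   # digest of the entire byte stream
        half = hashlib.sha1()   # digest of bytes at even offsets only
        offset = 0
        while True:
            chunk = stream.read(bufsize)
            if not chunk:
                break
            full.update(chunk)
            # Keep the even/odd parity correct across chunk boundaries.
            start = 0 if offset % 2 == 0 else 1
            half.update(chunk[start::2])
            offset += len(chunk)
        return full.hexdigest(), half.hexdigest()

A single read of the data drives both contexts, which is where the
"roughly 1.1x rather than 2x" expectation comes from.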
I understood what you were suggesting, and it's precisely what the
paper talks about. Consider the function that takes as input a file
(more accurately: a byte stream) and returns the sha1 of every other
word of that file; that function is a hash function and is a variant
of sha1. In terms of that paper, you can think of F as sha1 and G as
"sha1 of every other word" (or the other way around).

> And not only that,
> but there is an ordering dependency between the two sets of
> data. It's mighty hard to add data to the first set (the
> entire file) without causing significant disruption in the
> *actual data* that will be fed to the second set.
>
> I do not see how this extended-sha1 would be easier to break
> than the current sha1, because it includes the current sha1,
> unchanged.

In cryptography, "it appears to be hard" and "I see no way to break
it" don't count: the fact that one does not see an attack does not
mean that no attack exists. (Absence of evidence isn't evidence of
absence.) What does count is security proofs ("No attacker with
resources X can achieve Y in time T with probability exceeding
1/2**N") and, failing that, constructions that have stood the test
of time. Nothing either of us comes up with off the top of his head
is going to meet either of these criteria.

Regarding "easier to break", you were probably thinking of collision
attacks, which are never made easier by extending the output. I was
thinking of "first preimage" attacks (finding a message that has a
given checksum). Extending the output can make _these_ attacks
easier. For example, if I told you my PIN is four decimal digits you
wouldn't know what it was, but if I told you not only the number of
digits but also that their sum was 36, you'd know the PIN was 9999.

> And then it adds a second sha1 digest to that,
> by effectively hashing a *different file* (the file created
> by using only every-other-byte of the original file).
>
> Obviously this is also not perfect, but then there will
> never be a hashing algorithm which can perfectly hash all
> 1-megabyte files of arbitrary data into unique 256-byte
> values. But it sure seems like it will make it much harder
> to *calculate* a second file which will sum-up to the same
> digest values (two values) as any given datafile.

That's called a "second preimage attack". The extended construction
is not secure against such an attack: a hash function with a 320-bit
output is supposed to require O(2³²⁰) evaluations to find a second
preimage, but it's trivial to find a second preimage for F and G
simultaneously with only O(2¹⁶⁰) evaluations. (Do you see that?)

A similar argument holds for collision attacks: a collision attack
against a perfect hash function with a 320-bit output should have
complexity 2¹⁶⁰; however, the paper provides an attack with
complexity 2⁸⁸. There is also a trivial attack, specialized to these
particular F and G, with complexity 2⁸⁰; the postscript below
illustrates the structural observation it relies on. In both cases,
the attacks cost about as much as they do against sha1 alone. The
only real difference is that there haven't been 10 years of research
about the security of this particular extended construction; that's
a disadvantage.

> Now as someone else noted, this idea would require a format
> change to svn data structures. So this would still have
> to wait for the next new version of svn. But it might be
> faster to use this tactic instead of making the much more
> substantial change of totally dropping sha1 for sha256.
>
> (I don't mean it will necessarily be faster to implement,
> but just that it might run faster in day-to-day operations)

I concede the performance issue.

Cheers,

Daniel
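P.S. A concrete way to see the structural weakness behind the
"trivial" attacks above: the second digest never reads half of the
input, so an attacker can change those bytes freely and only has to
contend with plain sha1. A minimal demonstration, reusing the
illustrative dual_sha1() sketch from earlier in this mail (the sample
data and offsets are arbitrary):

    import io

    # Assumes the dual_sha1() sketch above is in scope.
    data = bytearray(b"some example repository content " * 100)
    f1, g1 = dual_sha1(io.BytesIO(bytes(data)))

    # Flip a couple of bytes at odd offsets -- exactly the bytes the
    # "every other byte" digest never reads.
    data[1] ^= 0xFF
    data[3001] ^= 0xFF
    f2, g2 = dual_sha1(io.BytesIO(bytes(data)))

    assert g1 == g2   # the second digest is blind to these changes
    assert f1 != f2   # the full-stream sha1 does change, as expected

Confining modifications to the bytes the second digest never sees
costs an attacker nothing, which is why the combined 320-bit output
does not buy 320-bit security.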