On 17/09/2020 14.22, Nico Schottelius wrote:
>
> Thanks for the patch Rasmus. Overall it looks good to me, be aligned to
> the stable patch submission rules makes sense. A tiny thing though:
>
> I did not calculate the exact collision probability with 12 characters

For reference, the math is something like this: Consider a repo with N+1
objects. We look at one specific object (for setlocalversion it's the
head commit being built, for the stable rules it's whatever particular
commit one is interested in for backporting), and want to know the
probability that its sha1 collides with some other object in the first b
bits (here b=48, i.e. 12 hex characters).

Assuming the sha1s are independent and uniformly distributed, the
probability of not colliding with one specific other object is

  x = 1 - 1/2^b,

and the probability of not colliding with any of the other N objects is
x^N, making the probability of a collision

  1 - x^N = (1-x)(1 + x + x^2 + ... + x^{N-1}).

Now the N terms in the second factor are all at most 1 (and, apart from
the leading 1, very slightly smaller), so an upper bound for this
probability is

  (1-x)N = N/2^b,

which is also what one would naively expect. [This estimate is always
valid, but it becomes a void statement of "the probability is less than
1" when N is >= 2^b].

So, assuming some vendor kernel repo that has all of Greg's stable.git
(around 10M objects I think) and another 10M objects because random
vendor, that works out to 20e6/2^48 = 7.1e-8, 71 ppb.

> So I suggest you introduce something on the line of:
>
> ...
> num_chars=12
> ...
> --abbrev=$num_chars

I considered that, but it becomes quite ugly since it needs to get into
the awk script (as a 13, though perhaps we could get awk to do the +1, I
don't really speak awk), where we'd then need to use " instead of ' and
then escape the $ that are to be interpreted by awk and not the shell.

So I think it's more readable with hardcoding and comments explaining
why they are there, should anyone ever want to change 12.

Rasmus
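
P.S. For anyone who wants to check the arithmetic, the estimate above
can be reproduced with a few lines of Python. This is only a rough
sketch: the function names are made up for illustration, and the 20M
object count is the same guess as in the text, not a measured number.
The "exact" value uses log1p/expm1 because evaluating 1 - x^N directly
loses most of its precision when x is this close to 1.

```python
import math


def collision_upper_bound(n_other: int, bits: int) -> float:
    """Upper bound N / 2^b on the collision probability (capped at 1)."""
    return min(1.0, n_other / 2.0**bits)


def collision_probability(n_other: int, bits: int) -> float:
    """1 - (1 - 2^-b)^N under the independent, uniform sha1 assumption.

    Computed as -expm1(N * log1p(-2^-b)) to avoid cancellation.
    """
    return -math.expm1(n_other * math.log1p(-(2.0**-bits)))


if __name__ == "__main__":
    # Assumed numbers: ~20M other objects, 12 hex chars = 48 bits.
    n, b = 20_000_000, 48
    print(f"upper bound N/2^b: {collision_upper_bound(n, b):.3e}")
    print(f"1 - x^N:           {collision_probability(n, b):.3e}")
```

For these inputs the bound and the exact expression agree to several
significant digits, so N/2^b is a perfectly good estimate as long as N
is much smaller than 2^b.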