On Mon, Oct 02, 2017 at 10:56:51AM -0400, Derrick Stolee wrote:
> +static void find_abbrev_len_for_pack(struct packed_git *p,
> + struct min_abbrev_data *mad)
> +{
> + int match = 0;
> + uint32_t num, last, first = 0;
> + struct object_id oid;
> +
> + open_pack_index(p);
> + num = p->num_objects;
> + last = num;
> + while (first < last) {
> + uint32_t mid = (first + last) / 2;
The addition `first + last` can overflow if we are looking past 2^31
objects. Probably not very common (things seem to get pretty painful
at hundreds of millions of objects). But it can be written
overflow-free as:
  uint32_t mid = first + (last - first) / 2;
Sadly, this overflow case seems to be present in many of our binary
searches.
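For reference, here is a minimal standalone sketch of the
overflow-safe formulation (generic C for illustration, not git code;
the `lower_bound` name and sample data are made up):

```c
#include <stdint.h>

/*
 * Return the first index in the sorted array a[0..n) whose value is
 * >= key (i.e., the insertion point).  The midpoint is computed as
 * first + (last - first) / 2; since last - first always fits in a
 * uint32_t, this cannot wrap even when first + last would exceed
 * UINT32_MAX.
 */
static uint32_t lower_bound(const uint32_t *a, uint32_t n, uint32_t key)
{
	uint32_t first = 0, last = n;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;

		if (a[mid] < key)
			first = mid + 1;
		else
			last = mid;
	}
	return first;
}
```

The same transformation drops into any of our binary searches without
changing the loop structure, since only the midpoint expression moves.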
> + const unsigned char *current;
> + int cmp;
> +
> + current = nth_packed_object_sha1(p, mid);
nth_packed_object_sha1() has to do some minor computation to come up
with the correct pointer. The normal packfile lookup precomputes the
initial offset and stride outside the loop. I have no idea if that
micro-optimization is actually noticeable, but I thought I'd mention it
as a possible avenue for investigation since you're obviously interested
in squeezing out performance. ;)
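To sketch what I mean (hedged illustration, not the actual packfile
lookup code; the function name, `HASH_SZ` constant, and lower-bound
return convention here are my own): compute the table base and
per-entry stride once, then index with plain pointer arithmetic inside
the loop instead of calling nth_packed_object_sha1() each iteration:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define HASH_SZ 20

/*
 * Binary-search a table of fixed-width entries whose first HASH_SZ
 * bytes are a sorted hash.  "table" and "stride" are computed by the
 * caller outside the loop, so each iteration is just one multiply,
 * one add, and a memcmp.  Returns the lower bound; the caller checks
 * whether the entry there actually matches.
 */
static uint32_t bsearch_hash(const unsigned char *table, size_t stride,
			     uint32_t nr, const unsigned char *key)
{
	uint32_t first = 0, last = nr;

	while (first < last) {
		uint32_t mid = first + (last - first) / 2;

		if (memcmp(table + (size_t)mid * stride, key, HASH_SZ) < 0)
			first = mid + 1;
		else
			last = mid;
	}
	return first;
}
```

Whether hoisting the pointer math out of the helper call is measurable
here would need profiling, but it keeps the hot loop free of any
per-iteration index validation.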
-Peff