On Tue, May 01 2018, Derrick Stolee wrote:
> On 4/30/2018 6:07 PM, Ævar Arnfjörð Bjarmason wrote:
>> Since we show the commit data in the output that's nicely aligned once
>> we sort by object type. The decision to show tags before commits is
>> pretty arbitrary, but it's much less likely that we'll display a tag,
>> so if there is one it makes sense to show it first.
>
> Here's a non-arbitrary reason: the object types are ordered
> topologically (ignoring self-references):
>
> tag -> commit, tree, blob
> commit -> tree
> tree -> blob
Thanks. I'll add a patch with that comment to v2.
>> @@ -421,7 +451,12 @@ static int get_short_oid(const char *name, int len,
>> struct object_id *oid,
>> ds.fn = NULL;
>> advise(_("The candidates are:"));
>> - for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
>> + for_each_abbrev(ds.hex_pfx, collect_ambiguous, &collect);
>> + QSORT(collect.oid, collect.nr, sort_ambiguous);
>
> I was wondering how the old code sorted by SHA even when the ambiguous
> objects were loaded from different sources (multiple pack-files, loose
> objects). Turns out that for_each_abbrev() does its own sort after
> collecting the SHAs and then calls the given function pointer only
> once per distinct object. This avoids multiple instances of the same
> object, which may appear multiple times across pack-files.
>
> I only ask because now we are doing two sorts. I wonder if it would be
> more elegant to provide your sorting algorithm to for_each_abbrev()
> and let it call show_ambiguous_object as before.
>
> Another question is if we should use this sort generally for all calls
> to for_each_abbrev(). The only other case I see is in
> builtin/revparse.c.
When preparing v2 I realized how confusing this was, so I'd added this
to the commit message of my WIP re-roll which should explain this:
A note on the implementation: I started out with something much
simpler which just replaced oid_array_sort() in sha1-array.c with a
custom sort function before calling oid_array_for_each_unique(). But
then dumbly noticed that it doesn't work because the output function
was tangled up with the code added in fad6b9e590 ("for_each_abbrev:
drop duplicate objects", 2016-09-26) to ensure we don't display
duplicate objects.
That's why we're doing two passes here, first we need to sort the list
and de-duplicate the objects, then sort them in our custom order, and
finally output them without re-sorting them. I suppose we could also
make oid_array_for_each_unique() maintain a hashmap of emitted
objects, but that would increase its memory profile and wouldn't be
worth the complexity for this one-off use-case,
oid_array_for_each_unique() is used in many other places.