I can offer only a cursory examination here, as I don't know this area
of the code.

Derrick Stolee <dsto...@microsoft.com> writes:

> A commit A can reach a commit B only if the generation number of A
> is larger than the generation number of B. This condition allows
> significantly short-circuiting commit-graph walks.
>
> Use generation number for 'git tag --contains' queries.
>
> On a copy of the Linux repository where HEAD is contained in v4.13
> but no earlier tag, the command 'git tag --contains HEAD' had the
> following performance improvement:
>
> Before: 0.81s
> After:  0.04s
> Rel %:  -95%

A question: what is the performance after the change if the "commit-graph"
feature is disabled, or there is no commit-graph file?  Is there a
performance regression in this case, or is the difference negligible?

>
> Helped-by: Jeff King <p...@peff.net>
> Signed-off-by: Derrick Stolee <dsto...@microsoft.com>
> ---
>  ref-filter.c | 23 +++++++++++++++++++----
>  1 file changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/ref-filter.c b/ref-filter.c
> index cffd8bf3ce..e2fea6d635 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1587,7 +1587,8 @@ static int in_commit_list(const struct commit_list *want, struct commit *c)
>  /*
>   * Test whether the candidate or one of its parents is contained in the list.
                                 ^^^^^^^^^^^^^^^^^^^^^

Sidenote: when examining the code after the change, I noticed that
the above part of the header comment for the contains_test() function is
no longer entirely correct, as the function only checks the candidate
commit and nowhere accesses its parents.

But that is not your problem.

>   * Do not recurse to find out, though, but return -1 if inconclusive.
>   */
>  static enum contains_result contains_test(struct commit *candidate,
>                                         const struct commit_list *want,
> -                                       struct contains_cache *cache)
> +                                       struct contains_cache *cache,
> +                                       uint32_t cutoff)
>  {
>       enum contains_result *cached = contains_cache_at(cache, candidate);
>  
> @@ -1603,6 +1604,10 @@ static enum contains_result contains_test(struct commit *candidate,
>  
>       /* Otherwise, we don't know; prepare to recurse */
>       parse_commit_or_die(candidate);
> +
> +     if (candidate->generation < cutoff)
> +             return CONTAINS_NO;
> +

Looks good to me.
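
To make the pruning argument concrete for myself, here is a minimal sketch on
a toy history.  Everything in it (struct toy_commit, toy_reaches(),
toy_demo_visited()) is hypothetical and is not code from ref-filter.c; it
only assumes that a commit's generation number is one more than the largest
generation number among its parents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy commit; this is not git's struct commit. */
struct toy_commit {
	uint32_t generation;                 /* 1 + max generation of parents */
	const struct toy_commit *parents[2];
	size_t nr_parents;
};

/*
 * Return 1 if 'target' is reachable from 'from', 0 otherwise.
 * '*visited' counts how many commits the walk examined.
 */
static int toy_reaches(const struct toy_commit *from,
		       const struct toy_commit *target,
		       uint32_t cutoff, size_t *visited)
{
	size_t i;

	(*visited)++;
	if (from == target)
		return 1;
	/*
	 * Generation numbers only decrease towards the roots, so a commit
	 * below the cutoff cannot reach the target: stop walking here.
	 */
	if (from->generation < cutoff)
		return 0;
	for (i = 0; i < from->nr_parents; i++)
		if (toy_reaches(from->parents[i], target, cutoff, visited))
			return 1;
	return 0;
}

/*
 * Build a linear history c[0] <- c[1] <- ... <- c[5] (generations 1..6)
 * and ask whether c[from] reaches c[to].  With 'use_cutoff' set, the
 * cutoff is the target's generation; otherwise it is 0 (no pruning).
 * Returns the number of commits examined.
 */
size_t toy_demo_visited(size_t from, size_t to, int use_cutoff,
			int *reachable)
{
	struct toy_commit c[6];
	size_t i, visited = 0;

	assert(from < 6 && to < 6);
	for (i = 0; i < 6; i++) {
		c[i].generation = (uint32_t)i + 1;
		c[i].nr_parents = i ? 1 : 0;
		c[i].parents[0] = i ? &c[i - 1] : NULL;
		c[i].parents[1] = NULL;
	}
	*reachable = toy_reaches(&c[from], &c[to],
				 use_cutoff ? c[to].generation : 0, &visited);
	return visited;
}
```

In this toy, asking whether c[3] reaches c[4] examines four commits without
the cutoff and only one with it; the answer is "no" in both cases.  The real
function also consults the contains cache, which the sketch leaves out.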

The only [minor] question may be whether to define a separate type for
generation numbers, and whether to future-proof the tests, though the
latter would almost certainly be overengineering, and the former
probably too.

>       return CONTAINS_UNKNOWN;
>  }
>  
> @@ -1618,8 +1623,18 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>                                             struct contains_cache *cache)
>  {
>       struct contains_stack contains_stack = { 0, 0, NULL };
> -     enum contains_result result = contains_test(candidate, want, cache);
> +     enum contains_result result;
> +     uint32_t cutoff = GENERATION_NUMBER_INFINITY;
> +     const struct commit_list *p;
> +
> +     for (p = want; p; p = p->next) {
> +             struct commit *c = p->item;
> +             parse_commit_or_die(c);
> +             if (c->generation < cutoff)
> +                     cutoff = c->generation;
> +     }

Shouldn't the above be made conditional on the ability to get generation
numbers from the commit-graph file (feature is turned on and file
exists)?  Otherwise, after the change, contains_tag_algo() now parses
each commit in 'want', which I think was not done previously.

With a commit-graph file, parsing is [probably] cheap.  Without it, not
necessarily.

But I might be worrying about nothing.
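
To illustrate what I mean by "conditional", here is a rough sketch using toy
types.  The 'have_graph' flag, the TOY_* constants and the toy_* functions
are all made up for this example; I do not know how the real code would best
detect that generation numbers are available, so treat that part as an
assumption rather than a proposal for the actual check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the constants from the patch series. */
#define TOY_GENERATION_INFINITY 0xFFFFFFFFu
#define TOY_GENERATION_ZERO     0u

/* Hypothetical toy commit that records whether it was "parsed". */
struct toy_want {
	uint32_t generation;
	int parsed;
};

/* Stand-in for the (possibly expensive) parse step. */
static void toy_parse(struct toy_want *c)
{
	c->parsed = 1;
}

/*
 * Compute the cutoff as the minimum generation over 'want', but only
 * when generation numbers are available.  Otherwise return zero, so a
 * later "generation < cutoff" test is never true and no commit in
 * 'want' gets parsed here.
 */
uint32_t toy_min_cutoff(struct toy_want *want, size_t nr, int have_graph)
{
	uint32_t cutoff = TOY_GENERATION_INFINITY;
	size_t i;

	if (!have_graph)
		return TOY_GENERATION_ZERO;

	for (i = 0; i < nr; i++) {
		toy_parse(&want[i]);
		if (want[i].generation < cutoff)
			cutoff = want[i].generation;
	}
	return cutoff;
}

/*
 * Run toy_min_cutoff() on two commits with generations 7 and 3 and
 * report how many of them ended up parsed.
 */
uint32_t toy_demo_cutoff(int have_graph, int *nr_parsed)
{
	struct toy_want want[2] = { { 7, 0 }, { 3, 0 } };
	uint32_t cutoff = toy_min_cutoff(want, 2, have_graph);

	assert(nr_parsed != NULL);
	*nr_parsed = want[0].parsed + want[1].parsed;
	return cutoff;
}
```

With 'have_graph' set, the toy returns a cutoff of 3 after parsing both
commits; without it, the cutoff is 0 and nothing is parsed, which would keep
the pre-patch behaviour for that case.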

>  
> +     result = contains_test(candidate, want, cache, cutoff);

Other than the question about a possible performance regression if
commit-graph data is not available, it looks good to me.

>       if (result != CONTAINS_UNKNOWN)
>               return result;
>  
> @@ -1637,7 +1652,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>                * If we just popped the stack, parents->item has been marked,
>                * therefore contains_test will return a meaningful yes/no.
>                */
> -             else switch (contains_test(parents->item, want, cache)) {
> +             else switch (contains_test(parents->item, want, cache, cutoff)) {
>               case CONTAINS_YES:
>                       *contains_cache_at(cache, commit) = CONTAINS_YES;
>                       contains_stack.nr--;
> @@ -1651,7 +1666,7 @@ static enum contains_result contains_tag_algo(struct commit *candidate,
>               }
>       }
>       free(contains_stack.contains_stack);
> -     return contains_test(candidate, want, cache);
> +     return contains_test(candidate, want, cache, cutoff);

Simple change. It looks good to me.

>  }
>  
>  static int commit_contains(struct ref_filter *filter, struct commit *commit,
