Re: [PATCH v2 4/4] unpack-trees: cheaper index update when walking by cache-tree

Duy Nguyen Fri, 10 Aug 2018 09:40:05 -0700

On Wed, Aug 8, 2018 at 8:46 PM Elijah Newren <[email protected]> wrote:
> > @@ -701,6 +702,24 @@ static int traverse_by_cache_tree(int pos, int nr_entries, int nr_names,
> >         if (!o->merge)
> >                 BUG("We need cache-tree to do this optimization");
> >
> > +       /*
> > +        * Try to keep add_index_entry() as fast as possible since
> > +        * we're going to do a lot of them.
> > +        *
> > +        * Skipping verify_path() should totally be safe because these
> > +        * paths are from the source index, which must have been
> > +        * verified.
> > +        *
> > +        * Skipping D/F and cache-tree validation checks is trickier
> > +        * because it assumes what n-merge code would do when all
> > +        * trees and the index are the same. We probably could just
> > +        * optimize that code instead (e.g. we don't invalidate that
> > +        * many cache-trees, but searching for them is very
> > +        * expensive).
> > +        */
> > +       o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
> > +       o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
> > +
>
> In sum, with this whole patch you notice that the Nway_merge functions
> are still a bit of a bottleneck, but you know you have a special case
> where you want them to put an entry in the index that matches what is
> already there, so you try to set some extra flags to short-circuit
> part of their logic and get to what you know is the correct result.
>
> This seems a little scary to me.  I think it's probably safe as long
> as o->fn is one of {oneway_merge, twoway_merge, threeway_merge,
> bind_merge} (the cases you have in mind and which the current code
> uses), but the caller isn't limited to those.  Right now in
> diff-lib.c, there's a caller that has their own function, oneway_diff.
> More could be added in the future.
>
> If we're going to go this route, I think we should first check that
> o->fn is one of those known safe functions.  And if we're going that
> route, the comments I bring up on patch 2 about possibly avoiding
> call_unpack_fn() altogether might even obviate this patch while
> speeding things up more.
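A minimal sketch of the o->fn guard suggested above, assuming simplified stand-ins for git's internals: the flag values, the two-field struct, and the stub bodies are hypothetical, and only the callback names (oneway_merge, twoway_merge, threeway_merge, bind_merge, oneway_diff) come from this thread. The skip flags are set only when o->fn matches an allow-list of function pointers; any other callback keeps the full checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values standing in for git's ADD_CACHE_* bits. */
#define ADD_CACHE_SKIP_DFCHECK      (1 << 0)
#define ADD_CACHE_SKIP_VERIFY_PATH  (1 << 1)

struct cache_entry;              /* opaque in this sketch */
struct unpack_trees_options;

/* Merge-callback shape: source entries in, options in. */
typedef int (*merge_fn_t)(const struct cache_entry *const *src,
                          struct unpack_trees_options *o);

/* Only the two fields this sketch needs (hypothetical layout). */
struct unpack_trees_options {
	merge_fn_t fn;
	unsigned extra_add_index_flags;
};

/* Stub callbacks; only their addresses matter for the check below. */
#define STUB(name, ret) \
	static int name(const struct cache_entry *const *src, \
	                struct unpack_trees_options *o) \
	{ (void)src; (void)o; return ret; }

STUB(oneway_merge, 1)
STUB(twoway_merge, 2)
STUB(threeway_merge, 3)
STUB(bind_merge, 4)
STUB(oneway_diff, 5)   /* a caller-supplied callback, as in diff-lib.c */

/* Nonzero if fn is one of the callbacks the optimization was written for. */
static int is_known_safe_merge_fn(merge_fn_t fn)
{
	static const merge_fn_t safe[] = {
		oneway_merge, twoway_merge, threeway_merge, bind_merge,
	};
	size_t i;

	for (i = 0; i < sizeof(safe) / sizeof(safe[0]); i++)
		if (fn == safe[i])
			return 1;
	return 0;
}

/* Enable the skip flags only for known-safe callbacks. */
static void maybe_enable_fast_add(struct unpack_trees_options *o)
{
	assert(o != NULL);
	if (!is_known_safe_merge_fn(o->fn))
		return;          /* unknown callback: keep every check */
	o->extra_add_index_flags = ADD_CACHE_SKIP_DFCHECK;
	o->extra_add_index_flags |= ADD_CACHE_SKIP_VERIFY_PATH;
}
```

With this guard, a caller like oneway_diff silently falls back to the checked path instead of inheriting assumptions it never agreed to.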


Yes, I do need to check o->fn. I might have to think more about
avoiding call_unpack_fn(). Even if we avoid it, though, we still go
through add_index_entry() and suffer the same checks every time unless
we do something like this (but then of course it's safer, because
you're doing it in a specific x-way merge, not in generic code like
this).
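To make the per-entry cost concrete, here is a toy model (not git's actual add_index_entry()) in which every added entry runs a path check and a D/F conflict scan unless the caller passes the skip flags. toy_verify_path(), toy_has_df_conflict(), toy_add_index_entry() and the flag values are hypothetical, and the real checks are considerably more thorough; the counters only show how often each check runs.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag values standing in for git's ADD_CACHE_* bits. */
#define ADD_CACHE_SKIP_DFCHECK      (1 << 0)
#define ADD_CACHE_SKIP_VERIFY_PATH  (1 << 1)

/* Counters so the cost of each check is visible. */
static unsigned long verify_calls, dfcheck_calls;

/* Toy path check: non-empty, relative, no empty components. */
static int toy_verify_path(const char *path)
{
	verify_calls++;
	return *path && *path != '/' && !strstr(path, "//");
}

/* Toy D/F check: "a/b" conflicts with an existing file "a" and vice versa. */
static int toy_has_df_conflict(const char **index, int nr, const char *path)
{
	int i;

	dfcheck_calls++;
	for (i = 0; i < nr; i++) {
		size_t a = strlen(index[i]), b = strlen(path);
		if (a < b && !strncmp(index[i], path, a) && path[a] == '/')
			return 1;
		if (b < a && !strncmp(index[i], path, b) && index[i][b] == '/')
			return 1;
	}
	return 0;
}

/* Returns 0 on success, -1 if a check rejects the path. */
static int toy_add_index_entry(const char **index, int *nr,
                               const char *path, unsigned flags)
{
	assert(index != NULL && nr != NULL);
	if (!(flags & ADD_CACHE_SKIP_VERIFY_PATH) && !toy_verify_path(path))
		return -1;
	if (!(flags & ADD_CACHE_SKIP_DFCHECK) &&
	    toy_has_df_conflict(index, *nr, path))
		return -1;
	index[(*nr)++] = path;
	return 0;
}
```

In this model, entries copied straight from an already-verified source index skip both checks, which is the trade-off the patch makes; the price is that an entry produced by some other callback would go in unchecked.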

> > @@ -1561,6 +1581,13 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
> >                 if (!ret) {
> >                         if (!o->result.cache_tree)
> >                                 o->result.cache_tree = cache_tree();
> > +                       /*
> > +                        * TODO: Walk o.src_index->cache_tree, quickly check
> > +                        * if o->result.cache has the exact same content for
> > +                        * any valid cache-tree in o.src_index, then we can
> > +                        * just copy the cache-tree over instead of hashing a
> > +                        * new tree object.
> > +                        */
>
> Interesting.  I really don't know how cache_tree works...but if we
> avoided calling call_unpack_fn, and thus left the original index entry
> in place instead of replacing it with an equal one, would that as a
> side effect speed up the cache_tree_valid/cache_tree_update calls for
> us?  Or is there still work here?

Naah. Notice that we don't care at all about the source's cache-tree
when we update the o->result one (and we never ever do anything with
o->result's cache-tree during the merge). Whether you invalidate or
not, o->result's cache-tree is always empty and you still have to
recreate every cache-tree in o->result. You essentially pay the full
cost of "git write-tree" here, if I'm not mistaken.
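A rough sketch of the reuse idea from the TODO comment, under heavy simplifying assumptions: the cache-tree is flattened into a list of nodes, each identified by a directory prefix and the number of sorted index entries it covers, and object ids are short strings. All struct and function names here are hypothetical. A valid source node is reused only when the result index holds exactly the same entries (path and oid) under that prefix; every other node would still need a freshly hashed tree object.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical index entry: path plus a string standing in for an oid. */
struct entry {
	const char *path;
	const char *oid;
};

/* Hypothetical flattened cache-tree node. */
struct tree_node {
	const char *prefix;      /* e.g. "lib/"; "" for the root */
	int entry_count;         /* index entries covered by this tree */
	int valid;               /* 0 if invalidated */
	const char *tree_oid;
};

/* Position of the first entry under prefix, or -1 (entries are sorted). */
static int find_first(const struct entry *e, int nr, const char *prefix)
{
	size_t len = strlen(prefix);
	int i;

	for (i = 0; i < nr; i++)
		if (!strncmp(e[i].path, prefix, len))
			return i;
	return -1;
}

/* 1 if the result has exactly the source's entries under n->prefix. */
static int can_reuse(const struct tree_node *n,
                     const struct entry *src, int src_nr,
                     const struct entry *res, int res_nr)
{
	size_t len = strlen(n->prefix);
	int s, r, i;

	assert(n != NULL);
	if (!n->valid)
		return 0;
	s = find_first(src, src_nr, n->prefix);
	r = find_first(res, res_nr, n->prefix);
	if (s < 0 || r < 0 || s + n->entry_count > src_nr ||
	    r + n->entry_count > res_nr)
		return 0;
	for (i = 0; i < n->entry_count; i++)
		if (strcmp(src[s + i].path, res[r + i].path) ||
		    strcmp(src[s + i].oid, res[r + i].oid))
			return 0;
	/* The result must not hold extra entries under the same prefix. */
	if (r + n->entry_count < res_nr &&
	    !strncmp(res[r + n->entry_count].path, n->prefix, len))
		return 0;
	return 1;
}

/* Copies reusable tree oids into out[]; returns how many still need hashing. */
static int reuse_cache_tree(const struct tree_node *nodes, int nr_nodes,
                            const struct entry *src, int src_nr,
                            const struct entry *res, int res_nr,
                            const char **out)
{
	int i, to_hash = 0;

	for (i = 0; i < nr_nodes; i++) {
		if (can_reuse(&nodes[i], src, src_nr, res, res_nr)) {
			out[i] = nodes[i].tree_oid;
		} else {
			out[i] = NULL;   /* caller must hash a new tree */
			to_hash++;
		}
	}
	return to_hash;
}
```

For example, if only a top-level file changed, the root node must be rehashed but an untouched "lib/" subtree can keep its old oid, so the cost scales with what changed rather than with a full write-tree.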
-- 
Duy
