Re: [PATCH] rep-cache.db insert optimization (was: Re: [PATCH] svnadmin build-repcache command)

Daniel Shahaf Mon, 30 Mar 2020 17:11:12 -0700

Denis Kovalchuk wrote on Mon, 30 Mar 2020 23:40 +0300:
> >
> > We do have an extra call when a duplicate rep is attempted to be
> > inserted into the rep-cache.  When does that happen?  I know it will
> > happen when called from build-repcache; what other use-cases trigger the
> > codepath you changed?  Committing two files with equal contents
> > sequentially doesn't call svn_fs_fs__set_rep_reference() at all for the
> > second file.
> >  
> 
> I have spent some time examining when it happens and I see the following
> cases:
> 1) Commit multiple files with the same content in one revision.
> 2) During commit, svn_fs_fs__set_rep_reference() is called for a property
>     representation of each node, regardless of whether it is a duplicate or
> not.
> 3) Commit a file with the same content as a property representation.


Case #2 sounds like it might benefit from optimization, for example,
when importing a tree with files that have had auto-props set by «svn add».
However, I wonder if we shouldn't optimize that case by making fewer
calls to svn_fs_fs__set_rep_reference() in the first place, as we do for
file content reps.

All your test cases involve a single commit process.  I think there may
be other cases in which svn_fs_fs__set_rep_reference() will be called
that involve multiple svn_fs_t handles; for example:
.
    % head -c 1024 /dev/urandom > foo
    % svn import -mm foo $REPOS_URL/one
    % svn import -mm foo $REPOS_URL/two &
    % wait
.
if the two concurrent imports are served by separate svn_fs_t handles.
(Call this case 4.)

At any rate, it seems like svn_fs_fs__set_rep_reference() is far from
being the bottleneck of commit operations, performance-wise.

> 
> > What tests is this behaviour covered by?
> >  
> 
> I think that:
> - Case 1) is for example covered by the rep_sharing_effectiveness() test.
> - Case 2) is implicitly covered by various tests for versioned property
> changes.
> - Case 3) is not currently covered by a specific test, but the situation
> seems to be an edge case.
> 

Fair enough.

However, case 4 (commits through separate svn_fs_t handles) isn't
obviously covered.

> 
> > What scenario does this bring a performance improvement in?  What is the
> > improvement, quantitatively?
> >  
> 
> I have compared the approaches with ‘FAIL’ and ‘IGNORE’ conflict resolution
> algorithms using a synthetic benchmark. Inserting 10000 identical entries
> into
> a rep-cache.db which contains 250000 entries:
> 
> 1) In the case of using the ‘FAIL’ algorithm: ~40000 microseconds.
> 
> 2) In the case of using the ‘IGNORE’ algorithm: ~15000 microseconds.
> 

So it takes half the time, but in absolute numbers the difference is
barely perceptible, and in comparison to how long it would have taken to
process 10000 filecontent deltas the difference is presumably negligible.

> 
> Outside of the benchmark, there is ~15% improvement for the
> ‘build-repcache’ command.

Again, a 15% improvement in what use-case?  Don't give us your
conclusions for us to take on faith; give us the data so we can reach
our own conclusions independently.

> Based on this, I assume that there are other commands that may benefit from
> this change.
> 

Actually, I don't think this patch would be a big win for commands other
than build-repcache.  No use-case other than build-repcache has been
identified where this patch would bring about a measurable difference to
the duration of the overall operation.

Even for build-repcache, you cited a 15% figure, but we don't know what
data pattern you see that on so we don't know whether that figure is
significant.

Thanks for sharing the benchmark data and adding them to the log
message.

Cheers,

Daniel

Re: [PATCH] rep-cache.db insert optimization (was: Re: [PATCH] svnadmin build-repcache command)

Reply via email to