Denis Kovalchuk wrote on Mon, 30 Mar 2020 23:40 +0300: > > > > We do have an extra call when a duplicate rep is attempted to be > > inserted into the rep-cache. When does that happen? I know it will > > happen when called from build-repcache; what other use-cases trigger the > > codepath you changed? Committing two files with equal contents > > sequentially doesn't call svn_fs_fs__set_rep_reference() at all for the > > second file. > > > > I have spent some time examining when it happens and I see the following > cases: > 1) Commit multiple files with the same content in one revision. > 2) During commit, svn_fs_fs__set_rep_reference() is called for a property > representation of each node, regardless of whether it is a duplicate or > not. > 3) Commit a file with the same content as a property representation.
Case #2 sounds like it might benefit from optimization, for example, when importing a tree with files that have had auto-props set by «svn add». However, I wonder if we shouldn't optimize that case by making fewer calls to svn_fs_fs__set_rep_reference() in the first place, as we do for file content reps. All your test cases involve a single commit process. I think there may be other cases in which svn_fs_fs__set_rep_reference() will be called that involve multiple svn_fs_t handles; for example: . % head -c 1024 /dev/urandom > foo % svn import -mm foo $REPOS_URL/one % svn import -mm foo $REPOS_URL/two & % wait . if the two concurrent imports are served by separate svn_fs_t handles. (Call this case 4.) At any rate, it seems like svn_fs_fs__set_rep_reference() is far from being the bottleneck of commit operations, performance-wise. > > > What tests is this behaviour covered by? > > > > I think that: > - Case 1) is for example covered by the rep_sharing_effectiveness() test. > - Case 2) is implicitly covered by various tests for versioned property > changes. > - Case 3) is not currently covered by a specific test, but the situation > seems to be an edge case. > Fair enough. However, case 4 (commits through separate svn_fs_t handles) isn't obviously covered. > > > What scenario does this bring a performance improvement in? What is the > > improvement, quantitatively? > > > > I have compared the approaches with ‘FAIL’ and ‘IGNORE’ conflict resolution > algorithms using a synthetic benchmark. Inserting 10000 identical entries > into > a rep-cache.db which contains 250000 entries: > > 1) In the case of using the ‘FAIL’ algorithm: ~40000 microseconds. > > 2) In the case of using the ‘IGNORE’ algorithm: ~15000 microseconds. > So it takes half the time, but in absolute numbers the difference is barely perceptible, and in comparison to how long it would have taken to process 10000 filecontent deltas the difference is presumably negligible. > > Outside of the benchmark, there is ~15% improvement for the > ‘build-repcache’ command. Again, a 15% improvement in what use-case? Don't give us your conclusions for us to take on faith; give us the data so we can reach our own conclusions independently. > Based on this, I assume that there are other commands that may benefit from > this change. > Actually, I don't think this patch would be a big win for commands other than build-repcache. No use-case other than build-repcache has been identified where this patch would bring about a measurable difference to the duration of the overall operation. Even for build-repcache, you cited a 15% figure, but we don't know what data pattern you see that on so we don't know whether that figure is significant. Thanks for sharing the benchmark data and adding them to the log message. Cheers, Daniel