>
> We do have an extra call when a duplicate rep is attempted to be
> inserted into the rep-cache. When does that happen? I know it will
> happen when called from build-repcache; what other use-cases trigger the
> codepath you changed? Committing two files with equal contents
> sequentially doesn't call svn_fs_fs__set_rep_reference() at all for the
> second file.
>
I have spent some time examining when it happens and I see the following
cases:
1) Commit multiple files with the same content in one revision.
2) During commit, svn_fs_fs__set_rep_reference() is called for a property
representation of each node, regardless of whether it is a duplicate or
not.
3) Commit a file with the same content as a property representation.
> What tests is this behaviour covered by?
>
I think that:
- Case 1) is for example covered by the rep_sharing_effectiveness() test.
- Case 2) is implicitly covered by various tests for versioned property
changes.
- Case 3) is not currently covered by a specific test, but the situation
seems to be
an edge case.
> What scenario does this bring a performance improvement in? What is the
> improvement, quantitatively?
>
I have compared the approaches with ‘FAIL’ and ‘IGNORE’ conflict resolution
algorithms using a synthetic benchmark. Inserting 10000 identical entries
into
a rep-cache.db which contains 250000 entries:
1) In the case of using the ‘FAIL’ algorithm: ~40000 microseconds.
2) In the case of using the ‘IGNORE’ algorithm: ~15000 microseconds.
Outside of the benchmark, there is ~15% improvement for the
‘build-repcache’ command.
Based on this, I assume that there are other commands that may benefit from
this change.
I have attached an updated patch.
Regards,
Denis Kovalchuk
rep-cache.db insert optimization.
Use the 'IGNORE' conflict resolution algorithm [1] for STMT_SET_REP
and remove SVN_ERR_SQLITE_CONSTRAINT handling, because the error
should not occur now. It brings a slight performance improvement,
because we remove an extra svn_fs_fs__get_rep_reference() call.
The result of a synthetic benchmark, where 10000 identical entries
are inserted into a rep-cache.db which contains 250000 entries:
1) In the case of using the 'FAIL' algorithm: ~40000 microseconds.
2) In the case of using the 'IGNORE' algorithm: ~15000 microseconds.
[1] https://www.sqlite.org/lang_conflict.html
* subversion/libsvn_fs_fs/rep-cache-db.sql
(STMT_SET_REP): Use the 'IGNORE' conflict resolution algorithm.
* subversion/libsvn_fs_fs/rep-cache.c
(svn_fs_fs__set_rep_reference): Remove SVN_ERR_SQLITE_CONSTRAINT handling.
Patch by: Denis Kovalchuk <[email protected]>
Index: subversion/libsvn_fs_fs/rep-cache-db.sql
===================================================================
--- subversion/libsvn_fs_fs/rep-cache-db.sql (revision 1875655)
+++ subversion/libsvn_fs_fs/rep-cache-db.sql (working copy)
@@ -61,7 +61,7 @@ WHERE hash = ?1
-- STMT_SET_REP
/* Works for both V1 and V2 schemas. */
-INSERT OR FAIL INTO rep_cache (hash, revision, offset, size, expanded_size)
+INSERT OR IGNORE INTO rep_cache (hash, revision, offset, size, expanded_size)
VALUES (?1, ?2, ?3, ?4, ?5)
-- STMT_GET_REPS_FOR_RANGE
Index: subversion/libsvn_fs_fs/rep-cache.c
===================================================================
--- subversion/libsvn_fs_fs/rep-cache.c (revision 1875655)
+++ subversion/libsvn_fs_fs/rep-cache.c (working copy)
@@ -328,7 +328,6 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
{
fs_fs_data_t *ffd = fs->fsap_data;
svn_sqlite__stmt_t *stmt;
- svn_error_t *err;
svn_checksum_t checksum;
checksum.kind = svn_checksum_sha1;
checksum.digest = rep->sha1_digest;
@@ -351,29 +350,8 @@ svn_fs_fs__set_rep_reference(svn_fs_t *fs,
(apr_int64_t) rep->size,
(apr_int64_t) rep->expanded_size));
- err = svn_sqlite__insert(NULL, stmt);
- if (err)
- {
- representation_t *old_rep;
+ SVN_ERR(svn_sqlite__insert(NULL, stmt));
- if (err->apr_err != SVN_ERR_SQLITE_CONSTRAINT)
- return svn_error_trace(err);
-
- svn_error_clear(err);
-
- /* Constraint failed so the mapping for SHA1_CHECKSUM->REP
- should exist. If so that's cool -- just do nothing. If not,
- that's a red flag! */
- SVN_ERR(svn_fs_fs__get_rep_reference(&old_rep, fs, &checksum, pool));
-
- if (!old_rep)
- {
- /* Something really odd at this point, we failed to insert the
- checksum AND failed to read an existing checksum. Do we need
- to flag this? */
- }
- }
-
return SVN_NO_ERROR;
}