On IRC today we were discussing the validation code in svn_fs_fs__set_rep_reference. When inserting a row into the rep-cache the insert can fail because the row already exists. The usual way this would happen is that two parallel commits add content with the same checksum at different locations, e.g.:
svn import some_file file://`pwd`/repo/f1 svn import some_file file://`pwd`/repo/f2 That's harmless and nothing goes wrong, one of the commits populates the rep-cache. Another problem is that the only caller of svn_fs_fs__set_rep_reference always passes reject_dup=FALSE so there is code that is never exercised. Perhaps we should remove it? A more serious problem is if somebody edits a repository and ends up with a rep-cache that has references to revisions that do not exist. This could happen if somebody removes revision files and edits 'current' but doesn't run 'recover'. Or if somebody uses an old backup with a newer rep-cache. We have attempted to catch this by having svn_fs_fs__get_rep_reference return an error if the rep-cache refers to future revisions. However the code isn't working: $ svnadmin create repo --compatible-version 1.8 $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g $ chmod +w -R repo $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1* $ echo 0 > repo/db/current $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g $ svnadmin verify repo * Verifying metadata at revision 0 ... * Verifying repository metadata ... svnadmin: E165011: No representation found at offset 276 for item 4 in revision 1 svnadmin: E165011: Repository 'repo' failed to verify Despite the verify error the new r1 is correct represented, it doesn't use the erroneous row from the rep-cache, and "svn cat" will work. However the commit also means that the erroneous row no longer refers to a revision in the future and so a subsequent commit may use the row and be corrupt. The problem is that svn_fs_fs__get_rep_reference calls svn_fs_fs__ensure_revision_exists which returns the error SVN_ERR_FS_NO_SUCH_REVISION while the caller, transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT. We could change the error returned by svn_fs_fs__get_rep_reference or the error expected by get_shared_rep. Either of those would cause the second commit above to fail. However that doesn't really solve the problem. When the erroneous row is present we really need to stop any commit that would make the erroneous row refer to a revision even if that commit itself doesn't use the row: $ svnadmin create repo --compatible-version 1.8 $ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g $ chmod +w -R repo $ rm -f repo/db/revs/0/1* repo/db/revprops/0/1* $ echo 0 > repo/db/current $ svn -mm mkdir file://`pwd`/repo/X $ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g Perhaps commit should always check for the highest revision in the rep-cache and fail if it is greater than HEAD? This might be considered overkill to help people who have introduced corruption by editing the repository. However when such an erroneous row is present the failure mode is nasty: a commit succeeds but is corrupt and content is lost. The difficult bit here is that we do not have an SQLite index on rep_cache.revision. -- Philip