FSFS rep-cache validation

Philip Martin Wed, 22 Jan 2014 13:21:19 -0800

On IRC today we were discussing the validation code in
svn_fs_fs__set_rep_reference.  When inserting a row into the rep-cache
the insert can fail because the row already exists.  The usual way this
would happen is that two parallel commits add content with the
same checksum at different locations, e.g.:


svn import some_file file://`pwd`/repo/f1
svn import some_file file://`pwd`/repo/f2

That's harmless and nothing goes wrong, one of the commits populates the
rep-cache.

Another problem is that the only caller of svn_fs_fs__set_rep_reference
always passes reject_dup=FALSE so there is code that is never
exercised.  Perhaps we should remove it?

A more serious problem is if somebody edits a repository and ends up
with a rep-cache that has references to revisions that do not exist.
This could happen if somebody removes revision files and edits 'current'
but doesn't run 'recover'.  Or if somebody uses an old backup with a
newer rep-cache.  We have attempted to catch this by having
svn_fs_fs__get_rep_reference return an error if the rep-cache refers to
future revisions. However the code isn't working:

$ svnadmin create repo --compatible-version 1.8
$ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
$ chmod +w -R repo
$ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
$ echo 0 > repo/db/current
$ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g
$ svnadmin verify repo
* Verifying metadata at revision 0 ...
* Verifying repository metadata ...
svnadmin: E165011: No representation found at offset 276 for item 4 in revision 
1
svnadmin: E165011: Repository 'repo' failed to verify

Despite the verify error the new r1 is correct represented, it doesn't
use the erroneous row from the rep-cache, and "svn cat" will work.
However the commit also means that the erroneous row no longer refers to
a revision in the future and so a subsequent commit may use the row and
be corrupt.

The problem is that svn_fs_fs__get_rep_reference calls
svn_fs_fs__ensure_revision_exists which returns the error
SVN_ERR_FS_NO_SUCH_REVISION while the caller,
transaction.c:get_shared_rep, expects the error SVN_ERR_FS_CORRUPT.

We could change the error returned by svn_fs_fs__get_rep_reference or
the error expected by get_shared_rep.  Either of those would cause the
second commit above to fail.  However that doesn't really solve the
problem.  When the erroneous row is present we really need to stop any
commit that would make the erroneous row refer to a revision even if
that commit itself doesn't use the row:

$ svnadmin create repo --compatible-version 1.8
$ svnmucc -mm -U file://`pwd`/repo put repo/format f put repo/README.txt g
$ chmod +w -R repo
$ rm -f repo/db/revs/0/1* repo/db/revprops/0/1*
$ echo 0 > repo/db/current
$ svn -mm mkdir file://`pwd`/repo/X
$ svnmucc -mm -U file://`pwd`/repo put repo/README.txt g

Perhaps commit should always check for the highest revision in the
rep-cache and fail if it is greater than HEAD?  This might be considered
overkill to help people who have introduced corruption by editing the
repository.  However when such an erroneous row is present the failure
mode is nasty: a commit succeeds but is corrupt and content is lost.
The difficult bit here is that we do not have an SQLite index on
rep_cache.revision.

-- 
Philip

FSFS rep-cache validation

Reply via email to