Evgeny Kotkov wrote on Thu, May 21, 2015 at 18:23:22 +0300: > SVN_ERR(svn_repos_verify_fs2(repos, 1, 5, NULL, NULL, NULL, NULL, pool)); > > ...would return two different errors depending on the binaries being used, > assuming that one of the revisions in the [r1:r5] range is corrupted: > > (With 1.8.13) E160004: Checksum mismatch in item at offset 0 of > length 59 bytes in file path/asf/db/revs/0/2 > > (With 1.9.0-rc1) E165011: Repository 'path/asf' failed to verify ... > Thoughts?
I think the incumbent code doesn't use SVN_ERR_REPOS_CORRUPT (E165011) correctly. The keep-going implementation simply uses SVN_ERR_REPOS_CORRUPT to wrap whatever error the FS layer reported. That's wrong, for two reasons: - The SVN_ERR_REPOS_CORRUPT code should be used to indicate corruption of repos-layer on-disk data (such as $REPOS_DIR/db not existing or $REPOS_DIR/format being a directory). The code does not look for such conditions. In fact, since the repos layer got around to calling into the FS layer, the repos layer itself is almost certainly integral. - The repos layer has no business second-guessing the FS layer's choice of error code. For example, a permission error on a rev file could happen and does not indicate corruption, or a user might have pressed ^C during svn_fs_verify()'s execution. The incumbent code assumes that any non-SVN_ERR_CANCELLED code indicates a corruption; that assumption is wrong. So, the keep-going code should stop using SVN_ERR_REPOS_CORRUPT to indiscriminately wrap svn_fs_verify()'s error code. That should make the E160004 (SVN_ERR_FS_CORRUPT) error visible, addressing Evgeny's issue. (I am almost sure that's exactly what Evgeny's patch does.) Furthermore, I think the "if (found_corruption)" block at the end of svn_repos_verify_fs3() should also avoid using SVN_ERR_REPOS_CORRUPT, since that would be the wrong code to use in some circumstances (for example, if all revision files had permission errors). The code should either say what it _does_ know for a fact — "N out of M revisions failed to verify" — or start inspecting the FS errors to determine whether they indicate corruption or transient errors (which isn't always possible, since the repos layer does not know the context the error was raised in). In fact, since the aforementioned two uses of SVN_ERR_REPOS_CORRUPT are the only two uses of that code, I conclude that I am of the opinion that that error code should be deleted entirely from the 1.9 codebase. We can always revive it in 1.10 if needed. Cheers, Daniel