On Thu, Jan 23, 2014 at 11:00 AM, Evgeny Kotkov <evgeny.kot...@visualsvn.com > wrote:
> Hi, > > I've noticed that svnadmin recover and hotcopy commands are broken for old > repositories around trunk@1560210. The problem can be reproduced by > creating > a new --compatible-version=1.3 / 1.2 / 1.1 FSFS repository, doing something > simple with it (say, running svn mkdir or svn checkout / add file / > commit) and > attempting to recover or hotcopy it: > [[[ > (svnadmin-trunk recover REPOS) > svnadmin: E200002: Serialized hash missing terminator > > (svnadmin-trunk hotcopy REPOS REPOS_BACKUP) > svnadmin: E160006: No such revision 1 > ]]] > > Alternatively, this can be reproduced by running svnadmin_tests.py with, > for > example, --server-minor-version=3. This problem exists in both 1.8.x and > trunk. In 1.8.x, however, the errors for both hotcopy and recover are > visually > identical: > [[[ > (svnadmin-1.8.x recover REPOS) > svnadmin: E200002: Serialized hash missing terminator > > (svnadmin-1.8.x hotcopy REPOS REPOS_BACKUP) > svnadmin: E200002: Serialized hash missing terminator > ]]] > > Overall, this looks like a regression since 1.7.x, where hotcopy works just > fine. The 1.7.x recover also fails for these repositories with an > "svnadmin: E000002: Can't open file 'X/db/revs/2': No such file or > directory" > error, but this is a well-known issue 4213 [1], which should be fixed by > r1367683 [2]. > > Being unable to recover and hotcopy old repositories with the most recent > svnadmin version seems like an important issue, so I think it's worth a > couple > of tests. [See the attached patch #1] > > I did a quick investigation and spotted a couple of odd moments in the > related code: > > 1) First of all, there is a reason for the different error messages in > trunk > and in the 1.8.x branch. > > This is a regression introduced in r1503742 [3], because now calling the > svn_fs_fs__find_max_ids() routine during hotcopy creates a cyclic > dependency. Hotcopy needs to update the db/current file, so it uses > this > routine to get the NEXT_NODE_ID / NEXT_COPY_ID for old repositories. > However, this currently behaves the following way: > > hotcopy_update_current() > ... > (wants to update the db/current file, but doesn't know the next ids) > > svn_fs_fs__find_max_ids() > svn_fs_fs__rev_get_root() > svn_fs_fs__ensure_revision_exists() > ... > (reads the most recent db/current file in order to check the > presence > of this revision. it is reported missing, because we're right > in the > middle of updating the db/current file and cannot do that yet, > because we still do not know the next ids) > > So, basically, we're calling a routine that assumes the db/current file > is > up-to-date in order to get the information on how to update the outdated > db/current file. This ends with an "E160006: No such revision 1" error. > > 2) The reason for the "hash missing terminator" error is a bit trickier. > > The NEXT_NODE_ID / NEXT_COPY_ID recovery code recursively crawls > through the dir representations in the db, however, it assumes that the > EXPANDED_SIZE in the rep description always is non-zero. According to > the docstring in subversion/libsvn_fs_fs/fs.h, this assumption is > incorrect: > [[[ > /* The size of the representation in bytes as seen in the revision > file. */ > svn_filesize_t size; > > /* The size of the fulltext of the representation. If this is 0, > * the fulltext size is equal to representation size in the rev file, > */ > svn_filesize_t expanded_size; > ]]] > > I did a small experiment with different svn (1.7 / 1.8 / trunk) versions > writing dir reps to repositories with different --compatible-versions... > [[[ > # text: REVISION OFFSET SIZE EXPANDED_SIZE > svn 1.7, --compatible-version=1.3 repository (db/revs/1) > text: 1 57 28 28 > svn 1.8, --compatible-version=1.3 repository (db/revs/1) > text: 1 57 28 0 > ]]] > > ...and it looks like svn 1.7.x did have the EXPANDED_SIZE equal to > SIZE, but > this behavior changed with 1.8.x. In 1.8.x, EXPANDED_SIZE is set to 0 > for > these representations, so, attemping to read them ends with an > "E200002: Serialized hash missing terminator" error. > > See the attached 'rep-description-dumps' file. > > NOTE: This code is a part of the recover_find_max_ids() routine and, > worth > mentioning, has some history. There is r1404675 [4], which added the > "pick > the right one from EXPANDED_SIZE and SIZE" logic, however, it was > reverted > in r1404708 [5]. Unfortunately, I do not yet understand the reasoning > behind this revert, so, I guess, I'll just have to continue > investigating. > > 3) There is an "uniquifier" concept for the representations that makes the > rep > caching possible. Each uniquifier requires a new NODE_ID, so, for > example, > a transaction which requires 5 new NODE_IDs ends up with taking 10 of > them. > This assumption is not shared by the recovery procedure, which recovers > the > NEXT_NODE_ID / NEXT_COPY_ID for old repositories by finding the maximum > NODE_ID / COPY_ID and bumping them by one. > > For example, for the old format Greek tree repository with db/current > containing "1 14 1" the recovery procedure will replace the db/current > contents with "1 13 1". '13' in this example is the recovered > NEW_NODE_ID > (but it also is the NODE_ID reserved for the uniquifier of the last > "real" > NODE_ID). As a consequence, hotcopying this repository (which utilizes > the > recover_find_max_ids() routine) will end up with *mismatching > db/current* > files in the hotcopy source and destination. > > It looks like making these uniquifiers for old repositories is > unnecessary > in the first place (they are only required for rep caching, which is > only > available for the newer >= SVN_FS_FS__MIN_REP_SHARING_FORMAT > repositories). > > See set_uniquifier() in subversion/libsvn_fs_fs/transaction.c > > 4) The hotcopy_update_current() does something quite odd for old > repositories. > > It grabs the MAX_NODE_ID / MAX_COPY_ID from svn_fs_fs__find_max_ids(), > claims them to be the NEXT_NODE_ID / NEXT_COPY_ID and writes them into > the > db/current file. The maximum values found by the > svn_fs_fs__find_max_ids() > routine should be bumped by one (the same way it is done in > recover_body(), > subversion/libsvn_fs_fs/recovery.c). > > Not doing so will result in the hotcopy destination having incorrect > NEXT_NODE_ID and NEXT_COPY_ID in the db/current file (pointing to > already taken ids). > > If this might help to understand what is going on, I've made a > quick-and-dirty > patch for 1.8.x branch, which explicitly points the problems and rushes > toward > the green test suite. Doing this for 1.8.x allows to postpone the problem > 1) > and focus on 2-4), because, as far as I suspect, fixing 1) might get a bit > messy. [See the attached patch #2] > > An appropriate fix for this issue undoubtedly requires a lot more work. > I think I might be able to come up with a patch for the trunk in the nearby > future (in case someone does not fix it earlier). > > [1] http://subversion.tigris.org/issues/show_bug.cgi?id=4213 > [2] http://svn.apache.org/viewvc?view=revision&revision=r1367683 > [3] http://svn.apache.org/viewvc?view=revision&revision=r1503742 > [4] http://svn.apache.org/viewvc?view=revision&revision=r1404675 > [5] http://svn.apache.org/viewvc?view=revision&revision=r1404708 > > > Thanks and regards, > Evgeny Kotkov > Hi Evgeny, Thanks for the patches. I've been digging into the code to figure out how and why the size==0 came about. But I'll not be able to finish that bit before Saturday. -- Stefan^2.