Hi all,

I'm trying to dump a very large (300GB) svn repo so we can split it up into manageable sizes. However, I'm running into problems with revision references. I don't know whether these are bugs in svn or whether our repo is somehow corrupted, but here's an outline of the problems. Any help would be appreciated. Explaining the issues in full will take some space, so bear with me.

1. After filtering, there are Node-copyfrom-revs that refer to filtered-out revisions

In the dumpfile, rev X of path A has Node-copyfrom-path B and Node-copyfrom-rev Y. However, rev Y didn't touch path B, so it's filtered out, and svnadmin load dies with "svnadmin: E160006: No such revision Z". (Note that it reports the nonexistent revision as Z, which is in fact *not* Y. It's the nearest existing rev < Y.)

This is not a problem with the filter; the original dump of the repo contains the same references. Of course, rev Y exists there, so there's no problem loading it. If this is a bug, it presumably originated when rev X was committed.

This appears to be the same problem reported previously in http://svn.haxx.se/users/archive-2013-08/0560.shtml, but I saw no resolution to that post.

I have attempted to bypass this problem by editing the dump stream so that Node-copyfrom-rev entries that refer to nonexistent revisions instead point to the nearest earlier existing revision. This seems to fix the problem. But then I run into the next one:

2. svnadmin load crashes with failed IS_VALID_FORWARD_RANGE assertion in mergeinfo.c

The next problem occurs when svnadmin load encounters a mergeinfo property. (The revisions in the mergeinfo properties also point to nonexistent revisions, just like Node-copyfrom-rev above, so they have also been remapped to the nearest existing revs.)

The mergeinfo encountered is "A:X-Y". When this is parsed in renumber_mergeinfo_revs, it turns into (X-1)-Y, because parse_rangelist subtracts one from firstrev. I don't know where this -1 comes from, but when renumber_mergeinfo_revs calls get_revision_mapping on X-1 (which does not exist in the filtered stream) it gets a bogus rev. (It maps to the rev immediately preceding X in the dump stream.) Y, however, maps correctly to the number in the newly loaded repo. As a result the mapped start of the range ends up greater than the mapped end, and the invalid range assertion fires.
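
To make the two remappings concrete, here is a toy model in Python. It is only a sketch, not the actual svn code: the function names mirror the description above but are hypothetical, the revision numbers are made up, and the value returned for a missing rev (`bogus`) is just a stand-in for whatever the real lookup hands back. It shows the "nearest earlier existing rev" rewrite used for Node-copyfrom-rev and mergeinfo, and how mapping the exclusive start X-1 on its own can invert the range, compared with the start + 1 ... - 1 variant suggested further down.

```python
import bisect


def nearest_existing(existing_revs, rev):
    """Return the largest rev in the sorted list existing_revs that is <= rev."""
    i = bisect.bisect_right(existing_revs, rev)
    if i == 0:
        raise ValueError(f"no existing revision <= {rev}")
    return existing_revs[i - 1]


def parse_range(text):
    """Parse 'X-Y' into (start, end) with an exclusive start, i.e. (X-1, Y)."""
    first, last = (int(part) for part in text.split("-"))
    return first - 1, last


def is_valid_forward_range(start, end):
    """Toy stand-in for the assertion: a non-negative start strictly below end."""
    return 0 <= start < end


def remap_naive(rev_map, start, end, bogus):
    """Map start and end independently; bogus models a lookup on a missing rev."""
    return rev_map.get(start, bogus), rev_map[end]


def remap_shifted(rev_map, start, end):
    """Map the first included rev (start + 1), then step back by one."""
    return rev_map[start + 1] - 1, rev_map[end]


if __name__ == "__main__":
    # Hypothetical filtered stream: dump revs 100, 500, 510 load as 1, 2, 3.
    rev_map = {100: 1, 500: 2, 510: 3}
    existing = sorted(rev_map)

    # A reference to the filtered-out rev 505 is rewritten to 500.
    print(nearest_existing(existing, 505))       # 500

    start, end = parse_range("500-510")          # (499, 510)
    naive = remap_naive(rev_map, start, end, bogus=100)
    shifted = remap_shifted(rev_map, start, end)
    print(naive, is_valid_forward_range(*naive))      # (100, 3) False
    print(shifted, is_valid_forward_range(*shifted))  # (1, 3) True
```

With consecutive revisions both mappings give the same start; they only diverge when revs have been filtered out between X-1 and X, which is the situation in my stream.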

If I manually call get_revision_mapping(X), I get the correct rev for X in the loaded repo, so this is caused by the -1 in parse_rangelist.

I see that if it gets an invalid rev from the mapping (which, according to the comment, happens for the revision immediately preceding the oldest revision in the stream) it uses the oldest rev from the load stream, minus 1. If this is because the start rev stored in the merge_range is actually one less than the first merged rev, it seems that rather than calling

    rev_from_map = get_revision_mapping(pb->rev_map, range->start);

it needs to do

    rev_from_map = get_revision_mapping(pb->rev_map, range->start + 1) - 1;

because, while the two give the same result when the revs are consecutive, they differ when there are filtered-out revs in the stream. If I make this change, the load appears to proceed OK.

Please CC me when replying, as I'm not subscribed to this list.

Regards,

/Patrik J.