First, thanks a lot for the offer of help; I'm happy to take you up on
it rather than do it all myself.
On 08/24/2015 12:54 PM, Joseph Myers wrote:
> FWIW, Jason's own trial conversion with reposurgeon got up to at least
> 45GB memory consumption on a 32GB repository.
It ended up being about 65GB. Fortunately I regularly use a machine
with 128GB, so that isn't a big deal. And the trial conversion took
less than a day; I didn't get an exact time.
I'd like to use the --legacy flag so that old references to SVN commits
are easier to look up.
---
With respect to Joseph's point about periodic deletion and re-creation
of branches, it looks like reposurgeon dutifully models them as deletion
and re-creation of the entire tree, which is understandable but not
ideal. It also warns about these with, e.g.,
  reposurgeon: mid-branch deleteall on refs/heads/master at <184996>.
Looking over the instances of this warning, it seems that in most cases
it was branch maintainers deciding to blow away the entire branch and
start over because svn mergeinfo had gotten too confused. I think in
all of these cases the right thing is to pretend that the
delete/recreate never happened.
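
For going through these, something like the following could pull the ref
and revision out of each warning. This is only a sketch: the pattern is
inferred from the single example line above, and the function name is
made up for illustration.

```python
import re

# Pattern inferred from the one warning line quoted above (an assumption;
# other reposurgeon versions may word the message differently).
WARNING_RE = re.compile(r"reposurgeon: mid-branch deleteall on (\S+) at <(\d+)>")

def find_deleteall_warnings(lines):
    """Return (ref, svn_revision) pairs for each mid-branch deleteall warning."""
    hits = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits
```

Feeding it the conversion log, one line at a time, gives a list of
branches and revisions to check by hand.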
---
Unfortunately, it looks like reposurgeon doesn't deal with gcc SVN's
subdirectory branches any better than git-svn. It does give a
diagnostic about them:
  reposurgeon: branch links detected by file ops only: branches/suse/
  branches/apple/ branches/st/ branches/gcj/ branches/csl/
  branches/google/ branches/linaro/ branches/redhat/ branches/ARM/
  tags/ix86/ branches/ubuntu/ branches/ix86/
though this is an incomplete list. There are also branches/ibm,
branches/dead, tags/apple, tags/redhat, tags/csl, and tags/ubuntu.
Ideally the conversion tool would just recognize that these are
subdirectories containing branches rather than branches themselves.
Neither git-svn nor reposurgeon currently does that; they both just treat
them as one big branch. This is easy enough to fix after the fact with
git filter-branch:
https://gcc.gnu.org/wiki/GitMirror#Subdirectory_branches
but you might want to improve reposurgeon to handle this pattern directly.
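
To illustrate the pattern I mean, here is a rough sketch of the path
mapping. It is not how reposurgeon or git-svn work internally; the
function name is hypothetical, and the container list is just the
directories mentioned above, which may still be incomplete.

```python
# Directories that hold branches (or tags) rather than being one themselves.
# Taken from the diagnostic and the extra names listed in this message.
SUBDIR_CONTAINERS = {
    "branches/suse", "branches/apple", "branches/st", "branches/gcj",
    "branches/csl", "branches/google", "branches/linaro", "branches/redhat",
    "branches/ARM", "branches/ubuntu", "branches/ix86", "branches/ibm",
    "branches/dead", "tags/ix86", "tags/apple", "tags/redhat", "tags/csl",
    "tags/ubuntu",
}

def split_branch(path):
    """Split an SVN path into (branch_root, path_within_branch).

    branch_root is None when the path does not lie inside any branch,
    for example when it names a container directory itself.
    """
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "trunk", "/".join(parts[1:])
    if parts[0] in ("branches", "tags") and len(parts) >= 2:
        # Inside a container the branch root is one level deeper.
        depth = 3 if "/".join(parts[:2]) in SUBDIR_CONTAINERS else 2
        if len(parts) < depth:
            return None, ""
        return "/".join(parts[:depth]), "/".join(parts[depth:])
    return None, path
```

With this, a path under branches/redhat/ is assigned to the branch one
level below it, while a path under an ordinary top-level branch keeps
the usual two-component root.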
Jason