First, thanks a lot for the offer of help; I'm happy to take you up on it rather than do it all myself.

On 08/24/2015 12:54 PM, Joseph Myers wrote:
> FWIW, Jason's own trial conversion with reposurgeon got up to at least
> 45GB memory consumption on a 32GB repository.

It ended up being about 65GB. Fortunately I regularly use a machine with 128GB, so that isn't a big deal. And the trial conversion took less than a day; I didn't get an exact time.

I'd like to use the --legacy flag so that old references to SVN commits are easier to look up.

---

With respect to Joseph's point about periodic deletion and re-creation of branches, it looks like reposurgeon dutifully models them as deletion and re-creation of the entire tree, which is understandable but not ideal. It also warns about these with, e.g.,

  reposurgeon: mid-branch deleteall on refs/heads/master at <184996>.

Looking over the instances of this warning, it seems that in most cases it was branch maintainers deciding to blow away the entire branch and start over because svn mergeinfo had gotten too confused. I think in all of these cases the right thing is to pretend that the delete/recreate never happened.
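
For anyone who wants to look over these warnings in bulk, here is a minimal sketch that collects them from a saved conversion log. It assumes every warning matches the example line above exactly; the function name and the idea of saving the log to a file are mine, not anything reposurgeon provides.

```python
import re

# Pattern modelled on the single example warning quoted above; other
# reposurgeon versions may word it differently (assumption).
WARNING = re.compile(
    r"reposurgeon: mid-branch deleteall on (?P<ref>\S+) at <(?P<rev>\d+)>\."
)


def deleteall_warnings(log_text):
    """Return (ref, svn_revision) pairs for each mid-branch deleteall warning."""
    return [(m.group("ref"), int(m.group("rev")))
            for m in WARNING.finditer(log_text)]
```

Sorting the result by ref would make it easier to see which branches were blown away more than once.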

---

Unfortunately, it looks like reposurgeon doesn't deal with gcc SVN's subdirectory branches any better than git-svn. It does give a diagnostic about them:

  reposurgeon: branch links detected by file ops only: branches/suse/ branches/apple/ branches/st/ branches/gcj/ branches/csl/ branches/google/ branches/linaro/ branches/redhat/ branches/ARM/ tags/ix86/ branches/ubuntu/ branches/ix86/

though this is an incomplete list. There are also branches/ibm, branches/dead, tags/apple, tags/redhat, tags/csl, and tags/ubuntu.

Ideally the conversion tool would just recognize that these are subdirectories containing branches rather than branches themselves. Neither git-svn nor reposurgeon currently does that; both just treat each of them as one big branch. This is easy enough to fix after the fact with git filter-branch:

  https://gcc.gnu.org/wiki/GitMirror#Subdirectory_branches

but you might want to improve reposurgeon to handle this pattern directly.
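
To make the pattern concrete, here is a rough sketch of the path classification I have in mind. It is not reposurgeon code. The container names are the ones listed in this message; the ref naming (refs/heads/<dir>/<branch>, trunk as refs/heads/master) and the function name are assumptions for illustration only.

```python
# Subdirectory containers named in this message; each holds branches (or
# tags) one level further down rather than being a branch itself.
CONTAINERS = {
    "branches": {"suse", "apple", "st", "gcj", "csl", "google", "linaro",
                 "redhat", "ARM", "ubuntu", "ix86", "ibm", "dead"},
    "tags": {"ix86", "apple", "redhat", "csl", "ubuntu"},
}

# Assumed git ref layout; pick whatever naming the conversion settles on.
REF_PREFIX = {"branches": "refs/heads/", "tags": "refs/tags/"}


def split_svn_path(path):
    """Return (git_ref, path_within_branch) for an SVN path, or None."""
    parts = path.strip("/").split("/")
    if parts[0] == "trunk":
        return "refs/heads/master", "/".join(parts[1:])
    if parts[0] not in CONTAINERS or len(parts) < 2:
        return None
    depth = 2  # ordinary layout: branches/<name>/...
    if parts[1] in CONTAINERS[parts[0]]:
        if len(parts) < 3:
            return None  # the container directory itself is not a branch
        depth = 3  # subdirectory layout: branches/<dir>/<name>/...
    ref = REF_PREFIX[parts[0]] + "/".join(parts[1:depth])
    return ref, "/".join(parts[depth:])
```

With this, a made-up path such as branches/redhat/example-branch/gcc/tree.c maps to the ref refs/heads/redhat/example-branch and the file gcc/tree.c, while branches/redhat on its own is not treated as a branch at all.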

Jason
