Joseph Myers <jos...@codesourcery.com>:
> Hence my suggestion in <https://gcc.gnu.org/ml/gcc/2015-08/msg00150.html>
> of reconverting and then combining with the existing git-svn history via
> renaming all the refs in the existing git repository, so as to preserve
> the validity of commit references and git-only branches there while having
> the main copy of the history properly converted.

Sorry, but I can't even imagine how to recombine in that way with the
tools I have. If you still think it's worth trying after seeing the
reposurgeon conversion I deliver, we can investigate that, I suppose.

> I don't know what either git-svn or reposurgeon make of the times when
> trunk was accidentally deleted and then recreated as an SVN copy of a
> pre-deletion revision (what we want to avoid for the proper conversion is
> those looking like deletion and recreation of all files in trunk - commits
> that don't change the tree at all, or complete omission of the deletion
> and subsequent recreation, would be fine).

git-svn often fluffs that general kind of delete-recreate case pretty
badly; reposurgeon's analyzer takes such cases in stride. I have a whole
bunch of regression tests from pathological repos that I keep around
to verify this.

Another similar case is when a branch was created by a non-SVN copy
followed by a commit, losing ancestry information - this is a
relatively common operator error that reposurgeon had to learn to cope
with early on. Most other translation tools (including git-svn) lose
their cookies here.

Hairballs like these are why reposurgeon has its own internal parser
for the SVN dumpfile format, the only one that exists outside the SVN
suite itself and the exception to the general rule that reposurgeon
consumes the fast-import-stream output of exporters in order to read
repositories. I couldn't achieve robustness in the presence of common
metadata malformations in any less drastic way.

> It was converted from CVS. More precisely, from two CVS repositories: the
> gcc2 repository (1988-1999, starting as a collection of RCS files and with
> not many files version controlled before 1992 and documentation not
> version controlled for years after then), and what started as the EGCS
> repository (1997-2005). The two repositories were combined by a custom
> version of CVS (work done by Ian Taylor) to produce the input to cvs2svn.
> gcc2 changes between the start of EGCS in 1997 and 1999 when development
> in the gcc2 project ended were moved to /branches/premerge-fsf-branch as
> part of the combination process (pre-EGCS gcc2 changes are on trunk).

Uh oh. This sounds like it could be a recipe for serious grief.

While Ian is certainly smart and persistent enough to have made
something coherent out of that kind of mess, older versions of cvs2svn
were defect amplifiers that would turn even minor metadata glitches in
CVS into large tracts of scar tissue in the translated SVN, which in
turn tend not to get noticed until you try to up-convert from the SVN.

Cleaning up this kind of artifact was one of the major original
motivations for reposurgeon. The fact that you had to *combine* CVS
repositories hints that I may be about to encounter an entirely new
class of malformations. Oh joy, oh rapture... :-(

> A few branches in the repository that started as the EGCS repository, the
> history of which branches was particularly messed up by rebasing (branch
> tags having been moved from one revision to another, leaving behind
> unnamed branches), were deliberately omitted from the conversion to SVN to
> avoid it generating large amounts of very messy and not particularly
> useful history in the resulting repository.

I'll be glad not to have those problems...

We'll know soon enough how bad things are. It's taken me the better
part of three days to mirror the SVN, in part because your hosting
site is randomly dropping the connection once every several hours, but
I'm now up to revision 208213, which is 91% of the way to the end.

Once I have a complete mirror and can do a trial conversion, I'll be
able to run a 'lint' command that is pretty good at finding cvs2svn
conversion artifacts.

I'll have to regenerate the empty contributor map, too. When I made
the first one I didn't know that mirroring had been interrupted by a
host timeout; I only had commits up to mid-2005.

The GCC repo is pretty huge, but I've been hunting mastodons like it
for years now - there's a row of trophy heads in the reposurgeon
documentation. I ended up building a machine with a processor and
cache specifically designed to handle non-parallelizable graph-theory
computations multiple gigabytes wide - SMP is no help here and you
want extra-large primary memory caches. On this hardware, conversion
runs will merely be painfully slow rather than die-of-old-age
interminable.
-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>