Bernd Schmidt <bernds_...@t-online.de>:
> On 12/9/19 7:19 PM, Joseph Myers wrote:
> >
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation. reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
>
> Would it be feasible to require that both conversions produce the same
> output repository to some degree? Can we just look at release tags and
> require that they have the same hash in both conversions, or are there good
> reasons why the two would produce different outputs?

There are a couple of areas that could produce divergences.

One is the part of the history before SVN was adopted. There's a lot of
weird junk back there, artifacts from the cvs2svn conversion, that can
produce issues like fundamental uncertainty about where a child branch
should actually be rooted on its parent. Reposurgeon makes choices that
are a priori reasonable in cases of doubt, but there are edge cases where
a different conversion pipeline could make different ones.

Another is how to translate tags. I don't know what Maxim's scripts do,
but under reposurgeon a copy commit can have one of two dispositions:

(1) Become a lightweight tag (git reference) if the tag comment looks
like it was autogenerated and carries no real information.

(2) Become a git annotated tag if we want to preserve the tag metadata
(comment, date stamp).

There's room for a certain amount of artistic license here. Most
conversions have few enough disputable cases that the differences between
renderings can be reviewed by eyeball. I'm not going to bet that will be
true of this one.

At the scale of this conversion, any form of comparative auditing is
pretty hopeless. You get your assurance, if you get it, from believing
the correctness of the conversion tool.

Which is a major reason that reposurgeon has a *large* test suite: 98
general operations tests, 55 Subversion test dumps including a rogues'
gallery of metadata perversions gathered from previous conversions, and
a cloud of surrounding auxiliary checks.

-- 
<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>
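
For anyone who wants to try the release-tag comparison asked about in the quoted question, here is a minimal sketch in Python. It is not part of reposurgeon or of either conversion pipeline; the function names (`tag_summary`, `compare_tags`) are made up for illustration, and it assumes every tag ultimately points at a commit. It peels each tag before comparing, so a lightweight tag in one conversion and an annotated tag in the other do not register as a difference on that basis alone. It also separates two questions: whether the tree hashes match (same file contents at the tag) and whether the commit hashes match (which additionally requires identical ancestry and commit metadata, so divergent pre-SVN history would show up here).

```python
import subprocess


def git(repo, *args):
    """Run a git command in `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def tag_summary(repo):
    """Map each tag name to (commit hash, tree hash).

    Annotated tags are peeled with the ^{commit} and ^{tree} suffixes,
    so both tag styles reduce to the commit they refer to.
    Assumption: no tag points directly at a tree or blob.
    """
    summary = {}
    refs = git(repo, "for-each-ref", "--format=%(refname)", "refs/tags")
    for ref in refs.splitlines():
        commit = git(repo, "rev-parse", "--verify", ref + "^{commit}").strip()
        tree = git(repo, "rev-parse", "--verify", ref + "^{tree}").strip()
        summary[ref[len("refs/tags/"):]] = (commit, tree)
    return summary


def compare_tags(a, b):
    """Classify tags from two summaries produced by tag_summary()."""
    report = {
        "missing": [],          # present in only one conversion
        "content_differs": [],  # tree hashes differ: files differ at the tag
        "history_differs": [],  # same tree, different commit hash
        "identical": [],        # same commit hash, hence same tree and history
    }
    for name in sorted(set(a) | set(b)):
        if name not in a or name not in b:
            report["missing"].append(name)
        elif a[name][1] != b[name][1]:
            report["content_differs"].append(name)
        elif a[name][0] != b[name][0]:
            report["history_differs"].append(name)
        else:
            report["identical"].append(name)
    return report
```

Typical use would be `compare_tags(tag_summary("conv-a"), tag_summary("conv-b"))`, with the two paths standing in for local clones of the two conversions. A tag landing in `history_differs` is not necessarily a bug in either tool; as described above, it may just reflect a different but defensible choice about branch rooting somewhere in the ancestry.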