Bernd Schmidt <bernds_...@t-online.de>:
> On 12/9/19 7:19 PM, Joseph Myers wrote:
> > 
> > For any conversion we're clearly going to need to run various validation
> > (comparing properties of the converted repository, such as contents at
> > branch tips, with expected values of those properties based on the SVN
> > repository) and fix issues shown up by that validation.  reposurgeon has
> > its own tools for such validation; I also intend to write some validation
> > scripts myself.
> 
> Would it be feasible to require that both conversions produce the same
> output repository to some degree? Can we just look at release tags and
> require that they have the same hash in both conversions, or are there good
> reasons why the two would produce different outputs?

There are a couple of areas that could produce divergences.

One is the part of the history before SVN was adopted. There's a lot of
weird junk back there, artifacts from the cvs2svn conversion, that can produce
issues like fundamental uncertainty about where a child branch should actually
be rooted on its parent.  Reposurgeon makes choices that are a priori
reasonable in cases of doubt, but there are edge cases where a different
conversion pipeline could make different ones.
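
One different choice back there is enough to defeat a hash comparison at the
release tags. A git commit ID covers the parent IDs as well as the tree, so a
divergence in early history changes the ID of every descendant, even where the
file contents agree. A minimal sketch, computing commit IDs the way git does
(SHA-1 over the commit object). The commit_id helper, the author string, and
the messages are made up for illustration, and the tree is just git's
well-known empty tree standing in for real content:

```python
import hashlib

# Placeholder content: the ID of git's empty tree, used for every commit here.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical author/committer line (name, email, timestamp, timezone).
WHO = "A U Thor <author@example.com> 0 +0000"


def commit_id(tree, parents, who, message):
    """Return the SHA-1 git would assign to a commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    body = "\n".join(lines).encode("utf-8")
    # Git hashes "<type> <size>\0" followed by the object body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Two conversions that differ only in how they rendered one early commit.
root_a = commit_id(TREE, [], WHO, "Initial import\n")
root_b = commit_id(TREE, [], WHO, "Initial import (from CVS)\n")

# The later "release" commit is identical in both, apart from its parent ID.
tip_a = commit_id(TREE, [root_a], WHO, "Release\n")
tip_b = commit_id(TREE, [root_b], WHO, "Release\n")

print("same tree:", TREE)
print("tip IDs equal:", tip_a == tip_b)
```

A check on tree IDs rather than commit IDs at each tag avoids this, since a
tree ID depends only on the content it names.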

Another is how to translate tags. I don't know what Maxim's scripts do, but 
under reposurgeon a copy commit can have one of two dispositions:

(1) Become a lightweight tag (git reference) if the tag comment looks like 
it was autogenerated and carries no real information.

(2) Become a git annotated tag if we want to preserve the tag metadata (comment,
date stamp).

There's room for a certain amount of artistic license here.
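
The choice between (1) and (2) could be sketched as follows. The patterns and
the name tag_disposition are hypothetical, chosen for illustration; they are
not reposurgeon's actual rules.

```python
import re

# Illustrative patterns for comments that carry no real information.
# These are assumptions for the sketch, not reposurgeon's own list.
AUTOGENERATED = [
    re.compile(r"^\s*$"),                                   # empty comment
    re.compile(r"manufactured by cvs2svn", re.IGNORECASE),  # cvs2svn boilerplate
    re.compile(r"^\s*(tagging|tag)\s+\S+\s*$", re.IGNORECASE),  # bare "Tag foo"
]


def tag_disposition(comment):
    """Classify a tag comment as 'lightweight' or 'annotated'."""
    if any(p.search(comment) for p in AUTOGENERATED):
        return "lightweight"  # plain git reference, comment discarded
    return "annotated"        # keep comment and date stamp in a tag object


for text in [
    "",
    "This commit was manufactured by cvs2svn to create tag 'gcc_3_0'.",
    "GCC 4.8.0 release; see ChangeLog for the final regression fixes.",
]:
    print(repr(text), "->", tag_disposition(text))
```

Where the line falls between "autogenerated" and "worth keeping" is exactly
the sort of judgment on which two pipelines can reasonably disagree.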

Most conversions have few enough disputable cases that the differences between
renderings can be reviewed by eyeball. I'm not going to bet that will be true
of this one.  At the scale of this conversion, any form of comparative auditing
is pretty hopeless.  You get your assurance, if you get it, from believing
in the correctness of the conversion tool.

Which is a major reason that reposurgeon has a *large* test suite: 98
general operations tests, 55 Subversion test dumps including a rogues'
gallery of metadata perversions gathered from previous conversions,
and a cloud of surrounding auxiliary checks.
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

