On Fri, 6 Dec 2019, Eric S. Raymond wrote:

> Reposurgeon has been used for several major conversions, including groff 
> and Emacs.  I don't mean to be nasty to Maxim, but I have not yet seen 
> *anybody* who thought they could get the job done with ad-hoc scripts 
> turn out to be correct.  Unfortunately, the costs of failure are often 
> well-hidden problems in the converted history that people trip over 
> months and years later.

I think the risk factor here is the ad hoc script itself, as much as the 
fact that it makes limited use of git-svn.

For any conversion we're clearly going to need to run various validation 
checks (comparing properties of the converted repository, such as contents 
at branch tips, with expected values of those properties based on the SVN 
repository) and fix the issues that validation shows up.  reposurgeon has 
its own tools for such validation; I also intend to write some validation 
scripts myself.  That is what the various reposurgeon issues Richard and I 
have reported recently are about: fixing the most obvious issues that show 
up, which in turn will enable running more detailed validation.
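
As a rough illustration of the branch-tip comparison described above, here 
is a minimal sketch.  It is not reposurgeon's own validation tooling; the 
function names are hypothetical, and it assumes the SVN branch tip and the 
converted git branch tip have each already been exported to a local 
directory.

```python
import hashlib
import os


def tree_manifest(root, ignore=(".git", ".svn")):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune VCS metadata directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in ignore]
        for name in filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            if os.path.islink(path):
                # Hash the link target rather than following the link.
                data = os.readlink(path).encode()
            else:
                with open(path, "rb") as fh:
                    data = fh.read()
            manifest[rel] = hashlib.sha256(data).hexdigest()
    return manifest


def diff_manifests(expected, actual):
    """Report paths missing from, extra in, or changed in the converted tree."""
    common = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "extra": sorted(set(actual) - set(expected)),
        "changed": sorted(p for p in common if expected[p] != actual[p]),
    }
```

Calling diff_manifests(tree_manifest(svn_dir), tree_manifest(git_dir)) 
gives three lists of paths to inspect.  This sketch ignores empty 
directories and symlinks to directories, and SVN keyword expansion or 
end-of-line properties could produce differences that are not conversion 
bugs, so its output is a starting point for checking rather than a verdict.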

The main risks are about issues that are less obvious in validation and so 
don't get fixed in that process.  There, if you're using an ad hoc script, 
the risks are essentially unknown.  A known conversion tool with an 
extensive testsuite, such as reposurgeon, gives confidence once it passes 
its own testsuite (once the SVN dump reader rewrite does so): the 
regression tests for issues seen in past conversions make a wide range of 
potential conversion bugs, which might not show up in the kinds of 
validation people try, less likely.  When using an ad hoc script specific 
to one conversion, you lose the confidence that comes from a conversion 
tool having been used in previous conversions and having tests to ensure 
that bugs found in those conversions don't come back.

I think we should fix whatever the remaining relevant bugs are in 
reposurgeon and do the conversion with reposurgeon being used to read and 
convert the SVN history and do any desired surgical operations on it.

Ad hoc scripts identifying specific proposed local changes to the 
repository content, such as the proposed commit message improvements from 
Richard or my branch parent fixes, to be performed with reposurgeon, seem 
a lot safer than ad hoc code doing the conversion itself.  And for 
validation, the more validation scripts people come up with the better.  
If anyone has or wishes to write custom scripts to analyze the SVN 
repository branch structure and turn that into verifiable assertions about 
what a git conversion should look like, rather than into directly 
generating a git repository or doing surgery on history, that helps us 
check a reposurgeon-converted repository in areas that might be 
problematic.  In that case it's OK for the custom script to have unknown 
bugs, because the issues it shows up just point out places where the 
converted repository needs checking more carefully to decide whether 
there is a conversion bug or not.
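
To make the idea of verifiable assertions concrete, here is a minimal 
sketch.  Everything in it is an assumption for illustration: the record 
format for SVN copy operations, the BranchAssertion type, and the shape of 
the "observed" mapping (however one chooses to extract branch parents from 
the converted repository) are all hypothetical.

```python
from typing import NamedTuple


class BranchAssertion(NamedTuple):
    branch: str         # SVN path of the branch, e.g. "branches/foo"
    parent_branch: str  # SVN path the branch was copied from
    parent_rev: int     # SVN revision it was copied at


def assertions_from_copies(copies):
    """Turn (dest_path, src_path, src_rev) copy records into assertions.

    Only the first copy creating each destination is kept; later copies to
    the same path (e.g. a deleted and recreated branch) are ignored here.
    """
    result = {}
    for dest, src, rev in copies:
        result.setdefault(dest, BranchAssertion(dest, src, rev))
    return result


def check(assertions, observed):
    """List branches whose observed (parent_branch, parent_rev) differs.

    Each finding is a prompt for manual inspection, not proof of a bug.
    """
    findings = []
    for branch, want in sorted(assertions.items()):
        got = observed.get(branch)
        if got is None:
            findings.append(f"{branch}: not present in converted repository")
        elif tuple(got) != (want.parent_branch, want.parent_rev):
            findings.append(
                f"{branch}: expected parent {want.parent_branch}@"
                f"{want.parent_rev}, found {got[0]}@{got[1]}"
            )
    return findings
```

Because the script only emits findings for a person to look at, a bug in 
it costs some wasted checking rather than a damaged history.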

-- 
Joseph S. Myers
jos...@codesourcery.com
