> On Dec 9, 2019, at 9:19 PM, Joseph Myers <jos...@codesourcery.com> wrote:
>
> On Fri, 6 Dec 2019, Eric S. Raymond wrote:
>
>> Reposurgeon has been used for several major conversions, including groff
>> and Emacs. I don't mean to be nasty to Maxim, but I have not yet seen
>> *anybody* who thought they could get the job done with ad-hoc scripts
>> turn out to be correct. Unfortunately, the costs of failure are often
>> well-hidden problems in the converted history that people trip over
>> months and years later.
>
> I think the ad hoc script is the risk factor here as much as the fact that
> the ad hoc script makes limited use of git-svn.
>
> For any conversion we're clearly going to need to run various validation
> (comparing properties of the converted repository, such as contents at
> branch tips, with expected values of those properties based on the SVN
> repository) and fix issues shown up by that validation. reposurgeon has
> its own tools for such validation; I also intend to write some validation
> scripts myself. And clearly we need to fix issues shown up by such
> validation - that's what various recent reposurgeon issues Richard and I
> have reported are about, fixing the most obvious issues that show up,
> which in turn will enable running more detailed validation.
>
> The main risks are about issues that are less obvious in validation and so
> don't get fixed in that process. There, if you're using an ad hoc script,
> the risks are essentially unknown. But using a known conversion tool with
> an extensive testsuite, such as reposurgeon, gives confidence based on
> reposurgeon passing its own testsuite (once the SVN dump reader rewrite
> does so) that a wide range of potential conversion bugs, that might appear
> without showing up in the kinds of validation people try, are less likely
> because of all the regression tests for conversion issues seen in past
> conversions. When using an ad hoc script specific to one conversion you
> lose that confidence that comes from a conversion tool having been used in
> previous conversions and having tests to ensure bugs found in those
> conversions don't come back.
>
> I think we should fix whatever the remaining relevant bugs are in
> reposurgeon and do the conversion with reposurgeon being used to read and
> convert the SVN history and do any desired surgical operations on it.
>
> Ad hoc scripts identifying specific proposed local changes to the
> repository content, such as the proposed commit message improvements from
> Richard or my branch parent fixes, to be performed with reposurgeon, seem
> a lot safer than ad hoc code doing the conversion itself. And for
> validation, the more validation scripts people come up with the better.
> If anyone has or wishes to write custom scripts to analyze the SVN
> repository branch structure and turn that into verifiable assertions about
> what a git conversion should look like, rather than into directly
> generating a git repository or doing surgery on history, that helps us
> check a reposurgeon-converted repository in areas that might be
> problematic - and in that case it's OK for the custom script to have
> unknown bugs because issues it shows up are just pointing out places where
> the converted repository needs checking more carefully to decide whether
> there is a conversion bug or not.

Firstly, I am not going to defend my svn-git-* scripts or the git-svn tool they use. They are likely to have bugs and problems. I am, though, going to defend the conversion that these tools produced. No matter the conversion tool, all that matters is the final result. I have asked many times for people to scrutinize the git repository that I uploaded several months ago and to point out any artifacts or mistakes. Surely, it can't be hard to find a mistake or two in my converted repository by comparing it against any other /better/ repository that one has.

[FWIW, I am going to privately compare the reposurgeon-generated repo that Richard E. uploaded against my repo. The results of such a comparison could appear biased, so I'm not planning to publish them.]

Secondly, the GCC community has overwhelmingly supported the move to git, and in private conversations many developers have expressed the same view:

1. all we care about is the history of trunk and recent release branches
2. the current gcc-mirror is really all we need
3. having vendor branches and author info would be nice, but not so nice as to delay the switch any longer

Granted, the above is not the /official/ consensus of the GCC community, and I don't want to represent it as such. However, it is equally not the consensus of the GCC community to delay the switch to git until we have a confirmed perfect repo.

--
Maxim Kuvyrkov
https://www.linaro.org
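
For readers who want to try the kind of comparison discussed in this thread (contents at branch tips, one converted repository against another), here is a minimal sketch. It is not the validation tooling of reposurgeon, nor anyone's actual script from this thread; the function names and the command-line usage are hypothetical. It assumes both conversions are available as local git clones and compares the tree hash at each branch tip, which depends only on the checked-out contents and not on commit messages or author info. It does not check anything against the SVN repository itself.

```python
import subprocess
import sys
from typing import Dict, List, Tuple


def parse_ref_lines(text: str) -> Dict[str, str]:
    """Parse lines of the form '<branch> <tree-hash>' into a dict."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Branch names cannot contain spaces, so split once from the right.
        name, tree = line.rsplit(" ", 1)
        refs[name] = tree
    return refs


def branch_tip_trees(repo_path: str) -> Dict[str, str]:
    """Map each local branch in repo_path to the tree hash at its tip."""
    out = subprocess.run(
        ["git", "-C", repo_path, "for-each-ref",
         "--format=%(refname:short) %(tree)", "refs/heads"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ref_lines(out)


def compare_tips(
    a: Dict[str, str], b: Dict[str, str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (branches only in a, branches only in b, branches whose
    tip contents differ)."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    differing = sorted(n for n in set(a) & set(b) if a[n] != b[n])
    return only_a, only_b, differing


if __name__ == "__main__":
    # Hypothetical usage: python compare_tips.py /path/to/repoA /path/to/repoB
    if len(sys.argv) != 3:
        sys.exit("usage: compare_tips.py REPO_A REPO_B")
    only_a, only_b, differing = compare_tips(
        branch_tip_trees(sys.argv[1]), branch_tip_trees(sys.argv[2])
    )
    for label, names in (("only in A", only_a),
                         ("only in B", only_b),
                         ("tip contents differ", differing)):
        for name in names:
            print(f"{label}: {name}")
```

A branch reported as differing only says the two conversions disagree there; deciding which one (if either) matches SVN still needs a check against the SVN checkout of that branch.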