Hi Everyone,

I've been swamped with other projects for most of June, which gave me time to 
digest all the feedback I've got on GCC's conversion from SVN to Git.

The scripts have evolved considerably from the initial version posted here.  They 
have become fairly generic in that they have no built-in knowledge about GCC's 
repo structure.  Because of this I no longer plan to merge them into the GCC 
tree, but rather to publish them as a separate project on GitHub.  For now, you 
can track the current [hairy] version at 
https://review.linaro.org/c/toolchain/gcc/+/31416 .

The initial version of the scripts used heuristics to construct the branch tree, 
which turned out to be error-prone.  The current version parses the entire 
history of the SVN repo to detect all trees that start at /trunk@1.  Therefore 
all branches in the converted repo converge to the same parent at the beginning 
of their histories.
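
To illustrate the idea only (this is not the actual scripts), here is a minimal 
Python sketch.  It assumes the SVN history has already been reduced to a list of 
copy records; the Copy record and the rooted_branches() helper are hypothetical 
names.  It ignores deletions and treats a re-created branch as having only its 
latest parent, which is a simplification.

```python
from collections import namedtuple

# Hypothetical record: in revision `rev`, path `dst` was created by copying
# path `src` as of revision `src_rev`.
Copy = namedtuple("Copy", "rev dst src src_rev")

def rooted_branches(copies, root="/trunk"):
    """Return {branch: (parent, parent_rev)} for branches descending from root."""
    parent = {}
    # Process copies in revision order; a later re-creation of the same path
    # overrides the earlier parent (simplification).
    for c in sorted(copies, key=lambda c: c.rev):
        parent[c.dst] = (c.src, c.src_rev)

    rooted = {}
    for branch in parent:
        seen = set()
        node = branch
        # Follow the copy chain upwards until it leaves the known branches.
        while node in parent and node not in seen:
            seen.add(node)
            node = parent[node][0]
        if node == root:
            rooted[branch] = parent[branch]
    return rooted
```

Branches whose copy chain ends somewhere other than /trunk are left out of the 
result, so they can be inspected separately.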

As far as GCC conversion goes, below is what I plan to do and what not to do.  
This is based on comments from everyone in this thread:

1. Construct GCC's git repo from SVN using the same settings as the current git 
mirror.
2. Compare the resulting git repo with the current GCC mirror -- they should 
match on the commit hash level for trunk, branches/gcc-*-branch, and other 
"normal" branches.
3. Investigate any differences between the converted GCC repo and the current 
GCC mirror.  These can be due to bugs in git-svn or other misconfigurations.
4. Import git-only branches from the current GCC mirror.
5. Publish this "raw" repo for the community to sanity-check its contents.
6. Re-write history of all branches -- converted from svn and git-only -- see 
note below [*].
7. Publish this "pretty" repo for the community to sanity-check its contents.
8. Update both "raw" and "pretty" repos daily with new commits.
9. Fix problems in the "raw" and "pretty" repos as they are reported by the 
community.
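
As a rough sketch of the comparison in step 2 (an illustration, not the actual 
tooling): a git commit hash covers the tree, the parent hashes and the commit 
metadata, so two branches whose tip hashes agree have identical histories.  The 
compare_refs() helper below is a hypothetical name; it assumes each repo has 
been summarized as a mapping from ref name to tip hash, which could be produced 
with something like `git for-each-ref --format='%(refname) %(objectname)'`.

```python
def compare_refs(converted, mirror):
    """Classify refs by whether their tip commit hashes agree.

    Both arguments map ref name -> tip commit hash.
    """
    report = {"match": [], "differ": [], "only_converted": [], "only_mirror": []}
    for ref in sorted(set(converted) | set(mirror)):
        if ref not in mirror:
            report["only_converted"].append(ref)
        elif ref not in converted:
            report["only_mirror"].append(ref)
        elif converted[ref] == mirror[ref]:
            report["match"].append(ref)
        else:
            # Same ref in both repos but different tips: candidates for step 3.
            report["differ"].append(ref)
    return report
```

The "only_mirror" bucket would also be a starting point for finding the 
git-only branches mentioned in step 4.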

Once these steps are done, the community could switch from SVN to git by 
disabling commits to SVN, waiting for final history to be absorbed by the 
"pretty" repo, and deploying the git repo as the official repo.

[*] Note on branch re-writing:
During svn->git conversion we have an opportunity to correct some of the 
artifacts of the current git mirror:

a. Author and committer entries.  These are difficult to get right during the 
git-svn import process because the tool gives only the SVN committer ID without 
much else.  We could do much better by matching the SVN committer ID with the 
person's name in the map file, and then searching for the person's 
current-at-the-time email address in the commit diff.  I.e., mkuvyrkov -> 
Maxim Kuvyrkov -> [changelog from a 2010 commit] -> ma...@codesourcery.com .

b. Re-write tags/ branches into annotated tags.  Note that tags/* are included 
in the history of several branches via merge or copy commits, so we would need 
to re-write history to have proper references to annotated tag commits in the 
histories of such branches.

c. Since we are re-writing history anyway, it would be nice to convert 
"git-svn-id: svn+ssh://" lines to "git-svn-id: https://".  We are sure to retain 
a publicly-visible svn repo accessible via https://, but are not as likely to 
retain the svn+ssh:// interface.
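
To make item (a) concrete, here is a minimal Python sketch of the lookup, under 
stated assumptions: name_map stands in for the map file (SVN ID -> full name), 
diff_text is the unified diff of one commit, and ChangeLog header lines are 
assumed to put the address in angle brackets after the name.  The 
resolve_author() helper and the sample data are hypothetical.

```python
import re

# An address in angle brackets, as assumed for ChangeLog header lines.
EMAIL_RE = re.compile(r"<([^<>\s]+@[^<>\s]+)>")

def resolve_author(svn_id, name_map, diff_text):
    """Return (name, email) for an SVN committer ID; email may be None."""
    name = name_map.get(svn_id)
    if name is None:
        # Unknown committer: fall back to the bare SVN ID.
        return (svn_id, None)
    for line in diff_text.splitlines():
        # Only look at lines the commit added that mention the person.
        if line.startswith("+") and name in line:
            match = EMAIL_RE.search(line)
            if match:
                return (name, match.group(1))
    return (name, None)
```

For example, with name_map = {"jdoe": "Jane Doe"} and a diff that adds the line 
"2010-05-01  Jane Doe  <jane@example.com>", the result would be 
("Jane Doe", "jane@example.com").  Commits without a usable ChangeLog entry 
would need a fallback, such as the most recent address seen for that person.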

Which of these will make it into the final repo is for the community to decide.
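
For item (c), the per-commit rewrite could be as small as the sketch below.  It 
relies on the fact that git-svn appends a line starting with "git-svn-id:" to 
each commit message, and it assumes that only the URL scheme changes, i.e. that 
host and path stay the same under https://.  The rewrite_trailer() helper and 
the example URL are hypothetical.

```python
import re

# Match the scheme of a git-svn-id line at the start of a line.
TRAILER_RE = re.compile(r"^(git-svn-id: )svn\+ssh://", re.MULTILINE)

def rewrite_trailer(message):
    """Switch git-svn-id lines from svn+ssh:// to https://, leaving the rest alone."""
    return TRAILER_RE.sub(r"\1https://", message)
```

Other mentions of svn+ssh:// in the body of a commit message are left 
untouched, because the pattern is anchored to the "git-svn-id:" prefix.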

Regards,

--
Maxim Kuvyrkov
www.linaro.org



> On May 28, 2019, at 1:31 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> 
> Hi Everyone,
> 
> What can I say, I was too optimistic about how easy it would be to convert 
> GCC's svn repo to git one branch at a time.  After 2 more weeks and several 
> re-writes of the scripts I now know more about GCC's svn history than I ever 
> wanted to.
> 
> The prize for most complicated branch history goes to /branches/ibm/* .  It 
> has merges, it has re-creation of branches from /trunk and even an accidental 
> deletion of all of IBM's branches.
> 
> The version of scripts I'm testing right now seems to deal with all of that.
> 
> Also, to avoid controversy -- I'm working on these scripts to satisfy my own 
> curiosity, and to give GCC community another option to choose from for the 
> final migration.  If by end of Summer 2019 we have 2-3 git repos to choose 
> from, then we are likely to push GCC [kicking and screaming] into 2010's by 
> the end of this decade.
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
> 
> 
> 
>> On May 14, 2019, at 7:11 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> 
>> wrote:
>> 
>> This patch adds scripts to contrib/ to migrate full history of GCC's 
>> subversion repository to git.  My hope is that these scripts will finally 
>> allow GCC project to migrate to Git.
>> 
>> The result of the conversion is at 
>> https://github.com/maxim-kuvyrkov/gcc/branches/all .  Branches with "@rev" 
>> suffixes represent branch points.  The conversion is still running, so not 
>> all branches may appear right away.
>> 
>> The scripts are not specific to GCC repo and are usable for other projects.  
>> In particular, they should be able to convert downstream GCC svn repos.
>> 
>> The scripts convert svn history branch by branch.  They rely on git-svn to 
>> convert individual branches.  Git-svn is a good tool for converting 
>> individual branches.  It is, however, either very slow at converting the 
>> entire GCC repo, or goes into an infinite loop.
>> 
>> There are 3 scripts:
>> 
>> - svn-git-repo.sh: top level script to convert entire repo or a part of it 
>> (e.g., branches/),
>> - svn-list-branches.sh: helper script to output branches and their parents 
>> in bottom-up order,
>> - svn-git-branch.sh: helper script to convert a single branch.
>> 
>> Whenever possible, svn-git-branch.sh uses existing git branches as caches.
>> 
>> What are your questions and comments?
>> 
>> The attached is a cleaned-up version, which hasn't been fully tested yet; 
>> typos and other silly mistakes are likely.  OK to commit after testing?
>> 
>> --
>> Maxim Kuvyrkov
>> www.linaro.org
>> 
>> 
>> <0001-Contrib-SVN-Git-conversion-scripts.patch>
> 
