[ https://issues.apache.org/jira/browse/LUCENE-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074825#comment-15074825 ]

Dawid Weiss commented on LUCENE-6933:
-------------------------------------

Thanks. I've placed the scripts and a description of how the migration is 
performed here:
https://github.com/dweiss/lucene-solr-svn2git-migration

The current git mirror of the SVN repository at Apache is broken and cannot be 
reused; the author entries are messed up:
{code}
> git remote -v
origin  git://git.apache.org/lucene-solr.git (fetch)
origin  git://git.apache.org/lucene-solr.git (push)
> git log --all | grep "Author: " | sort -u
...
Author: Adrien Grand <[email protected] =  jpountz = Adrien Grand [email protected]@apache.org>
Author: Adrien Grand <[email protected]>
...
Author: dsmiley <dsmiley@13f79535-47bb-0310-9956-ffa450edef68>
Author: ehatcher <ehatcher@13f79535-47bb-0310-9956-ffa450edef68>
... (and more)
{code}
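
For anyone who wants to repeat this check automatically, here is a minimal Python sketch that classifies the {{Author:}} lines. It assumes unmapped committers appear as {{user <user@UUID>}} with the repository UUID seen in the log above (the pattern git-svn falls back to when it has no authors mapping for a committer). It also treats an e-mail with more than one {{@}} or with {{<}}, {{>}}, {{=}} or spaces as malformed. Both rules are illustrative heuristics, and {{classify_author}} is a hypothetical helper, not part of the migration scripts.

```python
import re

# Repository UUID taken from the git log excerpt above; git-svn uses it as the
# e-mail domain for committers that have no authors mapping.
SVN_UUID = "13f79535-47bb-0310-9956-ffa450edef68"

# Expected shape of a line: "Author: Name <email>"
AUTHOR_RE = re.compile(r"^Author: (?P<name>[^<]*)<(?P<email>.*)>$")


def classify_author(line: str) -> str:
    """Classify one 'Author: ...' line from git log output.

    Returns 'unmapped' for raw SVN user ids at the repository UUID,
    'malformed' for lines that do not look like 'Name <email>', else 'ok'.
    """
    m = AUTHOR_RE.match(line.strip())
    if m is None:
        return "malformed"
    email = m.group("email")
    if email.endswith("@" + SVN_UUID):
        return "unmapped"
    # Heuristic: a sane address has exactly one '@' and no '<', '>', '=' or spaces.
    if email.count("@") != 1 or any(c in email for c in "<>= "):
        return "malformed"
    return "ok"
```

The input can be produced with {{git log --all --format='Author: %an <%ae>' | sort -u}} and each line passed through {{classify_author}}.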

I fetched everything from scratch via git-svn (see the scripts if you're 
interested). I also introduced a few minor synthetic commits that reshuffle 
folders or perform cleanups so that the repository layout is more sensible. A 
conceptual overview of the result (with revision numbers and sources) is here:

https://raw.githubusercontent.com/dweiss/lucene-solr-svn2git-migration/master/docs/dev-lines-overview.png

As mentioned previously, I also cleaned up tags and branches, moving all 
current branches to tags under {{history/*}}. These (and the graft tags) can of 
course be deleted; I left them as a reference. All release tags follow the 
{{release/(project)/(version)}} convention and were converted to the more 
modern, dot-separated naming scheme (SVN tags used underscores, a holdover from 
the CVS days).
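
The underscore-to-dot renaming can be sketched as below. The input shape {{<project>_<n>_<n>...}} (e.g. {{lucene_4_10_3}}) is an assumption for illustration only; real SVN tag names in the repository vary, so anything that does not match is returned as {{None}} for manual handling. {{to_release_tag}} is a hypothetical name, not taken from the migration scripts.

```python
import re
from typing import Optional

# Assumed input shape: '<project>_<n>_<n>[_<n>...]', e.g. 'lucene_4_10_3'.
TAG_RE = re.compile(r"^(?P<project>[a-z]+)_(?P<version>\d+(?:_\d+)*)$")


def to_release_tag(svn_tag: str) -> Optional[str]:
    """Map an underscore-style SVN tag to 'release/<project>/<dotted.version>'."""
    m = TAG_RE.match(svn_tag)
    if m is None:
        return None  # not a simple release tag; leave it for manual handling
    version = m.group("version").replace("_", ".")
    return f"release/{m.group('project')}/{version}"
```

With this sketch, {{lucene_4_10_3}} becomes {{release/lucene/4.10.3}}, while a branch name such as {{branch_5x}} is left alone.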

> Create a (cleaned up) SVN history in git
> ----------------------------------------
>
>                 Key: LUCENE-6933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6933
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>         Attachments: migration.txt, multibranch-commits.log, tools.zip
>
>
> Goals:
> * selectively drop projects and core-irrelevant stuff:
>   ** {{lucene/site}}
>   ** {{lucene/nutch}}
>   ** {{lucene/lucy}}
>   ** {{lucene/tika}}
>   ** {{lucene/hadoop}}
>   ** {{lucene/mahout}}
>   ** {{lucene/pylucene}}
>   ** {{lucene/lucene.net}}
>   ** {{lucene/old_versioned_docs}}
>   ** {{lucene/openrelevance}}
>   ** {{lucene/board-reports}}
>   ** {{lucene/java/site}}
>   ** {{lucene/java/nightly}}
>   ** {{lucene/dev/nightly}}
>   ** {{lucene/dev/lucene2878}}
>   ** {{lucene/sandbox/luke}}
>   ** {{lucene/solr/nightly}}
> * preserve the history of all changes to core sources (Solr and Lucene).
>   ** {{lucene/java}}
>   ** {{lucene/solr}}
>   ** {{lucene/dev/trunk}}
>   ** {{lucene/dev/branches/branch_3x}}
>   ** {{lucene/dev/branches/branch_4x}}
>   ** {{lucene/dev/branches/branch_5x}}
> * provide a way to link git commits and history with svn revisions (amend the 
> log message).
> * annotate release tags
> * deal with large binary blobs (JARs): replace them with empty files kept 
> for historical reference only.
> Non goals:
> * no need to preserve "exact" merge history from SVN (see "impossible" below).
> * The ability to build ancient versions is not required.
> Impossible:
> * It is not possible to preserve SVN "merge history" because of the following 
> reasons:
>   ** Each commit in SVN operates on individual files. So one commit can 
> "copy" (and record a merge) files from anywhere in the object tree, even 
> modifying them along the way. There simply is no equivalent for this in git. 
>   ** There are historical commits in SVN that apply changes to multiple 
> branches in one commit ({{r1569975}}) and merges *from* multiple branches in 
> one commit ({{r940806}}).
> * Because exact merge tracking is impossible, it follows that the exact 
> "linearized" history of a given file cannot be recorded either. Let's say 
> changes X, Y and Z have been applied to a branch of a file A and then merged 
> back. In git, this would be reflected as a single commit flattening X, Y and 
> Z (on the target branch) and three independent commits on the branch. The 
> "copy-from" link from one branch to another cannot be represented because, as 
> mentioned, merges are done on entire branches in git, not on individual 
> files. Yes, there are commits in SVN history that have selective file merges 
> (not entire branches).
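
One possible way to recover the git-to-SVN link mentioned in the goals, assuming the commit messages still carry the line git-svn appends on import ({{git-svn-id: <url>@<revision> <uuid>}}). Whether the final repository keeps this exact line or an amended variant is not stated here, so treat this as a sketch; {{svn_origin}} is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# git-svn appends a line of this shape to imported commit messages:
#   git-svn-id: <svn-url>@<revision> <repository-uuid>
GIT_SVN_ID_RE = re.compile(
    r"^git-svn-id: (?P<url>\S+)@(?P<rev>\d+) (?P<uuid>[0-9a-f-]+)\s*$",
    re.MULTILINE,
)


def svn_origin(message: str) -> Optional[Tuple[str, int]]:
    """Return (svn_url, revision) parsed from a commit message, or None."""
    m = GIT_SVN_ID_RE.search(message)
    if m is None:
        return None
    return m.group("url"), int(m.group("rev"))
```

Applied to the output of {{git log --format=%B}} for a single commit, this yields the SVN path and revision the commit was imported from.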



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
