Item 3 is typically solved by adding a .gitignore or .gitkeep file to what would
otherwise be an empty directory, if the directory itself is important.
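For illustration, here is a minimal sketch of that approach. It assumes Python 3 and a converted working copy on local disk. add_gitkeep is a hypothetical helper, not part of git or any existing tool. It drops an empty .gitkeep into every directory that git would otherwise not track:

```python
import os
from pathlib import Path


def add_gitkeep(root):
    """Create an empty .gitkeep file in every empty directory under root.

    Hypothetical helper for illustration only. The .git metadata directory
    is skipped. Returns the list of placeholder paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune git's own metadata directory so os.walk never descends into it
        # and it does not count as content of the directory being inspected.
        if ".git" in dirnames:
            dirnames.remove(".git")
        # No files and no (non-.git) subdirectories: git cannot track this
        # directory, so give it a placeholder file.
        if not dirnames and not filenames:
            placeholder = Path(dirpath) / ".gitkeep"
            placeholder.touch()
            created.append(placeholder)
    return created
```

You would run it once over the converted tree (e.g. add_gitkeep("lucene-git")) before committing. A directory that holds only empty subdirectories needs no placeholder of its own, because the placeholders in its children already make git record it. The name .gitkeep has no special meaning to git; any file would do, and the name is just a convention.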


On Tue, Dec 15, 2015 at 12:21 PM, Dawid Weiss <[email protected]> wrote:

>
> Oh, just for completeness -- moving to git is not just about version
> management; it also involves:
>
> 1) all the scripts that currently do validations, etc.
> 2) what to do with svn:* properties
> 3) what to do with empty folders (git does not track them).
>
> I don't volunteer to solve these :)
>
> Dawid
>
>
> On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <[email protected]> wrote:
>
>>
>> Ok, give me some time and I'll see what I can achieve. Now that I've
>> actually written an SVN dump parser (validator and serializer), things are
>> under much better control...
>>
>> I'll try to achieve the following:
>>
>> 1) selectively drop unnecessary stuff from history (cms/, javadocs/, JARs
>> and perhaps other binaries),
>> 2) *preserve* the history of all core sources. So svn log IndexWriter has to
>> go all the way back to when Doug was young and pretty. Ooops, he's
>> still pretty of course.
>> 3) provide a way to link git history with svn revisions. I would,
>> ideally, include an "imported from svn:rev XXX" line in the commit log message.
>> 4) annotate release tags and branches. I don't care much about interim
>> branches -- they are not important to me (please speak up if you think
>> otherwise).
>>
>> Dawid
>>
>> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <[email protected]> wrote:
>>
>>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>>> a move to git. I don't care if we disagree about JARs, I trust he will
>>> do a good job and that is more important.
>>>
>>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <[email protected]> wrote:
>>> >
>>> > It's not true that nobody is working on this. I have been working on the SVN
>>> > dump in the meantime. You would not believe how incredibly complex it is
>>> > to process that (remote) dump. Let me highlight a few key issues:
>>> >
>>> > 1) There is no "one" Lucene SVN repository that can be transferred to git.
>>> > The history is a mess. Trunk, branches, tags -- all change paths at various
>>> > points in history. Entire projects were copied from *outside* the official
>>> > Lucene ASF path (when Solr, Nutch or Tika moved from the incubator, for
>>> > example).
>>> >
>>> > 2) Lucene's subpath of the SVN repository has ~50k commits in its history.
>>> > The ASF-wide history in which those 50k commits live has 1.8 *million*
>>> > commits. I think the git-svn sync crashes due to the sheer number of (empty)
>>> > commits in between actual changes.
>>> >
>>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2 GB
>>> > patch, for example, but there are others (the second largest is 190 MB, the
>>> > third is 136 MB).
>>> >
>>> > 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>>> > locally (including empty interim commits to cater for svn:mergeinfo) is 4 GB.
>>> > If you strip stuff like javadocs and side projects (Nutch, Tika, Mahout),
>>> > I bet the entire history can fit in 1 GB total. Of course, stripping JARs
>>> > is also doable.
>>> >
>>> > 5) There is lots of junk at the main SVN path, so you can't just version the
>>> > top-level folder. If you wanted to check out /asf/lucene, the resulting
>>> > folder would be enormous -- I terminated the checkout after it reached over
>>> > 20 GB. Well, technically you *could* do it, and it'd preserve perfect
>>> > history, but I wouldn't want to git co a past version that checks out all
>>> > the tags, branches, etc. This has to be mapped in a sensible way.
>>> >
>>> > What I think is that all the above makes a (straightforward) conversion to
>>> > git problematic. Moving paths in particular are a problem -- how to mark
>>> > tags and branches, where the main line of development is, etc. This
>>> > conversion would have to be guided and hand-tuned to make sense. This effort
>>> > would only pay for itself if we move to git; otherwise I don't see the
>>> > benefit. Paul's script is fine for keeping short-term history.
>>> >
>>> > Dawid
>>> >
>>> > P.S. Either the SVN repo at Apache is broken or SVN itself is broken, which
>>> > makes processing SVN history even more fun. This dump shows Tika being
>>> > moved from the incubator to Lucene:
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>>> >
>>> > But when you dump just Lucene's subpath, the output is broken (the last
>>> > changeset in the file is invalid; it carries no target):
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>>> >
>>> >
>>> >
>>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <[email protected]> wrote:
>>> >>
>>> >> If we move to git, stripping out jars seems to be an independent decision?
>>> >> Can you even strip out jars and preserve history (i.e. not change
>>> >> hashes and invalidate everyone's forks/clones)?
>>> >> I did run across this:
>>> >>
>>> >>
>>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>> >>
>>> >> -Yonik
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
>>> >>
>>> >
>>>
>>>
>>
>
