Point 3 is typically solved by adding a .gitignore or .gitkeep file to any directory that would otherwise be empty, if the directory itself is important.
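If someone wants to script that as part of a conversion, here is a minimal sketch in Python. The function name `add_gitkeep` and its `marker` argument are made up for illustration, and `.gitkeep` is only a naming convention: git does not treat the file specially, it just needs some tracked file for the directory to exist in a checkout.

```python
import os
from pathlib import Path


def add_gitkeep(root, marker=".gitkeep"):
    """Create an empty marker file in every empty directory under root.

    Returns the list of marker files that were created. Git's own
    .git metadata directory is never descended into.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Decide emptiness before pruning, so a directory that holds
        # only .git is not mistaken for an empty one.
        is_empty = not dirnames and not filenames
        if ".git" in dirnames:
            dirnames.remove(".git")  # prune: do not walk git metadata
        if is_empty:
            path = Path(dirpath) / marker
            path.touch()
            created.append(path)
    return created
```

Running `add_gitkeep("path/to/checkout")` on a working copy before the first commit would leave a placeholder in each empty directory; whether those directories are worth keeping is still a per-directory decision.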

On Tue, Dec 15, 2015 at 12:21 PM, Dawid Weiss <[email protected]> wrote:

>
> Oh, just for completeness -- moving to git is not just about the version
> management, it's also:
>
> 1) all the scripts that currently do validations, etc.
> 2) what to do with svn:* properties
> 3) what to do with empty folders (not available in git).
>
> I don't volunteer to solve these :)
>
> Dawid
>
>
> On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <[email protected]>
> wrote:
>
>>
>> Ok, give me some time and I'll see what I can achieve. Now that I
>> actually wrote an SVN dump parser (validator and serializer) things are
>> under much better control...
>>
>> I'll try to achieve the following:
>>
>> 1) selectively drop unnecessary stuff from history (cms/, javadocs/,
>> JARs and perhaps other binaries),
>> 2) *preserve* history of all core sources. So svn log IndexWriter has to
>> go back all the way back to when Doug was young and pretty. Ooops, he's
>> still pretty of course.
>> 3) provide a way to link git history with svn revisions. I would,
>> ideally, include an "imported from svn:rev XXX" in the commit log
>> message.
>> 4) annotate release tags and branches. I don't care much about interim
>> branches -- they are not important to me (please speak up if you think
>> otherwise).
>>
>> Dawid
>>
>> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <[email protected]> wrote:
>>
>>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>>> a move to git. I don't care if we disagree about JARs, I trust he will
>>> do a good job and that is more important.
>>>
>>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <[email protected]>
>>> wrote:
>>> >
>>> > It's not true that nobody is working on this. I have been working on
>>> > the SVN dump in the meantime. You would not believe how incredibly
>>> > complex the process of processing that (remote) dump is. Let me
>>> > highlight a few key issues:
>>> >
>>> > 1) There is no "one" Lucene SVN repository that can be transferred
>>> > to git. The history is a mess. Trunk, branches, tags -- all change
>>> > paths at various points in history. Entire projects are copied from
>>> > *outside* the official Lucene ASF path (when Solr, Nutch or Tika
>>> > moved from the incubator, for example).
>>> >
>>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k
>>> > commits. ASF's commit history in which those 50k commits live is
>>> > 1.8 *million* commits. I think the git-svn sync crashes due to the
>>> > sheer number of (empty) commits in between actual changes.
>>> >
>>> > 3) There are a few commits that are gigantic. I mentioned Grant's
>>> > 1.2G patch, for example, but there are others (the second largest is
>>> > 190 megs, the third is 136 megs).
>>> >
>>> > 4) The size of JARs is really not an issue. The entire SVN repo I
>>> > mirrored locally (including empty interim commits to cater for
>>> > svn:mergeinfos) is 4G. If you strip the stuff like javadocs and side
>>> > projects (Nutch, Tika, Mahout) then I bet the entire history can fit
>>> > in 1G total. Of course stripping JARs is also doable.
>>> >
>>> > 5) There is lots of junk at the main SVN path so you can't just
>>> > version the top-level folder. If you wanted to checkout /asf/lucene
>>> > then the size of the resulting folder is enormous -- I terminated
>>> > the checkout after I reached over 20 gigs. Well, technically you
>>> > *could* do it, it'd preserve perfect history, but I wouldn't want to
>>> > git co a past version that checks out all the tags, branches, etc.
>>> > This has to be mapped in a sensible way.
>>> >
>>> > What I think is that all the above makes (straightforward)
>>> > conversion to git problematic. Especially moving paths are a
>>> > problem -- how to mark tags/branches, where the main line of
>>> > development is, etc. This conversion would have to be guided and
>>> > hand-tuned to make sense. This effort would only pay for itself if
>>> > we move to git, otherwise I don't see the benefit. Paul's script is
>>> > fine for keeping short-term history.
>>> >
>>> > Dawid
>>> >
>>> > P.S. Either the SVN repo at Apache is broken or the SVN is broken,
>>> > which makes processing SVN history even more fun. This dump
>>> > indicates Tika being moved from the incubator to Lucene:
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>>> >
>>> > But when you dump just Lucene's subpath, the output is broken (last
>>> > changeset in the file is an invalid changeset, it carries no target):
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>>> >
>>> >
>>> >
>>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <[email protected]>
>>> > wrote:
>>> >>
>>> >> If we move to git, stripping out jars seems to be an independent
>>> >> decision?
>>> >> Can you even strip out jars and preserve history (i.e. not change
>>> >> hashes and invalidate everyone's forks/clones)?
>>> >> I did run across this:
>>> >>
>>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>> >>
>>> >> -Yonik
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
>>> >>
>>> >
>>>
>>
>
