On Fri, Feb 26, 2016 at 7:59 AM, Martin Vaeth <mar...@mvath.de> wrote:
> Rich Freeman <ri...@gentoo.org> wrote:
>>>
>>> And currently the git history is still almost empty...
>>>
>>
>> If you want pre-migration history you need to fetch that separately.
>
> How? Neither on gitweb.gentoo.org nor on github I found an obvious
> repository with this data.

https://wiki.gentoo.org/wiki/Gentoo_git_workflow#Grafting_Gentoo_History_Onto_the_Active_Repo

If you're interested in history it is easy to do, and the repo on
github works fine for web access or the various github stats/etc.
Well, sort of - I get the impression that github doesn't host a lot of
repos with that much history, and when you push that repo to github
for the first time it will time out and die, and the repo will appear
on the site 30-60min later (I imagine subsequent pushes would be
fine).

I think we actually have one of the largest git repos out there in
terms of number of objects.  At least, when I was keeping tabs on
other migration efforts there weren't many that came close (including
some projects that you'd think of as having a lot of history).  The
fact that every package revision+patch+etc is a file in Gentoo is a
big part of that.

>
>> It is about 1.7G.
>> Considering that this represents a LOT more than 2-3 years of history
>
> If the 1.7G are fully compressed history, this would confirm
> my estimate rather precisely, if it represents (1700/120 - 1) ~ 13 years.

Perhaps I misread your post then.  I saw lots of numbers but not many
units, and I probably didn't follow what you intended to say.

>
> Note that I compared squashfs with a git user who does not even
> care about git-internal recompression. Of course, you can decrease
> the factor somewhat if e.g. your checked-out tree is still stored
> on squashfs. This does not change the fact that the factor will
> increase every year by about 1 (or probably more, because git
> uses the ineffective gzip compression, only).
>

A checkout of gentoo-x86 is about 590M.  If you use the repo that
includes cache/etc it expands to 1.2G.  13 years of history is 1.7G.
Clearly it doesn't increase by a factor of 1 every year, unless again
I'm misunderstanding what you're intending to communicate.

A git checkout consists of two parts.
It has the .git directory, which contains all the data, and it has
the working tree.  In the case of gentoo-x86 the working tree is about
440MB and the history is about 150MB.  The working tree doesn't really
change in size much - it just reflects the size of the current
revision of the tree.  It is also not compressed (unless you stick the
whole thing in a squashfs, which you could certainly do).

It is the history which continuously grows.  However, the history IS
compressed, and the reality is that most new ebuilds are similar to
ebuilds that are already in the history, so it compresses very well.
Of course it would be nice if you could use something other than gzip
to compress it.

There is no reason that somebody couldn't distribute squashfs versions
of a git /usr/portage, but if you want the full history it would still
be around 1.7G.  It would still be smaller than a checked-out tree
(the 1.7G figure is just history - it doesn't include the extra 440MB
or so for the checkout).

My point wasn't so much that there aren't size benefits to squashfs
and no history.  I'm just saying that git is pretty efficient for what
it does do.

-- 
Rich
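
The size figures in this message can be checked with a short sketch.
It is only an illustration, not part of the original exchange.  It
assumes that "M" and "MB" mean the same thing and that 1.7G is 1700MB.
The 120 in the quoted estimate carries no unit in this message, so
treating it as MB is a guess.  The function and constant names are
made up for the example.

```python
# Figures quoted in the message, all taken as MB (assumption: M == MB).
WORKING_TREE_MB = 440    # uncompressed working tree of gentoo-x86
RECENT_HISTORY_MB = 150  # history stored in .git of the current repo
FULL_HISTORY_MB = 1700   # the 1.7G of grafted history
HISTORY_YEARS = 13       # span that the 1.7G is said to cover
QUOTED_BASELINE = 120    # from the quoted estimate; unit not stated, MB assumed


def checkout_size_mb(working_tree_mb, history_mb):
    """A checkout is the working tree plus the history kept in .git."""
    return working_tree_mb + history_mb


def yearly_growth_factor(history_mb, years, reference_mb):
    """History added per year, as a multiple of a reference size."""
    return history_mb / years / reference_mb


# 440 + 150 matches the "about 590M" checkout figure.
checkout = checkout_size_mb(WORKING_TREE_MB, RECENT_HISTORY_MB)

# Measured against the uncompressed working tree, the history grows by
# roughly 0.3 tree-sizes per year rather than 1.
growth_vs_tree = yearly_growth_factor(
    FULL_HISTORY_MB, HISTORY_YEARS, WORKING_TREE_MB
)

# The quoted expression (1700/120 - 1), which its author reads as years.
quoted_estimate = FULL_HISTORY_MB / QUOTED_BASELINE - 1

print(f"checkout size: {checkout} MB")
print(f"growth per year vs working tree: {growth_vs_tree:.2f}")
print(f"quoted estimate: {quoted_estimate:.1f}")
```

The two results are not in conflict.  The quoted estimate measures
growth against a baseline of 120, while the second figure measures it
against the 440MB uncompressed working tree, so which "factor" you get
depends on the reference size you pick.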