[gentoo-dev] Re: RFC: split up media-sound/ category

Duncan Sun, 26 Jun 2011 07:11:25 -0700

Kent Fredric posted on Sun, 26 Jun 2011 17:43:27 +1200 as excerpted:

> On 26 June 2011 15:49, Wyatt Epp <wyatt....@gmail.com> wrote:
> >> As for the latter part, the size of a git repo becoming unmanageable
>> over time had not occurred to me, I'm afraid-- would it work to use
>> shallow clones?  Otherwise, the herd-wise division is probably
>> acceptable.  Need to think about that one more.
> 
> 
>   --depth <depth>
>            Create a shallow clone with a history truncated to the
>            specified number of revisions. A shallow repository has a
>            number of limitations (you cannot clone or fetch from it, nor
>            push from nor into it), but is adequate if you are only
>            interested in the recent history of a large project with a
>            long history, and would want to send in fixes as patches.
> 
> It would be ok perhaps for non-contributing users to use shallow clones,
> but in my understanding, shallow clones limit you to doing what you
> could do with a tar file of the specified revision, which basically
> makes it impractical for people who are developing on it,
> and would mean every new developer would get a progressively longer time
> in order to do a complete check out.


Not substantially so, no.

FWIW, git scales VERY well in this regard, provided it's used for text-
based content (sources) as originally intended.  (It's not so hot at 
binary blob management, but it's not designed for that.  Fortunately, 
gentoo's usage would be nearly 100% text-based.)

What git does over time is compress the stored objects and diffs into a 
series of pack files (I don't know the internals in detail), and text 
compresses REALLY well.  New clones then fetch the history in that packed 
form, and in an existing repository only the most recent objects are not 
yet packed.  Existing users can run garbage collection (git gc) 
periodically to collect and compress their existing history into packs 
as well.
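
For anyone who wants to see how much of their own clone is already packed, 
here is a rough Python sketch built around "git count-objects -v", which 
reports sizes in KiB.  The helper names (parse_count_objects, packed_share, 
repo_stats) are made up for illustration and are not part of git.

```python
import subprocess


def parse_count_objects(text: str) -> dict:
    """Turn the 'key: value' lines of `git count-objects -v` into ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def packed_share(stats: dict) -> float:
    """Fraction of object storage (in KiB) that already lives in packs."""
    loose = stats.get("size", 0)        # loose objects, KiB
    packed = stats.get("size-pack", 0)  # pack files, KiB
    total = loose + packed
    return packed / total if total else 0.0


def repo_stats(repo: str = ".") -> dict:
    """Run git in `repo` (hypothetical path argument) and parse its report."""
    result = subprocess.run(
        ["git", "-C", repo, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    )
    return parse_count_objects(result.stdout)
```

Calling packed_share(repo_stats("/path/to/clone")) before and after a 
"git gc" should show the share moving toward 1.0 as loose objects get 
packed.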

So for example, du says my kernel git tree totals 1.6 GB, including the 
active checkout and two separate (dirty) build trees.  The bare git tree 
(history repo without working tree) itself is 891 MB.  So the bare repo 
is only 54% of the total, and I've not actually garbage-collected in some 
time.  If I had, the ratio would be closer to 50%, meaning the entire 
kernel git history repo compresses to roughly the size of the working 
tree, and only roughly doubles the size of a single decompressed working 
tarball.
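
The arithmetic behind that 54% can be checked in a few lines of Python.  
This assumes du was reporting binary units, so 1 GB is taken as 1024 MB; 
the variable names are only for illustration.

```python
# Figures from the paragraph above; 1 GB = 1024 MB is an assumption.
total_mb = 1.6 * 1024   # whole tree: checkout, two build trees, history
bare_mb = 891           # bare history repo only

share = bare_mb / total_mb
rest_mb = total_mb - bare_mb

print(f"bare repo share of total: {share:.0%}")
print(f"checkout plus build trees: {rest_mb:.0f} MB")
```

With decimal units (1 GB = 1000 MB) the share would come out nearer 56%, 
so the exact percentage depends on how du rounds and which units it uses.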

Over time that'll certainly grow a bit, but it really does scale well.  
The kernel has been in git long enough now that quite some history has 
built up.  That the full history only roughly doubles the size of a single 
decompressed working tree snapshot, while putting everything since the 
original checkin at my fingertips, is impressive indeed.

It's all down to how well the sources and diffs compress.  If there were 
significant binary blobs in there (the kernel tree does have a few bits 
of firmware, the tux logo, etc), it would compress far less effectively.  
But gentoo's tree is pretty much all text as well, fortunately. =:^)
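
A quick way to see the text-versus-blob difference is to run zlib (which 
git uses for its objects) over both kinds of data.  This is only a sketch: 
the ebuild-like lines are synthetic and far more repetitive than real 
sources, random bytes stand in for an already-compressed binary, and git 
additionally delta-compresses between objects, which this does not model.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical ebuild-like text, repeated with small variations.
text = "".join(
    f'DEPEND="media-libs/example-{i % 7}"\nKEYWORDS="~amd64 ~x86"\n'
    for i in range(2000)
).encode()

# Random bytes as a stand-in for an incompressible binary blob.
blob = os.urandom(len(text))

text_ratio = ratio(text)
blob_ratio = ratio(blob)

print(f"text compresses to {text_ratio:.1%} of its size")
print(f"blob compresses to {blob_ratio:.1%} of its size")
```

The text shrinks to a small fraction of its size, while the random blob 
does not shrink at all, which is the effect described above.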

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
