Re: Something is broken in repack

2007-12-14 Thread Wolfram Gloger
Hi,
> Uh what? Someone crank out his copy of "The Art of Computer
> Programming", I think volume 1. Best fit is known (analyzed and proven
> and documented decades ago) to be one of the worst strategies for memory
> allocation. Exactly because it leads to huge fragmentation problems.
Well, quo

Re: Something is broken in repack

2007-12-14 Thread David Kastrup
Wolfram Gloger <[EMAIL PROTECTED]> writes:
> Hi,
>
>> Note that delta following involves patterns something like
>>
>>   allocate (small) space for delta
>>   for i in (1..depth) {
>>       allocate large space for base
>>       allocate large space for result
>>       .. apply delta ..
>>       fr

Re: Something is broken in repack

2007-12-14 Thread Wolfram Gloger
Hi,
> Maybe a malloc/free/mmap wrapper that records the requested sizes and
> alloc/free order and dumps them to file so that one can make a compact
> git-free standalone test case for the glibc maintainers might be a good
> thing.
I already have such a wrapper: http://malloc.de/malloc/mtrace-2
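A minimal sketch of the kind of recording wrapper described in the quoted proposal, written independently of the wrapper linked above. The names `trace_begin`, `traced_malloc`, `traced_free` and `trace_end`, and the log format, are assumptions made for illustration only; mmap tracking is left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper names; not the API of any existing tracer. */
static FILE *trace_out;          /* stream that receives the trace */
static unsigned long trace_seq;  /* running event number, preserves alloc/free order */

/* Start recording to an already opened stream. */
void trace_begin(FILE *out)
{
    assert(out != NULL);
    trace_out = out;
    trace_seq = 0;
}

/* malloc() stand-in: records the requested size and the returned address. */
void *traced_malloc(size_t size)
{
    void *p = malloc(size);
    if (trace_out)
        fprintf(trace_out, "%lu malloc %zu %p\n", trace_seq++, size, p);
    return p;
}

/* free() stand-in: records the address before releasing it. */
void traced_free(void *p)
{
    if (trace_out)
        fprintf(trace_out, "%lu free %p\n", trace_seq++, p);
    free(p);
}

/* Stop recording; returns the number of events written. */
unsigned long trace_end(void)
{
    if (trace_out)
        fflush(trace_out);
    trace_out = NULL;
    return trace_seq;
}
```

A program under test would call `traced_malloc` and `traced_free` in place of `malloc` and `free`. The resulting log lists request sizes in the order they occurred, which should be enough to replay the sequence in a small standalone program without git.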

Re: Something is broken in repack

2007-12-14 Thread Wolfram Gloger
Hi,
> Note that delta following involves patterns something like
>
>   allocate (small) space for delta
>   for i in (1..depth) {
>       allocate large space for base
>       allocate large space for result
>       .. apply delta ..
>       free large space for base
>       free small space fo

Re: Something is broken in repack

2007-12-14 Thread Wolfram Gloger
Hi,
> >>     if (progress->total) {
> >>         unsigned percent = n * 100 / progress->total;
> >>         if (percent != progress->last_percent || progress_update) {
> >> +           struct mallinfo m = mallinfo();
> >>             progress->last_percent = percent;
> >> -

Re: Something is broken in repack

2007-12-14 Thread Nicolas Pitre
On Fri, 14 Dec 2007, Paolo Bonzini wrote:
> > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
> > all things.
>
> I found that the .keep file is not transmitted over the network (at least I
> tried with git+ssh:// and http:// protocols), however.
That is a local policy.

Re: Something is broken in repack

2007-12-14 Thread Nguyen Thai Ngoc Duy
On Dec 14, 2007 4:01 PM, Harvey Harrison <[EMAIL PROTECTED]> wrote:
> While it doesn't mark the packs as .keep, git will reuse all of the old
> deltas you got in the original clone, so you're not losing anything.
There is another reason I want it. I have an ~800MB pack and I don't want git to rewr

Re: Something is broken in repack

2007-12-14 Thread Jakub Narebski
"Nguyen Thai Ngoc Duy" <[EMAIL PROTECTED]> writes: > On Dec 14, 2007 1:14 PM, Paolo Bonzini <[EMAIL PROTECTED]> wrote: > > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of > > > all things. > > > > I found that the .keep file is not transmitted over the network (at > > leas

Re: Something is broken in repack

2007-12-14 Thread Harvey Harrison
On Fri, 2007-12-14 at 09:20 +0100, Paolo Bonzini wrote:
> > I'm thinking about "git clone --keep" to mark initial packs precious.
> > But 'git clone' is under rewrite to C. Let's wait until C rewrite is
> > done.
>
> It should be the default, IMHO.
>
While it doesn't mark the packs as .keep, git

Re: Something is broken in repack

2007-12-14 Thread Paolo Bonzini
> I'm thinking about "git clone --keep" to mark initial packs precious.
> But 'git clone' is under rewrite to C. Let's wait until C rewrite is
> done.
It should be the default, IMHO.
Paolo

Re: Something is broken in repack

2007-12-13 Thread Nguyen Thai Ngoc Duy
On Dec 14, 2007 1:14 PM, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
> > all things.
>
> I found that the .keep file is not transmitted over the network (at
> least I tried with git+ssh:// and http:// protocols), however.
I'm

Re: Something is broken in repack

2007-12-13 Thread Jakub Narebski
Johannes Sixt wrote:
> Paolo Bonzini schrieb:
>> Nguyen Thai Ngoc Duy wrote:
>>>
>>> Is there an alternative to "git repack -a -d" that repacks everything
>>> but the first pack?
>>
>> That would be a pretty good idea for big repositories. If I were to
>> implement it, I would actually add a .git

Re: Something is broken in repack

2007-12-13 Thread Johannes Sixt
Paolo Bonzini schrieb:
> Nguyen Thai Ngoc Duy wrote:
>> On Dec 12, 2007 10:48 PM, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
>>> In the mean time you might have to use only one thread and lots of
>>> memory to repack the gcc repo, or find the perfect memory allocator to
>>> be used with Git. After a

Re: Something is broken in repack

2007-12-13 Thread Paolo Bonzini
> Is there an alternative to "git repack -a -d" that repacks everything
> but the first pack?
That would be a pretty good idea for big repositories. If I were to implement it, I would actually add a .git/config option like pack.permanent so that more than one pack could be made permanent; then

Re: Something is broken in repack

2007-12-13 Thread Nguyen Thai Ngoc Duy
On Dec 12, 2007 10:48 PM, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> In the mean time you might have to use only one thread and lots of
> memory to repack the gcc repo, or find the perfect memory allocator to
> be used with Git. After all, packing the whole gcc history to around
> 230MB is quite a

Re: Something is broken in repack

2007-12-12 Thread Andreas Ericsson
Nicolas Pitre wrote:
> On Wed, 12 Dec 2007, Nicolas Pitre wrote:
>> I did modify the progress display to show accounted memory that was
>> allocated vs memory that was freed but still not released to the system.
>> At least that gives you an idea of memory allocation and fragmentation
>> with glibc in rea

Re: Something is broken in repack. Why not with fork and pipes?

2007-12-12 Thread Johannes Schindelin
Hi,
On Wed, 12 Dec 2007, J.C. Pizarro wrote:
> It's good idea if it's for 24/365.25 that it does
> autorepack-compute-again-again-again-those-unexplored-deltas of
> git repositories in realtime. :D
This sentence does not parse.
> Some body can do "git clone" that it could give smaller that on

Re: Something is broken in repack. Why not with fork and pipes?

2007-12-12 Thread J.C. Pizarro
At http://gcc.gnu.org/ml/gcc/2007-12/msg00360.html, Andreas Ericsson <[EMAIL PROTECTED]> wrote:
> If it's still an issue next week, we'll have a 16 core (8 dual-core cpu's)
> machine with some 32gb of ram in that'll be free for about two days.
> You'll have to remind me about it though, as I've got

Re: Something is broken in repack

2007-12-12 Thread Jon Smirl
On 12/12/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Wed, 12 Dec 2007, Nicolas Pitre wrote:
> >
> > So... my conclusion is that the glibc allocator has fragmentation issues
> > with this work load, given the notable difference with the Google
> > allocator, which itself might not be comp

Re: Something is broken in repack

2007-12-12 Thread Linus Torvalds
On Wed, 12 Dec 2007, David Miller wrote:
>
> I personally don't think it's unreasonable for GIT to have its
> own customized allocator at least for certain object types.
Well, we actually already *do* have a customized allocator, but currently only for the actual core "object descriptor" that

Re: Something is broken in repack

2007-12-12 Thread David Miller
From: Linus Torvalds <[EMAIL PROTECTED]>
Date: Wed, 12 Dec 2007 08:37:10 -0800 (PST)
> I'm not saying that particular case happens in git, I'm just saying that
> it's not unheard of. And with the delta cache and the object lookup, it's
> not at _all_ impossible that we hit the "allocate in one t

Re: Something is broken in repack

2007-12-12 Thread Linus Torvalds
On Wed, 12 Dec 2007, Nicolas Pitre wrote:
>
> So... my conclusion is that the glibc allocator has fragmentation issues
> with this work load, given the notable difference with the Google
> allocator, which itself might not be completely immune to fragmentation
> issues of its own.
Yes. Not

Re: Something is broken in repack

2007-12-12 Thread Paolo Bonzini
When I returned to the computer this morning, the repack was completed... with a 1.3GB pack instead. So... The gcc repo apparently really needs a large window to efficiently compress those large objects. So, am I right that if you have a very well-done pack (such as gcc's), you might want

Re: Something is broken in repack

2007-12-12 Thread Nicolas Pitre
On Wed, 12 Dec 2007, Nicolas Pitre wrote:
> I did modify the progress display to show accounted memory that was
> allocated vs memory that was freed but still not released to the system.
> At least that gives you an idea of memory allocation and fragmentation
> with glibc in real time:
>
> di

Re: Something is broken in repack

2007-12-12 Thread Nicolas Pitre
On Wed, 12 Dec 2007, Nicolas Pitre wrote:
> Add memory fragmentation to that and you have a clogged system.
>
> Solution:
>
>     pack.deltacachesize=1
>     pack.windowmemory=16M
>
> Limiting the window memory to 16MB will automatically shrink the window
> size when big objects are encou

Re: Something is broken in repack

2007-12-12 Thread David Kastrup
Nicolas Pitre <[EMAIL PROTECTED]> writes:
> Well... This is weird.
>
> It seems that memory fragmentation is really really killing us here.
> The fact that the Google allocator did manage to waste quite less memory
> is a good indicator already.
Maybe a malloc/free/mmap wrapper that records t

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:
> On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> >
> > > OK, here's something else for you to try:
> > >
> > >     core.deltabasecachelimit=0
> > >     pack.threads=2
> > >     pack.deltacachesize=1

Re: Something is broken in repack

2007-12-11 Thread Andreas Ericsson
Junio C Hamano wrote: Linus Torvalds <[EMAIL PROTECTED]> writes: So what you actually want to do is to just re-use already packed delta chains directly, which is what we normally do. But you are explicitly looking at the "--no-reuse-delta" (aka "git repack -f") case, which is why it then blow

Re: Something is broken in repack

2007-12-11 Thread Andreas Ericsson
Nicolas Pitre wrote:
> On Tue, 11 Dec 2007, David Miller wrote:
>> From: Nicolas Pitre <[EMAIL PROTECTED]>
>> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
>>
>>> BUT. The point is that repacking the gcc repo using "git repack -a -f
>>> --window=250" has a radically different memory usage profile whether you
>>> do

Re: Something is broken in repack

2007-12-11 Thread Junio C Hamano
Linus Torvalds <[EMAIL PROTECTED]> writes:
> On Tue, 11 Dec 2007, Jon Smirl wrote:
>> >
>> > So if you want to use more threads, that _forces_ you to have a bigger
>> > memory footprint, simply because you have more "live" objects that you
>> > work on. Normally, that isn't much of a problem, sinc

Re: Something is broken in repack

2007-12-11 Thread Linus Torvalds
On Tue, 11 Dec 2007, Jon Smirl wrote:
> >
> > So if you want to use more threads, that _forces_ you to have a bigger
> > memory footprint, simply because you have more "live" objects that you
> > work on. Normally, that isn't much of a problem, since most source files
> > are small, but if you ha

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:
> This makes sense. Those runs that blew up to 4.5GB were a combination
> of this effect and fragmentation in the gcc allocator.
I disagree. This is insane.
> Google allocator appears to be much better at controlling fragmentation.
Indeed. And if fragment

Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Tue, 11 Dec 2007, Jon Smirl wrote:
> >
> > So why does our threaded code take 20 CPU minutes longer (12%) to run
> > than the same code with a single thread?
>
> Threaded code *always* takes more CPU time. The only thing you can hope
>

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, David Miller wrote:
> From: Nicolas Pitre <[EMAIL PROTECTED]>
> Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
>
> > BUT. The point is that repacking the gcc repo using "git repack -a -f
> > --window=250" has a radically different memory usage profile whether you
> > do the r

Re: Something is broken in repack

2007-12-11 Thread Daniel Berlin
On 12/11/07, Jon Smirl <[EMAIL PROTECTED]> wrote:
>
> Total CPU time 196 CPU minutes vs 190 for gcc. Google's claims of
> being faster are not true.
Depends on your allocation patterns. For our apps, it certainly is :) Of course, i don't know if we've updated the external allocator in a while, i'l

Re: Something is broken in repack

2007-12-11 Thread David Miller
From: Nicolas Pitre <[EMAIL PROTECTED]>
Date: Tue, 11 Dec 2007 12:21:11 -0500 (EST)
> BUT. The point is that repacking the gcc repo using "git repack -a -f
> --window=250" has a radically different memory usage profile whether you
> do the repack on the earlier 2.1GB pack or the later 300MB pac

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Linus Torvalds wrote:
> That said, I suspect there are a few things fighting you:
>
>  - threading is hard. I haven't looked a lot at the changes Nico did to do
>    a threaded object packer, but what I've seen does not convince me it is
>    correct. The "trg_entry" access

Re: Something is broken in repack

2007-12-11 Thread Linus Torvalds
On Tue, 11 Dec 2007, Jon Smirl wrote:
>
> So why does our threaded code take 20 CPU minutes longer (12%) to run
> than the same code with a single thread?
Threaded code *always* takes more CPU time. The only thing you can hope for is a wall-clock reduction. You're seeing probably a combination

Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Tue, 11 Dec 2007, Nicolas Pitre wrote:
>
> > OK, here's something else for you to try:
> >
> >     core.deltabasecachelimit=0
> >     pack.threads=2
> >     pack.deltacachesize=1
> >
> > With that I'm able to repack the small gcc pack

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> OK, here's something else for you to try:
>
>     core.deltabasecachelimit=0
>     pack.threads=2
>     pack.deltacachesize=1
>
> With that I'm able to repack the small gcc pack on my machine with 1GB
> of ram using:
>
>     git repack -a -f

Re: Something is broken in repack

2007-12-11 Thread Jon Smirl
On 12/11/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Tue, 11 Dec 2007, Nicolas Pitre wrote:
>
> > And yet, this is still missing the actual issue. The issue being that
> > the 2.1GB pack as a _source_ doesn't cause as much memory to be
> > allocated even if the _result_ pack ends up being th

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Nicolas Pitre wrote:
> And yet, this is still missing the actual issue. The issue being that
> the 2.1GB pack as a _source_ doesn't cause as much memory to be
> allocated even if the _result_ pack ends up being the same.
>
> I was able to repack the 2.1GB pack on my machin

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:
> Switching to the Google perftools malloc
> http://goog-perftools.sourceforge.net/
>
> 10%    30   828M
> 20%    15   831M
> 30%    10   834M
> 40%    50  1014M
> 50%    80  1086M
> 60%    80  1500M
> 70%   200  1.53G
> 80%   200  1.85G
> 90%   260  1.87G
> 95%   520  1.97G

Re: Something is broken in repack

2007-12-11 Thread Nicolas Pitre
On Tue, 11 Dec 2007, Jon Smirl wrote:
> I added the gcc people to the CC, it's their repository. Maybe they
> can help us sort this out.
Unless there is a Git expert amongst the gcc crowd, I somehow doubt it. And gcc people with an interest in Git internals are probably already on the Git maili

Re: Something is broken in repack

2007-12-10 Thread Andreas Ericsson
Jon Smirl wrote: Switching to the Google perftools malloc http://goog-perftools.sourceforge.net/ Google allocator knocked 600MB off from memory use. Memory consumption did not fall during the write out phase like it did with gcc. Since all of this is with the same code except for changing the t

Re: Something is broken in repack

2007-12-10 Thread Jon Smirl
Switching to the Google perftools malloc
http://goog-perftools.sourceforge.net/
10%     30   828M
20%     15   831M
30%     10   834M
40%     50  1014M
50%     80  1086M
60%     80  1500M
70%    200  1.53G
80%    200  1.85G
90%    260  1.87G
95%    520  1.97G
100%  1335  2.24G
Google allocator knocked 600MB off from memory u

Re: Something is broken in repack

2007-12-10 Thread Jon Smirl
I added the gcc people to the CC, it's their repository. Maybe they can help us sort this out.
On 12/11/07, Jon Smirl <[EMAIL PROTECTED]> wrote:
> On 12/10/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Mon, 10 Dec 2007, Jon Smirl wrote:
> >
> > > New run using same configuration. With the ad