Re: Git and GCC

2007-12-06 Thread David Brown

On Wed, Dec 05, 2007 at 11:49:21PM -0800, Harvey Harrison wrote:

> git repack -a -d --depth=250 --window=250
> 
> Since I have the whole gcc repo locally I'll give this a shot overnight
> just to see what can be done at the extreme end of things.


When I tried this on a very large repo, at least one with some large files
in it, git quickly exceeded my physical memory and started thrashing the
machine.  I had good results with

 git config pack.deltaCacheSize 512m
 git config pack.windowMemory 512m

of course adjusting based on your physical memory.  I think changing the
windowMemory will affect the resulting compression, so changing these
ratios might get better compression out of the result.
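
For instance, a full sequence along those lines might be (the 512m limits
are just the example values above; scale them to your machine):

 git config pack.deltaCacheSize 512m
 git config pack.windowMemory 512m
 git repack -a -d --depth=250 --window=250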

If you're really patient, though, you could leave the unbounded window,
hope you have enough swap, and just let it run.

Dave


Generate code for something like a stack/dataflow computer

2007-12-06 Thread Li Wang
Hi,
We are retargeting GCC to a VLIW chip which runs as a coprocessor to a
general purpose processor. The coprocessor is responsible for
expediting some code sections which have good parallel characteristics
and no dependences. Its ISA only lets it fetch data
sequentially, rather than with random access, from an on-chip memory that is
shared with the host processor, through dedicated function units named
DBx. The host processor is responsible for placing data there and for
telling each DBx its base address and data length. Once the data is fetched
by the coprocessor, it is stored in local registers owned by the coprocessor,
and until the computation ends the data always resides in the
coprocessor's registers; that is, there are no spills, and no spills are
permitted. From the coprocessor's standpoint, the instructions support
no memory operands and no addressing modes at all. They support only register
moves and arithmetic operations. It looks something like a dataflow
computer or a stack computer. Let's take the following code as an example:

void compute(int a[], int b[], int c[]);

int main()
{
    int a[16], b[16], c[16];

    compute(a, b, c);
    return 0;
}

void compute(int a[], int b[], int c[])
{
    for (int j = 0; j < 16; j++)
        c[j] = a[j] + b[j];
    return;
}

We want the function compute() to execute on the coprocessor, while the
host processor organizes and places the data at the proper positions in the
on-chip memory and prepares the DBx function units. Assume DB0 is allocated
to array a[], DB1 to b[], and DB2 to c[]. Then the assembly code we want to
generate for the coprocessor looks as follows:

L3:
if (data in DB0 not exhausted)
goto L1;
else
goto L2;
L1:
get R0, DB0; // load one data item from the on-chip memory through DB0 into R0
get R1, DB1;
add R2, R0, R1;
put R2, DB2; // store result to DB2
goto L3;
L2:
end;

Could anyone give some hints on how to implement this? Do the current GCC
internals for addressing modes in the machine description support it?

Li


Re: Git and GCC

2007-12-06 Thread Andreas Schwab
Harvey Harrison <[EMAIL PROTECTED]> writes:

> git svn does accept a mailmap at import time with the same format as the
> cvs importer I think.  But for someone that just wants a repo to check
> out this was easiest.  I'd be willing to spend the time to do a nicer
> job if there was any interest from the gcc side, but I'm not that
> invested (other than owing them for an often-used tool).

I have a complete list of the uid<->mail mapping for the gcc repository.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Rant about ChangeLog entries and commit messages

2007-12-06 Thread Andreas Schwab
Ben Elliston <[EMAIL PROTECTED]> writes:

> On Wed, 2007-12-05 at 18:35 -0500, Daniel Berlin wrote:
>
>> svn propedit --revision  svn:log
>
> OK, well, it used to be a bit trickier in CVS .. :-)

In CVS it's just a cvs admin -m as well.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: Patch manager dying for a week or two

2007-12-06 Thread Tobias Burnus
Daniel Berlin wrote:
> Patch manager will be dying for a week or two while i change hosting.
> of course, if nobody is still using it, i can just kill it permanently.


At least I use it almost always to make sure patches do not get
forgotten; thus I regularly check http://dberlin.org/patches/patches/list

Additionally, I like that it automatically adds a link to the mailing
list in the PR; that way one can easily check the discussion in the
mailing list. (I also like PRs, they not only help to obtain more
information about a patch [cf. the recent ChangeLog discussion], but also
ensure that one does not forget something.)

I think many gfortraners use :ADDPATCH:

Tobias


How to define a blackbox data type in gcc?

2007-12-06 Thread Bingfeng Mei
Hello,
I am wondering how to define a blackbox data type in gcc.  It can be too
wide and irregular to be represented by the current data types, and it needs
to be assigned to special register files.  I don't care about and don't want
to touch its contents except through intrinsic (builtin) functions.  An
example of such a data type is a value in a MAC register.  Is
there a convenient way to do that? Thanks in advance,
 
Cheers,
Bingfeng Mei
 
Broadcom UK



Re: Git and GCC

2007-12-06 Thread Johannes Schindelin
Hi,

On Wed, 5 Dec 2007, David Miller wrote:

> From: "Daniel Berlin" <[EMAIL PROTECTED]>
> Date: Wed, 5 Dec 2007 21:41:19 -0500
> 
> > It is true I gave up quickly, but this is mainly because i don't like 
> > to fight with my tools.
> >
> > I am quite fine with a distributed workflow, I now use 8 or so gcc 
> > branches in mercurial (auto synced from svn) and merge a lot between 
> > them. I wanted to see if git would sanely let me manage the commits 
> > back to svn.  After fighting with it, i gave up and just wrote a 
> > python extension to hg that lets me commit non-svn changesets back to 
> > svn directly from hg.
> 
> I find it ironic that you were even willing to write tools to facilitate 
> your hg based gcc workflow.  That really shows what your thinking is on 
> this matter, in that you're willing to put effort towards making hg work 
> better for you but you're not willing to expend that level of effort to 
> see if git can do so as well.

While this is true...

> This is what really eats me from the inside about your dissatisfaction 
> with git.  Your analysis seems to be a self-fulfilling prophecy, and 
> that's totally unfair to both hg and git.

... I actually appreciate people complaining -- in the meantime.  It shows 
right away which group you belong to in "Those who can do, do; those who 
can't, complain."

You can see that very easily on the git list, or on the #git channel on 
irc.freenode.net.  There is enough data for a study that yearns to be 
written, showing how quickly we resolve issues with people who are 
sincerely interested in a solution.

(Of course, on the other hand, there are also quite a few cases which show 
how frustrating (for both sides) and unfruitful discussions started by a 
complaint are.)

So I fully expect an issue like Daniel's to be resolved in a matter of 
minutes on the git list, if the OP gives us a chance.  If we are not even 
Cc'ed, you are completely right, she or he probably does not want the 
issue to be resolved.

Ciao,
Dscho



Re: Git and GCC

2007-12-06 Thread Ismail Dönmez
On Thursday 06 December 2007 13:57:06, Johannes Schindelin wrote:
[...]
> So I fully expect an issue like Daniel's to be resolved in a matter of
> minutes on the git list, if the OP gives us a chance.  If we are not even
> Cc'ed, you are completely right, she or he probably does not want the
> issue to be resolved.

Let's be fair about this: Ollie Wild already sent a mail about git-svn disk 
usage and there is no concrete solution yet, though it seems the bottleneck 
is known.

Regards,
ismail


-- 
Never learn by your mistakes, if you do you may never dare to try again.


[PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Johannes Schindelin

The default was not to change the window or depth at all.  As suggested
by Jon Smirl, Linus Torvalds and others, default to

--window=250 --depth=250

Signed-off-by: Johannes Schindelin <[EMAIL PROTECTED]>
---

On Wed, 5 Dec 2007, Linus Torvalds wrote:

> On Thu, 6 Dec 2007, Daniel Berlin wrote:
> > 
> > Actually, it turns out that git-gc --aggressive does this dumb 
> > thing to pack files sometimes regardless of whether you 
> > converted from an SVN repo or not.
> 
> Absolutely. git --aggressive is mostly dumb. It's really only 
> useful for the case of "I know I have a *really* bad pack, and I 
> want to throw away all the bad packing decisions I have done".
>
> [...]
> 
> So the equivalent of "git gc --aggressive" - but done *properly* 
> - is to do (overnight) something like
> 
>   git repack -a -d --depth=250 --window=250

How about this, then?

 builtin-gc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/builtin-gc.c b/builtin-gc.c
index 799c263..c6806d3 100644
--- a/builtin-gc.c
+++ b/builtin-gc.c
@@ -23,7 +23,7 @@ static const char * const builtin_gc_usage[] = {
 };
 
 static int pack_refs = 1;
-static int aggressive_window = -1;
+static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 20;
 
@@ -192,6 +192,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
 	if (aggressive) {
 		append_option(argv_repack, "-f", MAX_ADD);
+		append_option(argv_repack, "--depth=250", MAX_ADD);
 		if (aggressive_window > 0) {
 			sprintf(buf, "--window=%d", aggressive_window);
 			append_option(argv_repack, buf, MAX_ADD);
-- 
1.5.3.7.2157.g9598e



Re: Git and GCC

2007-12-06 Thread Harvey Harrison

On Thu, 2007-12-06 at 10:52 +0100, Andreas Schwab wrote:
> Harvey Harrison <[EMAIL PROTECTED]> writes:
> 
> > git svn does accept a mailmap at import time with the same format as the
> > cvs importer I think.  But for someone that just wants a repo to check
> > out this was easiest.  I'd be willing to spend the time to do a nicer
> > job if there was any interest from the gcc side, but I'm not that
> > invested (other than owing them for an often-used tool).
> 
> I have a complete list of the uid<->mail mapping for the gcc repository.
> 
> Andreas.
> 

Feel free to send it along, but for now I'll keep on going without a
mapping.  If I went back now and changed it, all those people who
are already using the existing mirror will have to download a
whole new history.

If gcc decides they would like a cleaner import for more official
use, I'd be more than happy to work with you guys to produce a
cleaner import with author/committer names cleaned up, etc.

Cheers,

Harvey



Re: update_stmt calls

2007-12-06 Thread Andrew MacLeod

Zdenek Dvorak wrote:

Hello,

during a recent discussion, it was pointed to my attention that
update_stmt is performance critical.  I wondered why; this is the number
of update_stmt calls for combine.i (all the other passes have fewer than
1000 calls):

<...>

I have a patch that decreases number of update_stmt calls in tree alias
analysis to 46525; still, is it really that useful to run pass_may_alias
*six* times during compilation?  Obviously, we need the initial one, and
there are comments after pass_sra and pass_fold_builtins that indicate
that the following pass_may_alias cannot be avoided (which seems
doubtful to me, at the very least in the later case), but the remaining
three seem to be just placed randomly.

I also have a patch that decreases the number of update_stmt calls
in VRP to 5229 (which is more or less the number of ASSERT_EXPRs it
creates, so this cannot be improved significantly).
  


I can't say I'm surprised; there was a time when the general rule was 
"when in doubt, update", so not a lot of thought has gone into those 
calls, especially in the older passes. Anything which reduces the calls 
to update_stmt() is probably a good thing :-)


I can't speak to the number of calls to pass_may_alias, but that does 
seem a bit excessive to me as well.  To the best of my knowledge, no one 
has recently (if ever) sat down and figured out which passes we actually 
need where and when. They just get added when people think they are a 
good idea, but rarely get removed later when things change. I would 
suspect there are numerous passes that could be eliminated with some 
analysis and shuffling.  That's something that would be easier to do with 
a dynamic pass manager :-)


Andrew



Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Theodore Tso
On Thu, Dec 06, 2007 at 12:03:38PM +, Johannes Schindelin wrote:
> 
> The default was not to change the window or depth at all.  As suggested
> by Jon Smirl, Linus Torvalds and others, default to
> 
>   --window=250 --depth=250

I'd also suggest adding a comment in the man pages that this should
only be done rarely, and that it can potentially take a *long* time
(i.e., overnight) for big repositories, and in general it's not worth
the effort to use --aggressive.

Apologies to Linus and to the gcc folks, since I was the one who
originally coded up gc --aggressive, and at the time my intent was
"rarely does it make sense, and it may take a long time".  The reason
why I didn't make the default --window and --depth larger is because
at the time the biggest repo I had easy access to was the Linux
kernel's, and there you rapidly hit diminishing returns at much
smaller numbers, so there was no real point in using --window=250
--depth=250.

Linus later pointed out that what we *really* should do at some
point is to change repack -f to potentially retry to find a better
delta, but to reuse the existing delta if it was no worse.  That
automatically does the right thing in the case where you had
previously done a repack with large --window and --depth values,
but then later try using "gc --aggressive", which ends up doing a worse
job and throwing away the information from the previous repack with
large window and depth sizes.  Unfortunately no one ever got around to
implementing that.

Regards,

- Ted


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Wed, 5 Dec 2007, Harvey Harrison wrote:

> 
> > git repack -a -d --depth=250 --window=250
> > 
> 
> Since I have the whole gcc repo locally I'll give this a shot overnight
> just to see what can be done at the extreme end of things.

Don't forget to add -f as well.


Nicolas


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 01:47:54AM -0500, Jon Smirl wrote:
> 
> > The key to converting repositories of this size is RAM. 4GB minimum,
> > more would be better. git-repack is not multi-threaded. There were a
> > few attempts at making it multi-threaded but none were too successful.
> > If I remember right, with loads of RAM, a repack on a 450MB repository
> > was taking about five hours on a 2.8Ghz Core2. But this is something
> > you only have to do once for the import. Later repacks will reuse the
> > original deltas.
> 
> Actually, Nicolas put quite a bit of work into multi-threading the
> repack process; the results have been in master for some time, and will
> be in the soon-to-be-released v1.5.4.
> 
> The downside is that the threading partitions the object space, so the
> resulting size is not necessarily as small (but I don't know that
> anybody has done testing on large repos to find out how large the
> difference is).

Quick guesstimate is in the 1% ballpark.


Nicolas


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Pierre Habouzit
On Thu, Dec 06, 2007 at 12:03:38PM +, Johannes Schindelin wrote:
> 
> The default was not to change the window or depth at all.  As suggested
> by Jon Smirl, Linus Torvalds and others, default to
> 
>   --window=250 --depth=250

  well, this will explode on many quite reasonably sized systems. This
should also use a memory limit that could be auto-guessed from the
system's total physical memory (50% of the actual memory could be a good
idea, e.g.).

  On very large repositories, using that on e.g. the linux kernel swaps
like hell on a machine with 1GB of RAM and almost nothing else running on it
(less than 200MB of RAM actually used).
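
A rough sketch of that auto-guessing idea as a manual workaround (Linux-only,
and the 50% ratio plus the choice of pack.windowMemory are just the suggestion
above, not an existing automatic feature):

 # give roughly half of the physical RAM to the delta search window
 mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
 git config pack.windowMemory "$((mem_kb / 2))k"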




Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Theodore Tso wrote:

> Linus later pointed out that what we *really* should do at some
> point is to change repack -f to potentially retry to find a better
> delta, but to reuse the existing delta if it was no worse.  That
> automatically does the right thing in the case where you had
> previously done a repack with large --window and --depth values,
> but then later try using "gc --aggressive", which ends up doing a worse
> job and throwing away the information from the previous repack with
> large window and depth sizes.  Unfortunately no one ever got around to
> implementing that.

I did start looking at it, but there are subtle issues to consider, such 
as making sure not to create delta loops.  Currently this is avoided by 
never involving already reused deltas in new delta chains, except for 
edge base objects.

IOW, this requires some head scratching which I haven't had the time for 
so far.


Nicolas


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Harvey Harrison
Wow

/usr/bin/time git repack -a -d -f --window=250 --depth=250


23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps

-r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26
pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
-r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack


That extra delta depth really does make a difference.  Just over a
300MB pack in the end, for all gcc branches/tags as of last night.

Cheers,

Harvey



Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Johannes Schindelin
Hi,

On Thu, 6 Dec 2007, Pierre Habouzit wrote:

> On Thu, Dec 06, 2007 at 12:03:38PM +, Johannes Schindelin wrote:
> > 
> > The default was not to change the window or depth at all.  As 
> > suggested by Jon Smirl, Linus Torvalds and others, default to
> > 
> > --window=250 --depth=250
> 
>   well, this will explode on many quite reasonably sized systems. This 
> should also use a memory limit that could be auto-guessed from the 
> system's total physical memory (50% of the actual memory could be a good 
> idea, e.g.).
> 
>   On very large repositories, using that on e.g. the linux kernel swaps 
> like hell on a machine with 1GB of RAM and almost nothing else running on it 
> (less than 200MB of RAM actually used)

Yes.

However, I think that --aggressive should be aggressive, and if you decide 
to run it on a machine which lacks the muscle to be aggressive, well, you 
should have known better.

The upside: if you run this on a strong machine and clone it to a weak 
machine, you'll still have the benefit of a small pack (and you should 
mark it as .keep, too, to keep the benefit...)

Ciao,
Dscho



Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Johannes Schindelin
Hi,

On Thu, 6 Dec 2007, Harvey Harrison wrote:

> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

Wow.

Ciao,
Dscho


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, Harvey Harrison wrote:
> 
> 7:41:25elapsed 86%CPU

Heh. And this is why you want to do it exactly *once*, and then just 
export the end result for others ;)

> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

But yeah, especially if you allow longer delta chains, the end result can 
be much smaller (and what makes the one-time repack more expensive is the 
window size, not the delta chain - you could make the delta chains longer 
with no cost overhead at packing time)

HOWEVER. 

The longer delta chains do make it potentially much more expensive to then 
use old history. So there's a trade-off. And quite frankly, a delta depth 
of 250 is likely going to cause overflows in the delta cache (which is 
only 256 entries in size *and* it's a hash, so it's going to start having 
hash conflicts long before hitting the 250 depth limit).

So when I said "--depth=250 --window=250", I chose those numbers more as 
an example of extremely aggressive packing, and I'm not at all sure that 
the end result is necessarily wonderfully usable. It's going to save disk 
space (and network bandwidth - the deltas will be re-used for the network 
protocol too!), but there are definitely downsides too, and using long 
delta chains may simply not be worth it in practice.

(And some of it might just want to have git tuning, ie if people think 
that long deltas are worth it, we could easily just expand on the delta 
hash, at the cost of some more memory used!)

That said, the good news is that working with *new* history will not be 
affected negatively, and if you want to be _really_ sneaky, there are ways 
to say "create a pack that contains the history up to a version one year 
ago, and be very aggressive about those old versions that we still want to 
have around, but do a separate pack for newer stuff using less aggressive 
parameters"

So this is something that can be tweaked, although we don't really have 
any really nice interfaces for stuff like that (ie the git delta cache 
size is hardcoded in the sources and cannot be set in the config file, and 
the "pack old history more aggressively" involves some manual scripting 
and knowing how "git pack-objects" works rather than any nice simple 
command line switch).
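
A rough sketch of what that manual scripting might look like (the cutoff
tag is a placeholder, and exactly how later repacks treat the kept pack
depends on the git version):

 # aggressively pack everything reachable from some old release tag
 git rev-list --objects some-old-release-tag |
     git pack-objects --window=250 --depth=250 .git/objects/pack/pack
 # pack-objects prints the new pack's SHA-1; mark that pack with a .keep
 # file so later repacks leave it alone
 touch .git/objects/pack/pack-<printed sha1>.keep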

So the thing to take away from this is:
 - git is certainly flexible as hell
 - .. but to get the full power you may need to tweak things
 - .. happily you really only need to have one person to do the tweaking, 
   and the tweaked end results will be available to others that do not 
   need to know/care.

And whether the difference between 320MB and 500MB is worth any really 
involved tweaking (considering the potential downsides), I really don't 
know. Only testing will tell.
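
(For what it's worth, a crude way to test both sides of that trade-off,
using nothing but ordinary commands:

 du -c .git/objects/pack/*.pack       # disk space after the repack
 time git log --all -p > /dev/null    # rough cost of walking old history

the second one has to regenerate old blobs, so it exercises the long
delta chains.)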

Linus


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread David Kastrup
Johannes Schindelin <[EMAIL PROTECTED]> writes:

> However, I think that --aggressive should be aggressive, and if you
> decide to run it on a machine which lacks the muscle to be aggressive,
> well, you should have known better.

That's a rather cheap shot.  "you should have known better" than
expecting to be able to use a documented command and option because the
git developers happened to have a nicer machine...

_How_ is one supposed to have known better?

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jeff King wrote:

> On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:
> 
> > > The downside is that the threading partitions the object space, so the
> > > resulting size is not necessarily as small (but I don't know that
> > > anybody has done testing on large repos to find out how large the
> > > difference is).
> > 
> > Quick guesstimate is in the 1% ballpark.
> 
> Fortunately, we now have numbers. Harvey Harrison reported repacking the
> gcc repo and getting these results:
> 
> > /usr/bin/time git repack -a -d -f --window=250 --depth=250
> >
> > 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 
> > 0maxresident)k
> > 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
> >
> > -r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26 
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 
> > pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack
> 
> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
> 
>   time git repack -a -d -f --window=250 --depth=250
> 
>   real    309m59.849s
>   user    377m43.948s
>   sys     8m23.319s
> 
>   -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
>   -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
> 
> So it is about 5% bigger.

Right.  I should probably revisit that idea of finding deltas across 
partition boundaries to mitigate that loss.  And those partitions could 
be made coarser as well to reduce the number of such partition gaps 
(just increase the value of chunk_size on line 1648 in 
builtin-pack-objects.c).

> What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

Maybe you should run the non-threaded repack on the same machine to have 
a good comparison.  And if you have only 2 CPUs, you will have better 
performance with pack.threads = 2, otherwise there'll be wasteful task 
switching going on.
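
For example, on a machine with two CPUs:

 git config pack.threads 2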

And of course, if the delta cache is being thrashed, that might be due to 
the way the existing pack was previously packed.  Hence the current pack 
might impact object _access_ when repacking them.  So for a really 
really fair performance comparison, you'd have to preserve the original 
pack and swap it back before each repack attempt.


Nicolas


Re: Git and GCC

2007-12-06 Thread Daniel Berlin
On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Thu, 6 Dec 2007, Daniel Berlin wrote:
> >
> > Actually, it turns out that git-gc --aggressive does this dumb thing
> > to pack files sometimes regardless of whether you converted from an
> > SVN repo or not.
>
> Absolutely. git --aggressive is mostly dumb. It's really only useful for
> the case of "I know I have a *really* bad pack, and I want to throw away
> all the bad packing decisions I have done".
>
> To explain this, it's worth explaining (you are probably aware of it, but
> let me go through the basics anyway) how git delta-chains work, and how
> they are so different from most other systems.
>
I worked on Monotone and other systems that use object stores for a
little while :)
In particular, I believe GIT's original object store was based on
Monotone, IIRC.

> In other SCM's, a delta-chain is generally fixed. It might be "forwards"
> or "backwards", and it might evolve a bit as you work with the repository,
> but generally it's a chain of changes to a single file represented as some
> kind of single SCM entity. In CVS, it's obviously the *,v file, and a lot
> of other systems do rather similar things.

>
> Git also does delta-chains, but it does them a lot more "loosely". There
> is no fixed entity. Delta's are generated against any random other version
> that git deems to be a good delta candidate (with various fairly
> successful heuristics), and there are absolutely no hard grouping rules.

Sure. SVN actually supports this (surprisingly), it just never happens
to choose delta bases that aren't related by ancestry.  (I.e. it would
have absolutely no problem with you using random other parts of the
repository as delta bases, and I've played with it before).

I actually advocated we move towards an object store model, as
ancestry can be a crappy way of approximating similarity when you
have a lot of branches.

> So the equivalent of "git gc --aggressive" - but done *properly* - is to
> do (overnight) something like
>
> git repack -a -d --depth=250 --window=250
>
I gave this a try overnight, and it definitely helps a lot.
Thanks!

> And then it's going to take forever and a day (ie a "do it overnight"
> thing). But the end result is that everybody downstream from that
> repository will get much better packs, without having to spend any effort
> on it themselves.
>

If your forever and a day is spent figuring out which deltas to use,
you can reduce this significantly.
If it is spent writing out the data, it's much harder. :)


Re: Git and GCC

2007-12-06 Thread Ian Lance Taylor
NightStrike <[EMAIL PROTECTED]> writes:

> On 12/5/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> > As I said, maybe i'll look at git in another year or so.
> > But  i'm certainly going to ignore all the "git is so great, we should
> > move gcc to it" people until it works better, while i am much more
> > inclined to believe the "hg is so great, we should move gc to it"
> > people.
> 
> Just out of curiosity, is there something wrong with the current
> choice of svn?  As I recall, it wasn't too long ago that gcc converted
> from cvs to svn.  What's the motivation to change again?  (I'm not
> trying to oppose anything.. I'm just curious, as I don't know much
> about this kind of thing).

Distributed version systems like git or Mercurial have some advantages
over Subversion.  For example, it is easy for developers to produce
patches which can be reliably committed or exchanged with other
developers.  With Subversion, we send around patch files generated by
diff and applied with patch.  This works, but is inconvenient, and
there is no way to track them.

With regard to git, I think it's worth noting that it was initially
designed to solve the problems faced by one man, Linus Torvalds.  The
problems he faces are not the problems which gcc developers face.  Our
development process is not the Linux kernel development process.  Of
course, many people have worked on git, and I expect that git can do
what we need.


For any git proponents, I'm curious to hear what advantages it offers
over Mercurial.  From this thread, one advantage of Mercurial seems
clear: it is easier to understand how to use it correctly.

Ian


Re: Git and GCC

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, Jeff King wrote:
> 
> What is really disappointing is that we saved only about 20% of the 
> time. I didn't sit around watching the stages, but my guess is that we 
> spent a long time in the single threaded "writing objects" stage with a 
> thrashing delta cache.

I don't think you spent all that much time writing the objects. That part 
isn't very intensive, it's mostly about the IO.

I suspect you may simply be dominated by memory-throughput issues. The 
delta matching doesn't cache all that well, and using two or more cores 
isn't going to help all that much if they are largely waiting for memory 
(and quite possibly also perhaps fighting each other for a shared cache? 
Is this a Core 2 with the shared L2?)

Linus


Re: Git and GCC

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, Daniel Berlin wrote:
>
> I worked on Monotone and other systems that use object stores. for a 
> little while :) In particular, I believe GIT's original object store was 
> based on Monotone, IIRC.

Yes and no. 

Monotone does what git does for the blobs. But there is a big difference 
in how git then does it for everything else too, i.e. trees and history. 
Trees being in that object store in particular is very important, and one 
of the biggest deals for deltas (actually, for two reasons: most of the 
time they don't change AT ALL if some subdirectory gets no changes and you 
don't need any delta, and even when they do change, it's usually going to 
delta very well, since it's usually just a small part that changes).

> > And then it's going to take forever and a day (ie a "do it overnight"
> > thing). But the end result is that everybody downstream from that
> > repository will get much better packs, without having to spend any effort
> > on it themselves.
> 
> If your forever and a day is spent figuring out which deltas to use,
> you can reduce this significantly.

It's almost all about figuring out the delta. Which is why *not* using 
"-f" (or "--aggressive") is such a big deal for normal operation, because 
then you just skip it all.

Linus


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Thu, 6 Dec 2007, Jeff King wrote:
> >
> > What is really disappointing is that we saved only about 20% of the
> > time. I didn't sit around watching the stages, but my guess is that we
> > spent a long time in the single threaded "writing objects" stage with a
> > thrashing delta cache.
>
> I don't think you spent all that much time writing the objects. That part
> isn't very intensive, it's mostly about the IO.
>
> I suspect you may simply be dominated by memory-throughput issues. The
> delta matching doesn't cache all that well, and using two or more cores
> isn't going to help all that much if they are largely waiting for memory
> (and quite possibly also perhaps fighting each other for a shared cache?
> Is this a Core 2 with the shared L2?)

When I last looked at the code, the problem was in evenly dividing
the work. I was using a four core machine and most of the time one
core would end up with 3-5x the work of the lightest loaded core.
Setting pack.threads up to 20 fixed the problem. With a high number of
threads I was able to get a 4hr pack to finish in something like
1:15.

A scheme where each core could work a minute without communicating to
the other cores would be best. It would also be more efficient if the
cores could avoid having sync points between them.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 09:18:39AM -0500, Nicolas Pitre wrote:

> > The downside is that the threading partitions the object space, so the
> > resulting size is not necessarily as small (but I don't know that
> > anybody has done testing on large repos to find out how large the
> > difference is).
> 
> Quick guesstimate is in the 1% ballpark.

Fortunately, we now have numbers. Harvey Harrison reported repacking the
gcc repo and getting these results:

> /usr/bin/time git repack -a -d -f --window=250 --depth=250
>
> 23266.37user 581.04system 7:41:25elapsed 86%CPU (0avgtext+0avgdata 
> 0maxresident)k
> 0inputs+0outputs (419835major+123275804minor)pagefaults 0swaps
>
> -r--r--r-- 1 hharrison hharrison  29091872 2007-12-06 07:26 
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.idx
> -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 
> pack-1d46ca030c3d6d6b95ad316deb922be06b167a3d.pack

I tried the threaded repack with pack.threads = 3 on a dual-processor
machine, and got:

  time git repack -a -d -f --window=250 --depth=250

  real    309m59.849s
  user    377m43.948s
  sys     8m23.319s

  -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 
pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
  -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 
pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack

So it is about 5% bigger. What is really disappointing is that we saved
only about 20% of the time. I didn't sit around watching the stages, but
my guess is that we spent a long time in the single threaded "writing
objects" stage with a thrashing delta cache.

-Peff


Re: [PATCH] gc --aggressive: make it really aggressive

2007-12-06 Thread J.C. Pizarro
On 2007/12/06, David Kastrup <[EMAIL PROTECTED]> wrote:
> Johannes Schindelin <[EMAIL PROTECTED]> writes:
>
> > However, I think that --aggressive should be aggressive, and if you
> > decide to run it on a machine which lacks the muscle to be aggressive,
> > well, you should have known better.
>
> That's a rather cheap shot.  "you should have known better" than
> expecting to be able to use a documented command and option because the
> git developers happened to have a nicer machine...
>
> _How_ is one supposed to have known better?
>
> --
> David Kastrup, Kriemhildstr. 15, 44793 Bochum

In GIT, the --aggressive option doesn't make it aggressive.
In GCC, the -Wall option doesn't enable all warnings.

It's a "Tie one to one" with the similar reputations.
To have a rest in peace.

   J.C.Pizarro


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> >
> >
> > On Thu, 6 Dec 2007, Jeff King wrote:
> > >
> > > What is really disappointing is that we saved only about 20% of the
> > > time. I didn't sit around watching the stages, but my guess is that we
> > > spent a long time in the single threaded "writing objects" stage with a
> > > thrashing delta cache.
> >
> > I don't think you spent all that much time writing the objects. That part
> > isn't very intensive, it's mostly about the IO.
> >
> > I suspect you may simply be dominated by memory-throughput issues. The
> > delta matching doesn't cache all that well, and using two or more cores
> > isn't going to help all that much if they are largely waiting for memory
> > (and quite possibly also perhaps fighting each other for a shared cache?
> > Is this a Core 2 with the shared L2?)
> 
> When I lasted looked at the code, the problem was in evenly dividing
> the work. I was using a four core machine and most of the time one
> core would end up with 3-5x the work of the lightest loaded core.
> Setting pack.threads up to 20 fixed the problem. With a high number of
> threads I was able to get a 4hr pack to finished in something like
> 1:15.

But as far as I know you didn't try my latest incarnation which has been
available in Git's master branch for a few months already.


Nicolas


Re: Git and GCC

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, NightStrike wrote:
> 
> No disrespect is meant by this reply.  I am just curious (and I am
> probably misunderstanding something)..  Why remove all of the
> documentation entirely?  Wouldn't it be better to just document it
> more thoroughly?

Well, part of it is that I don't think "--aggressive" as it is implemented 
right now is really almost *ever* the right answer. We could change the 
implementation, of course, but generally the right thing to do is to not 
use it (tweaking the "--window" and "--depth" manually for the repacking 
is likely the more natural thing to do).

The other part of the answer is that, when you *do* want to do what that 
"--aggressive" tries to achieve, it's such a special case event that while 
it should probably be documented, I don't think it should necessarily be 
documented where it is now (as part of "git gc"), but as part of a much 
more technical manual for "deep and subtle tricks you can play".

> I thought you did a fine job in this post in explaining its purpose, 
> when to use it, when not to, etc.  Removing the documentation seems 
> counter-intuitive when you've already gone to the trouble of creating 
> good documentation here in this post.

I'm so used to writing emails, and I *like* trying to explain what is 
going on, so I have no problems at all doing that kind of thing. However, 
trying to write a manual or man-page or other technical documentation is 
something rather different.

IOW, I like explaining git within the _context_ of a discussion or a 
particular problem/issue. But documentation should work regardless of 
context (or at least set it up), and that's the part I am not so good at.

In other words, if somebody (hint hint) thinks my explanation was good and 
readable, I'd love for them to try to turn it into real documentation by 
editing it up and creating enough context for it! But I'm not personally 
very likely to do that. I'd just send Junio the patch to remove a 
misleading part of the documentation we have.

Linus


Re: Git and GCC

2007-12-06 Thread Jon Loeliger
On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote:

> Git also does delta-chains, but it does them a lot more "loosely". There 
> is no fixed entity. Delta's are generated against any random other version 
> that git deems to be a good delta candidate (with various fairly 
> successful heuristics), and there are absolutely no hard grouping rules.

I'd like to learn more about that.  Can someone point me to
more documentation on it?  Or, in the absence of that, perhaps
a pointer to the source code that implements it?

I guess one question I posit is, would it be more accurate
to think of this as a "delta net" in a weighted graph rather
than a "delta chain"?

Thanks,
jdl




Re: Git and GCC

2007-12-06 Thread NightStrike
On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Thu, 6 Dec 2007, Daniel Berlin wrote:
> >
> > Actually, it turns out that git-gc --aggressive does this dumb thing
> > to pack files sometimes regardless of whether you converted from an
> > SVN repo or not.
> I'll send a patch to Junio to just remove the "git gc --aggressive"
> documentation. It can be useful, but it generally is useful only when you
> really understand at a very deep level what it's doing, and that
> documentation doesn't help you do that.

No disrespect is meant by this reply.  I am just curious (and I am
probably misunderstanding something)...  Why remove all of the
documentation entirely?  Wouldn't it be better to just document it
more thoroughly?  I thought you did a fine job in this post in
explaining its purpose, when to use it, when not to, etc.  Removing
the documentation seems counter-intuitive when you've already gone to
the trouble of creating good documentation here in this post.


Re: Git and GCC. Why not with fork, exec and pipes like in linux?

2007-12-06 Thread J.C. Pizarro
On 2007/12/06, "Jon Smirl" <[EMAIL PROTECTED]> wrote:
> On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > On Thu, 6 Dec 2007, Jeff King wrote:
> > >
> > > What is really disappointing is that we saved only about 20% of the
> > > time. I didn't sit around watching the stages, but my guess is that we
> > > spent a long time in the single threaded "writing objects" stage with a
> > > thrashing delta cache.
> >
> > I don't think you spent all that much time writing the objects. That part
> > isn't very intensive, it's mostly about the IO.
> >
> > I suspect you may simply be dominated by memory-throughput issues. The
> > delta matching doesn't cache all that well, and using two or more cores
> > isn't going to help all that much if they are largely waiting for memory
> > (and quite possibly also perhaps fighting each other for a shared cache?
> > Is this a Core 2 with the shared L2?)
>
> When I lasted looked at the code, the problem was in evenly dividing
> the work. I was using a four core machine and most of the time one
> core would end up with 3-5x the work of the lightest loaded core.
> Setting pack.threads up to 20 fixed the problem. With a high number of
> threads I was able to get a 4hr pack to finished in something like
> 1:15.
>
> A scheme where each core could work a minute without communicating to
> the other cores would be best. It would also be more efficient if the
> cores could avoid having sync points between them.
>
> --
> Jon Smirl
> [EMAIL PROTECTED]

For multicore CPUs, don't divide the work into threads.
Divide the work into processes!

Tips, tricks and hacks: use fork, exec, pipes and other IPC mechanisms
(mutexes, shared memory, file locks, pipes, semaphores, RPCs, sockets, etc.)
to access the file-locked database concurrently and in parallel.

For an Intel Quad Core, e.g. 4 cores, it needs a parent process and 4
child processes linked to the parent with pipes.

The parent process can be
* non-threaded using select/epoll/libevent
* threaded using Pth (GNU Portable Threads), NPTL (from RedHat) or whatever.

   J.C.Pizarro


Re: Git and GCC

2007-12-06 Thread Vincent Lefevre
On 2007-12-06 10:15:17 -0800, Ian Lance Taylor wrote:
> Distributed version systems like git or Mercurial have some advantages
> over Subversion.

It's surprising that you don't mention svk, which is based on top
of Subversion[*]. Has anyone tried? Is there any problem with it?

[*] You have currently an obvious advantage here.

-- 
Vincent Lefèvre <[EMAIL PROTECTED]> - Web: 
100% accessible validated (X)HTML - Blog: 
Work: CR INRIA - computer arithmetic / Arenaire project (LIP, ENS-Lyon)


Re: Git and GCC

2007-12-06 Thread Ismail Dönmez
On Thursday 06 December 2007 21:28:59, Vincent Lefevre wrote:
> On 2007-12-06 10:15:17 -0800, Ian Lance Taylor wrote:
> > Distributed version systems like git or Mercurial have some advantages
> > over Subversion.
>
> It's surprising that you don't mention svk, which is based on top
> of Subversion[*]. Has anyone tried? Is there any problem with it?
>
> [*] You have currently an obvious advantage here.

Last time I tried SVK it was slow and buggy. I wouldn't recommend it.

/ismail

-- 
Never learn by your mistakes, if you do you may never dare to try again.


Re: Git and GCC

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, Jon Loeliger wrote:
>
> On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote:
> > Git also does delta-chains, but it does them a lot more "loosely". There 
> > is no fixed entity. Delta's are generated against any random other version 
> > that git deems to be a good delta candidate (with various fairly 
> > successful heuristics), and there are absolutely no hard grouping rules.
> 
> I'd like to learn more about that.  Can someone point me to
> either more documentation on it?  In the absence of that,
> perhaps a pointer to the source code that implements it?

Well, in a very real sense, what the delta code does is:
 - just list every single object in the whole repository
 - walk over each object, trying to find another object that it can be 
   written as a delta against
 - write out the result as a pack-file

That's simplified: we may not walk _all_ objects, for example: only a 
global repack does that (and most pack creations are actually for pushign 
and pulling between two repositories, so we only walk the objects that are 
in the source but not the destination repository).

The interesting phase is the "walk each object, try to find a delta" part. 
In particular, you don't want to try to find a delta by comparing each 
object to every other object out there (that would be O(n^2) in objects, 
and with a fairly high constant cost too!). So what it does is to sort the 
objects by a few heuristics (type of object, the base name the object was 
found under when traversing a tree, its size, and how recently it was found 
in the history).

And then over that sorted list, it tries to find deltas between entries 
that are "close" to each other (and that's where the "--window=xyz" thing 
comes in - it says how big the window is for objects being close. A 
smaller window generates somewhat less good deltas, but takes a lot less 
effort to generate).

The source is in git/builtin-pack-objects.c, with the core of it being

 - try_delta() - try to generate a *single* delta when given an object 
   pair.

 - find_deltas() - do the actual list traversal

 - prepare_pack() and type_size_sort() - create the delta sort list from 
   the list of objects.

but that whole file is probably some of the more opaque parts of git.

> I guess one question I posit is, would it be more accurate
> to think of this as a "delta net" in a weighted graph rather
> than a "delta chain"?

It's certainly not a simple chain, it's more of a set of acyclic directed 
graphs in the object list. And yes, it's weighted by the size of the delta 
between objects, and the optimization problem is kind of akin to finding 
the smallest spanning tree (well, forest - since you do *not* want to 
create one large graph, you also want to make the individual trees shallow 
enough that you don't have excessive delta depth).

There are good algorithms for finding minimum spanning trees, but this one 
is complicated by the fact that the biggest cost (by far!) is the 
calculation of the weights itself. So rather than really worry about 
finding the minimal tree/forest, the code needs to worry about not having 
to even calculate all the weights!

(That, btw, is a common theme. A lot of git is about traversing graphs, 
like the revision graph. And most of the trivial graph problems all assume 
that you have the whole graph, but since the "whole graph" is the whole 
history of the repository, those algorithms are totally worthless, since 
they are fundamentally much too expensive - if we have to generate the 
whole history, we're already screwed for a big project. So things like 
revision graph calculation, the main performance issue is to avoid having 
to even *look* at parts of the graph that we don't need to see!)

Linus


Re: Git and GCC

2007-12-06 Thread Junio C Hamano
Jon Loeliger <[EMAIL PROTECTED]> writes:

> On Thu, 2007-12-06 at 00:09, Linus Torvalds wrote:
>
>> Git also does delta-chains, but it does them a lot more "loosely". There 
>> is no fixed entity. Delta's are generated against any random other version 
>> that git deems to be a good delta candidate (with various fairly 
>> successful heuristics), and there are absolutely no hard grouping rules.
>
> I'd like to learn more about that.  Can someone point me to
> either more documentation on it?  In the absence of that,
> perhaps a pointer to the source code that implements it?

See Documentation/technical/pack-heuristics.txt,
but the document predates and does not talk about delta
reusing, which was covered here:

http://thread.gmane.org/gmane.comp.version-control.git/16223/focus=16267

> I guess one question I posit is, would it be more accurate
> to think of this as a "delta net" in a weighted graph rather
> than a "delta chain"?

Yes.


Re: Git and GCC

2007-12-06 Thread Andrey Belevantsev

Vincent Lefevre wrote:

> It's surprising that you don't mention svk, which is based on top
> of Subversion[*]. Has anyone tried? Is there any problem with it?

I must agree with Ismail's reply here.  We have used svk for our 
internal development for about two years, for the reason of easy 
mirroring of gcc trunk and branching from it locally.  I would not 
complain about its speed, but sometimes we had problems with merges from 
trunk, ending up with e.g. zero-sized files in our branch which were 
removed from trunk, or we even couldn't merge at all, and I had to 
resort to the underlying subversion repository for merging.  As a result, 
we're currently migrating to mercurial.


Andrey


Re: Git and GCC. Why not with fork, exec and pipes like in linux?

2007-12-06 Thread J.C. Pizarro
On 2007/12/6, J.C. Pizarro <[EMAIL PROTECTED]>, I wrote:
> For multicores CPUs, don't divide the work in threads.
> To divide the work in processes!
>
> Tips, tricks and hacks: to use fork, exec, pipes and another IPC mechanisms 
> like
> mutexes, shared memory's IPC, file locks, pipes, semaphores, RPCs, sockets, 
> etc.
> to access concurrently and parallely to the filelocked database.

I'm sorry, we don't need exec. We need fork, pipes and other IPC mechanisms
because that makes it easy to share the C code for parallelism.

Thanks to Linus, because GIT is implemented in the C language and interacts
with the system calls of the kernel, which is written in C.

> For Intel Quad Core e.g., x4 cores, it need a parent process and 4
> child processes linked to the parent with pipes.

For peak performance (e.g. 99.9% usage), the minimum number of child
processes should be more than 4, normally between e.g. 6 and 10 processes
depending on the statistics of the cores' idle stalls.

> The parent process can be
> * no-threaded using select/epoll/libevent
> * threaded using Pth (GNU Portable Threads), NPTL (from RedHat) or whatever.

Note: there is a little design problem with a slowdown of I/O bandwidth when
the parent is multithreaded and the children must be multithreaded too; we
can't keep them non-multithreaded for maximum I/O bandwidth.

The "finding of the smallest spanning forest with deltas" consumes a lot of
CPU, so if it scales well on a CPU with 4 cores then it can reduce 4
hours to 1 hour.

   J.C.Pizarro :)


Re: Git and GCC

2007-12-06 Thread Daniel Berlin
On 12/6/07, Andrey Belevantsev <[EMAIL PROTECTED]> wrote:
> Vincent Lefevre wrote:
> > It's surprising that you don't mention svk, which is based on top
> > of Subversion[*]. Has anyone tried? Is there any problem with it?
> I must agree with Ismail's reply here.  We have used svk for our
> internal development for about two years, for the reason of easy
> mirroring of gcc trunk and branching from it locally.  I would not
> complain about its speed, but sometimes we had problems with merge from
> trunk, ending up with e.g. zero-sized files in our branch which were
> removed from trunk, or we even couldn't merge at all, and I had to
> resort to underlying subversion repository for merging.  As a result,
> we're currently migrating to mercurial.

I would not recommend SVK either (even being an SVN committer). While
I love the SVK guys to death, it's just not the way to go if you want
a distributed system.

>
> Andrey
>


Re: Git and GCC

2007-12-06 Thread Junio C Hamano
Junio C Hamano <[EMAIL PROTECTED]> writes:

> Jon Loeliger <[EMAIL PROTECTED]> writes:
>
>> I'd like to learn more about that.  Can someone point me to
>> either more documentation on it?  In the absence of that,
>> perhaps a pointer to the source code that implements it?
>
> See Documentation/technical/pack-heuristics.txt,

A somewhat funny thing about this is ...

$ git show --stat --summary b116b297
commit b116b297a80b54632256eb89dd22ea2b140de622
Author: Jon Loeliger <[EMAIL PROTECTED]>
Date:   Thu Mar 2 19:19:29 2006 -0600

Added Packing Heursitics IRC writeup.

Signed-off-by: Jon Loeliger <[EMAIL PROTECTED]>
Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>

 Documentation/technical/pack-heuristics.txt |  466 +++
 1 files changed, 466 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/pack-heuristics.txt


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > When I lasted looked at the code, the problem was in evenly dividing
> > the work. I was using a four core machine and most of the time one
> > core would end up with 3-5x the work of the lightest loaded core.
> > Setting pack.threads up to 20 fixed the problem. With a high number of
> > threads I was able to get a 4hr pack to finished in something like
> > 1:15.
>
> But as far as I know you didn't try my latest incarnation which has been
> available in Git's master branch for a few months already.

I've deleted all my giant packs. Using the kernel pack:
4GB Q6600

Using the current thread pack code I get these results.

The interesting case is the last one. I set it to 15 threads and
monitored with 'top'.
For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
74-100% was 100% CPU. It never used all four cores. The only other
things running were top and my desktop. This is the same load
balancing problem I observed earlier. Much more clock time was spent
in the 2/1 core phases than the 3 core one.

Threaded, threads = 5

[EMAIL PROTECTED]:/home/linux$ time git repack -a -d -f
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 528994), reused 0 (delta 0)

real    1m31.395s
user    2m59.239s
sys     0m3.048s
[EMAIL PROTECTED]:/home/linux$

12 seconds counting
53 seconds compressing
38 seconds writing

Without threads,

[EMAIL PROTECTED]:/home/linux$ time git repack -a -d -f
warning: no threads support, ignoring pack.threads
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 528999), reused 0 (delta 0)

real    2m54.849s
user    2m51.267s
sys     0m1.412s
[EMAIL PROTECTED]:/home/linux$

Threaded, threads = 5

[EMAIL PROTECTED]:/home/linux$ time git repack -a -d -f --depth=250 --window=250
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 539080), reused 0 (delta 0)

real    9m18.032s
user    19m7.484s
sys     0m3.880s
[EMAIL PROTECTED]:/home/linux$

[EMAIL PROTECTED]:/home/linux/.git/objects/pack$ ls -l
total 182156
-r--r--r-- 1 jonsmirl jonsmirl  15561848 2007-12-06 16:15
pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.idx
-r--r--r-- 1 jonsmirl jonsmirl 170768761 2007-12-06 16:15
pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.pack
[EMAIL PROTECTED]:/home/linux/.git/objects/pack$

Non-threaded:

[EMAIL PROTECTED]:/home/linux$ time git repack -a -d -f --depth=250 --window=250
warning: no threads support, ignoring pack.threads
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 539080), reused 0 (delta 0)

real    18m51.183s
user    18m46.538s
sys     0m1.604s
[EMAIL PROTECTED]:/home/linux$


[EMAIL PROTECTED]:/home/linux/.git/objects/pack$ ls -l
total 182156
-r--r--r-- 1 jonsmirl jonsmirl  15561848 2007-12-06 15:33
pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.idx
-r--r--r-- 1 jonsmirl jonsmirl 170768761 2007-12-06 15:33
pack-f1f8637d2c68eb1c964ec7c1877196c0c7513412.pack
[EMAIL PROTECTED]:/home/linux/.git/objects/pack$

Threaded, threads = 15

[EMAIL PROTECTED]:/home/linux$ time git repack -a -d -f --depth=250 --window=250
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 539080), reused 0 (delta 0)

real    9m18.325s
user    19m14.340s
sys     0m3.996s
[EMAIL PROTECTED]:/home/linux$

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > When I last looked at the code, the problem was in evenly dividing
> > > the work. I was using a four core machine and most of the time one
> > > core would end up with 3-5x the work of the lightest loaded core.
> > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > threads I was able to get a 4hr pack to finish in something like
> > > 1:15.
> >
> > But as far as I know you didn't try my latest incarnation which has been
> > available in Git's master branch for a few months already.
> 
> I've deleted all my giant packs. Using the kernel pack:
> 4GB Q6600
> 
> Using the current thread pack code I get these results.
> 
> The interesting case is the last one. I set it to 15 threads and
> monitored with 'top'.
> For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> 74-100% was 100% CPU. It never used all four cores. The only other
> things running were top and my desktop. This is the same load
> balancing problem I observed earlier.

Well, that's possible with a window 25 times larger than the default.

The load balancing is solved with a master thread serving relatively 
small object list segments to any work thread that finished with its 
previous segment.  But the size for those segments is currently fixed to 
window * 1000 which is way too large when window == 250.

I have to find a way to auto-tune that segment size somehow.
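
For illustration, here is a minimal sketch of that hand-off -- not git's
actual code, just the arithmetic, using the object count and window size
from the repack runs in this thread (names and output are made up):

#include <stdio.h>

int main(void)
{
	unsigned nr_objects = 647457;        /* delta-eligible objects above */
	unsigned window     = 250;           /* from --window=250            */
	unsigned chunk_size = window * 1000; /* current fixed segment size   */
	unsigned done = 0, segments = 0;

	while (done < nr_objects) {
		unsigned sublist_size = chunk_size;
		if (sublist_size > nr_objects - done)
			sublist_size = nr_objects - done;
		/* in git, an idle worker thread would take this sublist */
		printf("segment %u: %u objects\n", ++segments, sublist_size);
		done += sublist_size;
	}
	printf("=> %u segments, so at most %u threads ever get work\n",
	       segments, segments);
	return 0;
}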

But with the default window size there should not be any such noticeable 
load balancing problem.

Note that threading only happens in the compression phase.  The count 
and write phases are hardly parallelized.


Nicolas


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
> > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > > When I last looked at the code, the problem was in evenly dividing
> > > > the work. I was using a four core machine and most of the time one
> > > > core would end up with 3-5x the work of the lightest loaded core.
> > > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > > threads I was able to get a 4hr pack to finish in something like
> > > > 1:15.
> > >
> > > But as far as I know you didn't try my latest incarnation which has been
> > > available in Git's master branch for a few months already.
> >
> > I've deleted all my giant packs. Using the kernel pack:
> > 4GB Q6600
> >
> > Using the current thread pack code I get these results.
> >
> > The interesting case is the last one. I set it to 15 threads and
> > monitored with 'top'.
> > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> > 74-100% was 100% CPU. It never used all four cores. The only other
> > things running were top and my desktop. This is the same load
> > balancing problem I observed earlier.
>
> Well, that's possible with a window 25 times larger than the default.

Why did it never use more than three cores?

>
> The load balancing is solved with a master thread serving relatively
> small object list segments to any work thread that finished with its
> previous segment.  But the size for those segments is currently fixed to
> window * 1000 which is way too large when window == 250.
>
> I have to find a way to auto-tune that segment size somehow.
>
> But with the default window size there should not be any such noticeable
> load balancing problem.
>
> Note that threading only happens in the compression phase.  The count
> and write phases are hardly parallelized.
>
>
> Nicolas
>


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > > > When I last looked at the code, the problem was in evenly dividing
> > > > > the work. I was using a four core machine and most of the time one
> > > > > core would end up with 3-5x the work of the lightest loaded core.
> > > > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > > > threads I was able to get a 4hr pack to finish in something like
> > > > > 1:15.
> > > >
> > > > But as far as I know you didn't try my latest incarnation which has been
> > > > available in Git's master branch for a few months already.
> > >
> > > I've deleted all my giant packs. Using the kernel pack:
> > > 4GB Q6600
> > >
> > > Using the current thread pack code I get these results.
> > >
> > > The interesting case is the last one. I set it to 15 threads and
> > > monitored with 'top'.
> > > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> > > 74-100% was 100% CPU. It never used all four cores. The only other
> > > things running were top and my desktop. This is the same load
> > > balancing problem I observed earlier.
> >
> > Well, that's possible with a window 25 times larger than the default.
> 
> Why did it never use more than three cores?

You have 648366 objects total, and only 647457 of them are subject to 
delta compression.

With a window size of 250 and a default thread segment of window * 1000 
that means only 3 segments will be distributed to threads, hence only 3 
threads with work to do.


Nicolas


Re: Git and GCC

2007-12-06 Thread David Kastrup
Junio C Hamano <[EMAIL PROTECTED]> writes:

> Junio C Hamano <[EMAIL PROTECTED]> writes:
>
>> Jon Loeliger <[EMAIL PROTECTED]> writes:
>>
>>> I'd like to learn more about that.  Can someone point me to
>>> either more documentation on it?  In the absence of that,
>>> perhaps a pointer to the source code that implements it?
>>
>> See Documentation/technical/pack-heuristics.txt,
>
> A somewhat funny thing about this is ...
>
> $ git show --stat --summary b116b297
> commit b116b297a80b54632256eb89dd22ea2b140de622
> Author: Jon Loeliger <[EMAIL PROTECTED]>
> Date:   Thu Mar 2 19:19:29 2006 -0600
>
> Added Packing Heursitics IRC writeup.

Ah, fishing for compliments.  The cookie baking season...

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> On Thu, 6 Dec 2007, Jon Smirl wrote:
>
> > On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > > When I last looked at the code, the problem was in evenly dividing
> > > > the work. I was using a four core machine and most of the time one
> > > > core would end up with 3-5x the work of the lightest loaded core.
> > > > Setting pack.threads up to 20 fixed the problem. With a high number of
> > > > threads I was able to get a 4hr pack to finish in something like
> > > > 1:15.
> > >
> > > But as far as I know you didn't try my latest incarnation which has been
> > > available in Git's master branch for a few months already.
> >
> > I've deleted all my giant packs. Using the kernel pack:
> > 4GB Q6600
> >
> > Using the current thread pack code I get these results.
> >
> > The interesting case is the last one. I set it to 15 threads and
> > monitored with 'top'.
> > For 0-60% compression I was at 300% CPU, 60-74% was 200% CPU and
> > 74-100% was 100% CPU. It never used all four cores. The only other
> > things running were top and my desktop. This is the same load
> > balancing problem I observed earlier.
>
> Well, that's possible with a window 25 times larger than the default.
>
> The load balancing is solved with a master thread serving relatively
> small object list segments to any work thread that finished with its
> previous segment.  But the size for those segments is currently fixed to
> window * 1000 which is way too large when window == 250.
>
> I have to find a way to auto-tune that segment size somehow.

That would be nice. Threading is most important on the giant
pack/window combinations. The normal case is fast enough that I don't
really notice it. These giant pack/window combos can run 8-10 hours.

>
> But with the default window size there should not be any such noticeable
> load balancing problem.

I only spend 30 seconds in the compression phase without making the
window larger. It's not long enough to really see what is going on.

>
> Note that threading only happens in the compression phase.  The count
> and write phases are hardly parallelized.
>
>
> Nicolas
>


-- 
Jon Smirl
[EMAIL PROTECTED]


[OT] Re: Git and GCC

2007-12-06 Thread Randy Dunlap
On Thu, 06 Dec 2007 23:26:07 +0100 David Kastrup wrote:

> Junio C Hamano <[EMAIL PROTECTED]> writes:
> 
> > Junio C Hamano <[EMAIL PROTECTED]> writes:
> >
> >> Jon Loeliger <[EMAIL PROTECTED]> writes:
> >>
> >>> I'd like to learn more about that.  Can someone point me to
> >>> either more documentation on it?  In the absence of that,
> >>> perhaps a pointer to the source code that implements it?
> >>
> >> See Documentation/technical/pack-heuristics.txt,
> >
> > A somewhat funny thing about this is ...
> >
> > $ git show --stat --summary b116b297
> > commit b116b297a80b54632256eb89dd22ea2b140de622
> > Author: Jon Loeliger <[EMAIL PROTECTED]>
> > Date:   Thu Mar 2 19:19:29 2006 -0600
> >
> > Added Packing Heursitics IRC writeup.
> 
> Ah, fishing for compliments.  The cookie baking season...

Indeed.  Here are some really good & sweet recipes (IMHO).

http://www.xenotime.net/linux/recipes/


---
~Randy
Features and documentation: http://lwn.net/Articles/260136/


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/6/07, Nicolas Pitre <[EMAIL PROTECTED]> wrote:
> > > Well, that's possible with a window 25 times larger than the default.
> >
> > Why did it never use more than three cores?
>
> You have 648366 objects total, and only 647457 of them are subject to
> delta compression.
>
> With a window size of 250 and a default thread segment of window * 1000
> that means only 3 segments will be distributed to threads, hence only 3
> threads with work to do.

One little tweak and the clock time drops from 9.5 to 6 minutes. The
tweak makes all four cores work.

[EMAIL PROTECTED]:/home/apps/git$ git diff
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..e0dd12e 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -1645,7 +1645,7 @@ static void ll_find_deltas(struct object_entry **list, unsigned list_size,
 	}
 
 	/* this should be auto-tuned somehow */
-	chunk_size = window * 1000;
+	chunk_size = window * 50;
 
 	do {
 		unsigned sublist_size = chunk_size;


[EMAIL PROTECTED]:/home/linux/.git$ time git repack -a -d -f --depth=250 --window=250
Counting objects: 648366, done.
Compressing objects: 100% (647457/647457), done.
Writing objects: 100% (648366/648366), done.
Total 648366 (delta 539043), reused 0 (delta 0)

real    6m2.109s
user    20m0.491s
sys     0m4.608s
[EMAIL PROTECTED]:/home/linux/.git$
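
Back-of-the-envelope arithmetic for why this helps (a rough sketch, not
git code; the object count is taken from the repack output above):

#include <stdio.h>

int main(void)
{
	unsigned nr_objects = 647457;
	unsigned window = 250;
	unsigned old_chunk = window * 1000, new_chunk = window * 50;

	/* ceiling division: how many segments get handed to threads */
	printf("window * 1000: %u segments\n",
	       (nr_objects + old_chunk - 1) / old_chunk);
	printf("window * 50:   %u segments\n",
	       (nr_objects + new_chunk - 1) / new_chunk);
	return 0;
}

With window * 1000 there are only 3 segments, so only 3 of the 4 cores
ever get work; with window * 50 there are 52 much smaller segments, so
all cores stay busy until the object list is exhausted.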



>
>
> Nicolas
>


-- 
Jon Smirl
[EMAIL PROTECTED]


Help with the Machine Description

2007-12-06 Thread Balaji V. Iyer
Hello Everyone,
I am trying to modify the OpenRISC GCC to change the existing
instructions and add more instructions to the system. I had to rewrite
most of or32.md. When I try to compile something, it says the
following constraint is not found. Can someone please help me read
this constraint correctly?
 
(insn 112 110 478 12 (set (mem:QI (reg/v/f:SI 16 r16 [orig:72 line.183 ]
[72]) [0 S1 A8])
(const_int 0 [0x0])) 16 {movqi} (nil)
(nil))

From what I see, it is just that we are trying to set 1 byte of a
memory location addressed by the value in register #16 (r16) with an
offset of 0, which I have handled already in my machine description...
so what can this be?
 
Any help is highly appreciated.
 
Thanking You,
 
Yours Sincerely,
 
Balaji V. Iyer.
 
-- 
 
Balaji V. Iyer
PhD Student, 
Center for Efficient, Scalable and Reliable Computing,
Department of Electrical and Computer Engineering,
North Carolina State University.




Re: Git and GCC

2007-12-06 Thread Jakub Narebski
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Thu, 6 Dec 2007, Jon Loeliger wrote:

>> I guess one question I posit is, would it be more accurate
>> to think of this as a "delta net" in a weighted graph rather
>> than a "delta chain"?
> 
> It's certainly not a simple chain, it's more of a set of acyclic directed 
> graphs in the object list. And yes, it's weighted by the size of the delta 
> between objects, and the optimization problem is kind of akin to finding 
> the smallest spanning tree (well, forest - since you do *not* want to 
> create one large graph, you also want to make the individual trees shallow 
> enough that you don't have excessive delta depth).
> 
> There are good algorithms for finding minimum spanning trees, but this one 
> is complicated by the fact that the biggest cost (by far!) is the 
> calculation of the weights itself. So rather than really worry about 
> finding the minimal tree/forest, the code needs to worry about not having 
> to even calculate all the weights!
> 
> (That, btw, is a common theme. A lot of git is about traversing graphs, 
> like the revision graph. And most of the trivial graph problems all assume 
> that you have the whole graph, but since the "whole graph" is the whole 
> history of the repository, those algorithms are totally worthless, since 
> they are fundamentally much too expensive - if we have to generate the 
> whole history, we're already screwed for a big project. So things like 
> revision graph calculation, the main performance issue is to avoid having 
> to even *look* at parts of the graph that we don't need to see!)

Hmmm...

I think that these two problems (find minimal spanning forest with
limited depth and traverse graph) with the additional constraint to
avoid calculating weights / avoid calculating whole graph would be
a good problem to present in a CompSci course.

Just a thought...
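
To make the "avoid calculating all the weights" point concrete, here is
a rough sketch of the window idea: each object is only weighed against a
handful of neighbours in a sorted list, never against every other
object. The object type and cost function below are made up for
illustration; this is not git's code:

#include <stdio.h>
#include <stdlib.h>

struct obj { unsigned size; };           /* stand-in for a real object */

/* stand-in for the expensive delta computation (the edge "weight") */
static unsigned weight(const struct obj *a, const struct obj *b)
{
	return a->size > b->size ? a->size - b->size : b->size - a->size;
}

int main(void)
{
	enum { N = 1000, WINDOW = 10 };
	struct obj list[N];
	unsigned long computed = 0, total_best = 0;

	for (int i = 0; i < N; i++)
		list[i].size = (unsigned)(rand() % 4096);

	for (int i = 1; i < N; i++) {
		unsigned best = ~0u;
		int lo = i > WINDOW ? i - WINDOW : 0;
		for (int j = lo; j < i; j++) {   /* window, not all pairs */
			unsigned w = weight(&list[i], &list[j]);
			computed++;
			if (w < best)
				best = w;
		}
		total_best += best;
	}
	printf("weights computed: %lu (all pairs would need %lu), cost %lu\n",
	       computed, (unsigned long)N * (N - 1) / 2, total_best);
	return 0;
}
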
-- 
Jakub Narebski
Poland
ShadeHawk on #git


In future, to replace autotools by cmake like KDE4 did?

2007-12-06 Thread J.C. Pizarro
The autotools (automake + libtool + autoconf + ...) generate many big
files that slow down the build and enormously grow cvs/svn/git/hg
repositories because of the generated files.

See the interesting links below:
1. http://dot.kde.org/1172083974/
2. http://sam.zoy.org/lectures/20050910-debian/
3. https://lwn.net/Articles/188693/
4. http://en.wikipedia.org/wiki/GNU_Build_Tools
5. http://en.wikipedia.org/wiki/GNU_Automake

The benefits could be:
* ~40% faster KDE4 builds vs KDE 3.5.6.
* elimination of redundant and unnecessary generated files such as those
  from autotools.
* smaller cvs/svn/git/hg repositories.
* fewer errors/crashes during configuration.
* cmake's sources can be improved for better performance gains.
* a good, long maintenance life.

I hope the files for cmake+make can be well integrated into GCC 4.4.

   J.C.Pizarro


Re: Git and GCC

2007-12-06 Thread Harvey Harrison
On Thu, 2007-12-06 at 13:04 -0500, Daniel Berlin wrote:
> On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> > So the equivalent of "git gc --aggressive" - but done *properly* - is to
> > do (overnight) something like
> >
> > git repack -a -d --depth=250 --window=250
> >
> I gave this a try overnight, and it definitely helps a lot.
> Thanks!

I've updated the public mirror repo with the very-packed version.

People cloning it now should get the just-over-300MB repo.

git.infradead.org/gcc.git


Cheers,

Harvey



Re: Git and GCC

2007-12-06 Thread David Miller
From: Jeff King <[EMAIL PROTECTED]>
Date: Thu, 6 Dec 2007 12:39:47 -0500

> I tried the threaded repack with pack.threads = 3 on a dual-processor
> machine, and got:
> 
>   time git repack -a -d -f --window=250 --depth=250
> 
>   real    309m59.849s
>   user    377m43.948s
>   sys     8m23.319s
> 
>   -r--r--r-- 1 peff peff  28570088 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.idx
>   -r--r--r-- 1 peff peff 339922573 2007-12-06 10:11 
> pack-1fa336f33126d762988ed6fc3f44ecbe0209da3c.pack
> 
> So it is about 5% bigger. What is really disappointing is that we saved
> only about 20% of the time. I didn't sit around watching the stages, but
> my guess is that we spent a long time in the single threaded "writing
> objects" stage with a thrashing delta cache.

If someone can give me a good way to run this test case I can
have my 64-cpu Niagara-2 box crunch on this and see how fast
it goes and how much larger the resulting pack file is.


Re: Git and GCC

2007-12-06 Thread Nicolas Pitre
On Thu, 6 Dec 2007, Jon Smirl wrote:

> I have a 4.8GB git process with 4GB of physical memory. Everything
> started slowing down a lot when the process got that big. Does git
> really need 4.8GB to repack? I could only keep 3.4GB resident. Luckily
> this happened at 95% completion. With 8GB of memory you should be able
> to do this repack in under 20 minutes.

Probably you have too many cached delta results.  By default, every 
delta smaller than 1000 bytes is kept in memory until the write phase.  
Try using pack.deltacachesize = 256M or lower, or try disabling this 
caching entirely with pack.deltacachelimit = 0.


Nicolas


Re: Help with the Machine Description

2007-12-06 Thread Revital1 Eres
Hello,

I think you should look at the constraints of the instruction in your md
file, for example (taken from the altivec.md file under the config/rs6000 dir):

(define_insn "altivec_stvx"
  [(parallel
[(set (match_operand:V4SI 0 "memory_operand" "=Z")
  (match_operand:V4SI 1 "register_operand" "v"))
 (unspec [(const_int 0)] UNSPEC_STVX)])]
  "TARGET_ALTIVEC"
  "stvx %1,%y0"
  [(set_attr "type" "vecstore")])

The v and Z indicate constraints on the operands of the instruction.
Their descriptions can be found in the constraints.md file in the same dir:

(define_memory_constraint "Z"
  "Indexed or indirect memory operand"
  (match_operand 0 "indexed_or_indirect_operand"))

You can take a look at the gcc internals for more info about this.

Revital

[EMAIL PROTECTED] wrote on 07/12/2007 00:52:38:

> Hello Everyone,
> I am trying to modify the OpenRISC GCC to change the existing
> instructions and add more instructions to the system. I had to rewrite
> most of or32.md. When I try to compile something, it says the
> following constraint is not found. Can someone please help me read
> this constraint correctly?
>
> (insn 112 110 478 12 (set (mem:QI (reg/v/f:SI 16 r16 [orig:72 line.183 ]
> [72]) [0 S1 A8])
> (const_int 0 [0x0])) 16 {movqi} (nil)
> (nil))
>
> From what I see, it is just that we are trying to set 1 byte of a
> memory location addressed by the value in register #16 (r16) with an
> offset of 0, which I have handled already in my machine description...
> so what can this be?
>
> Any help is highly appreciated.
>
> Thanking You,
>
> Yours Sincerely,
>
> Balaji V. Iyer.
>
> --
>
> Balaji V. Iyer
> PhD Student,
> Center for Efficient, Scalable and Reliable Computing,
> Department of Electrical and Computer Engineering,
> North Carolina State University.
>
>



Re: Git and GCC

2007-12-06 Thread NightStrike
On 12/6/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Thu, 6 Dec 2007, NightStrike wrote:
> >
> > No disrespect is meant by this reply.  I am just curious (and I am
> > probably misunderstanding something)..  Why remove all of the
> > documentation entirely?  Wouldn't it be better to just document it
> > more thoroughly?
>
> Well, part of it is that I don't think "--aggressive" as it is implemented
> right now is really almost *ever* the right answer. We could change the
> implementation, of course, but generally the right thing to do is to not
> use it (tweaking the "--window" and "--depth" manually for the repacking
> is likely the more natural thing to do).
>
> The other part of the answer is that, when you *do* want to do what that
> "--aggressive" tries to achieve, it's such a special case event that while
> it should probably be documented, I don't think it should necessarily be
> documented where it is now (as part of "git gc"), but as part of a much
> more technical manual for "deep and subtle tricks you can play".
>
> > I thought you did a fine job in this post in explaining its purpose,
> > when to use it, when not to, etc.  Removing the documention seems
> > counter-intuitive when you've already gone to the trouble of creating
> > good documentation here in this post.
>
> I'm so used to writing emails, and I *like* trying to explain what is
> going on, so I have no problems at all doing that kind of thing. However,
> trying to write a manual or man-page or other technical documentation is
> something rather different.
>
> IOW, I like explaining git within the _context_ of a discussion or a
> particular problem/issue. But documentation should work regardless of
> context (or at least set it up), and that's the part I am not so good at.
>
> In other words, if somebody (hint hint) thinks my explanation was good and
> readable, I'd love for them to try to turn it into real documentation by
> editing it up and creating enough context for it! But I'm not personally
> very likely to do that. I'd just send Junio the patch to remove a
> misleading part of the documentation we have.

hehe.. I'd love to, actually.  I can work on it next week.


Re: Git and GCC

2007-12-06 Thread Linus Torvalds


On Thu, 6 Dec 2007, Jon Smirl wrote:
> >
> > time git blame -C gcc/regclass.c > /dev/null
> 
> [EMAIL PROTECTED]:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> 
> real    1m21.967s
> user    1m21.329s

Well, I was also hoping for a "compared to not-so-aggressive packing" 
number on the same machine.. IOW, what I was wondering is whether there is 
a visible performance downside to the deeper delta chains in the 300MB 
pack vs the (less aggressive) 500MB pack.

Linus


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 01:02:58PM -0500, Nicolas Pitre wrote:

> > What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> Maybe you should run the non threaded repack on the same machine to have 
> a good comparison.

Sorry, I should have been more clear. By "saved" I meant "we needed N
minutes of CPU time, but took only M minutes of real time to use it."
IOW, if we assume that the threading had zero overhead and that we were
completely CPU bound, then the task would have taken N minutes of real
time. And obviously those assumptions aren't true, but I was attempting
to say "it would have been at most N minutes of real time to do it
single-threaded."
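
As a rough check against the timings posted earlier (309m59s real,
377m43s user), assuming the same thing -- zero threading overhead and a
CPU-bound run:

#include <stdio.h>

int main(void)
{
	double real_min = 309.0 + 59.849 / 60.0;  /* threaded wall-clock time */
	double user_min = 377.0 + 43.948 / 60.0;  /* total CPU time consumed  */

	/* single-threaded and CPU bound, wall clock would be roughly user time */
	printf("saved roughly %.0f%% of the real time\n",
	       100.0 * (1.0 - real_min / user_min));
	return 0;
}

which prints about 18%, in line with the "about 20%" above.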

> And if you have only 2 CPUs, you will have better performances with
> pack.threads = 2, otherwise there'll be wasteful task switching going
> on.

Yes, but balanced by one thread running out of data way earlier than the
other, and completing the task with only one CPU. I am doing a 4-thread
test on a quad-CPU right now, and I will also try it with threads=1 and
threads=6 for comparison.

> And of course, if the delta cache is being thrashed, that might be due to 
> the way the existing pack was previously packed.  Hence the current pack 
> might impact object _access_ when repacking them.  So for a really 
> really fair performance comparison, you'd have to preserve the original 
> pack and swap it back before each repack attempt.

I am working each time from the pack generated by fetching from
git://git.infradead.org/gcc.git.

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote:

> > So it is about 5% bigger. What is really disappointing is that we saved
> > only about 20% of the time. I didn't sit around watching the stages, but
> > my guess is that we spent a long time in the single threaded "writing
> > objects" stage with a thrashing delta cache.
> 
> If someone can give me a good way to run this test case I can
> have my 64-cpu Niagara-2 box crunch on this and see how fast
> it goes and how much larger the resulting pack file is.

That would be fun to see. The procedure I am using is this:

# compile recent git master with threaded delta
cd git
echo THREADED_DELTA_SEARCH = 1 >>config.mak
make install

# get the gcc pack
mkdir gcc && cd gcc
git --bare init
git config remote.gcc.url git://git.infradead.org/gcc.git
git config remote.gcc.fetch \
  '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*'
git remote update

# make a copy, so we can run further tests from a known point
cd ..
cp -a gcc test

# and test multithreaded large depth/window repacking
cd test
git config pack.threads 4
time git repack -a -d -f --window=250 --depth=250

-Peff


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/7/07, Linus Torvalds <[EMAIL PROTECTED]> wrote:
>
>
> On Thu, 6 Dec 2007, Jon Smirl wrote:
> > >
> > > time git blame -C gcc/regclass.c > /dev/null
> >
> > [EMAIL PROTECTED]:/video/gcc$ time git blame -C gcc/regclass.c > /dev/null
> >
> > real    1m21.967s
> > user    1m21.329s
>
> Well, I was also hoping for a "compared to not-so-aggressive packing"
> number on the same machine.. IOW, what I was wondering is whether there is
> a visible performance downside to the deeper delta chains in the 300MB
> pack vs the (less aggressive) 500MB pack.

Same machine with a default pack

[EMAIL PROTECTED]:/video/gcc/.git/objects/pack$ ls -l
total 2145716
-r--r--r-- 1 jonsmirl jonsmirl   23667932 2007-12-07 02:03
pack-bd163555ea9240a7fdd07d2708a293872665f48b.idx
-r--r--r-- 1 jonsmirl jonsmirl 2171385413 2007-12-07 02:03
pack-bd163555ea9240a7fdd07d2708a293872665f48b.pack
[EMAIL PROTECTED]:/video/gcc/.git/objects/pack$

Delta lengths have virtually no impact. The bigger pack file causes
more IO which offsets the increased delta processing time.

One of my rules is smaller is almost always better. Smaller eliminates
IO and helps with the CPU cache. It's like the kernel being optimized
for size instead of speed and ending up faster.

time git blame -C gcc/regclass.c > /dev/null
real    1m19.289s
user    1m17.853s
sys     0m0.952s



>
> Linus
>


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: Git and GCC

2007-12-06 Thread Jeff King
On Thu, Dec 06, 2007 at 10:35:22AM -0800, Linus Torvalds wrote:

> > What is really disappointing is that we saved only about 20% of the 
> > time. I didn't sit around watching the stages, but my guess is that we 
> > spent a long time in the single threaded "writing objects" stage with a 
> > thrashing delta cache.
> 
> I don't think you spent all that much time writing the objects. That part 
> isn't very intensive, it's mostly about the IO.

It can get nasty with super-long deltas thrashing the cache, I think.
But in this case, I think it ended up being just a poor division of
labor caused by the chunk_size parameter using the quite large window
size (see elsewhere in the thread for discussion).

> I suspect you may simply be dominated by memory-throughput issues. The 
> delta matching doesn't cache all that well, and using two or more cores 
> isn't going to help all that much if they are largely waiting for memory 
> (and quite possibly also perhaps fighting each other for a shared cache? 
> Is this a Core 2 with the shared L2?)

I think the chunk_size more or less explains it. I have had reasonable
success keeping both CPUs busy on similar tasks in the past (but with
smaller window sizes).

For reference, it was a Core 2 Duo; do they all share L2, or is there
something I can look for in /proc/cpuinfo?

-Peff


Re: Git and GCC

2007-12-06 Thread Jeff King
On Fri, Dec 07, 2007 at 01:50:47AM -0500, Jeff King wrote:

> Yes, but balanced by one thread running out of data way earlier than the
> other, and completing the task with only one CPU. I am doing a 4-thread
> test on a quad-CPU right now, and I will also try it with threads=1 and
> threads=6 for comparison.

Hmm. As this has been running, I read the rest of the thread, and it
looks like Jon Smirl has already posted the interesting numbers. So
nevermind, unless there is something particular you would like to see.

-Peff


Re: Git and GCC

2007-12-06 Thread Jon Smirl
On 12/7/07, Jeff King <[EMAIL PROTECTED]> wrote:
> On Thu, Dec 06, 2007 at 07:31:21PM -0800, David Miller wrote:
>
> > > So it is about 5% bigger. What is really disappointing is that we saved
> > > only about 20% of the time. I didn't sit around watching the stages, but
> > > my guess is that we spent a long time in the single threaded "writing
> > > objects" stage with a thrashing delta cache.
> >
> > If someone can give me a good way to run this test case I can
> > have my 64-cpu Niagara-2 box crunch on this and see how fast
> > it goes and how much larger the resulting pack file is.
>
> That would be fun to see. The procedure I am using is this:
>
> # compile recent git master with threaded delta
> cd git
> echo THREADED_DELTA_SEARCH = 1 >>config.mak
> make install
>
> # get the gcc pack
> mkdir gcc && cd gcc
> git --bare init
> git config remote.gcc.url git://git.infradead.org/gcc.git
> git config remote.gcc.fetch \
>   '+refs/remotes/gcc.gnu.org/*:refs/remotes/gcc.gnu.org/*'
> git remote update
>
> # make a copy, so we can run further tests from a known point
> cd ..
> cp -a gcc test
>
> # and test multithreaded large depth/window repacking
> cd test
> git config pack.threads 4

64 threads with 64 CPUs; if they are multicore you want even more.
You need to adjust chunk_size as mentioned in the other mail.


> time git repack -a -d -f --window=250 --depth=250
>
> -Peff
>


-- 
Jon Smirl
[EMAIL PROTECTED]


Re: In future, to replace autotools by cmake like KDE4 did?

2007-12-06 Thread Marcel Holtmann

Hi,

> The autotools (automake + libtool + autoconf + ...) generate many big
> files that slow down the build and enormously grow cvs/svn/git/hg
> repositories because of the generated files.
>
> See the interesting links below:
> 1. http://dot.kde.org/1172083974/
> 2. http://sam.zoy.org/lectures/20050910-debian/
> 3. https://lwn.net/Articles/188693/
> 4. http://en.wikipedia.org/wiki/GNU_Build_Tools
> 5. http://en.wikipedia.org/wiki/GNU_Automake
>
> The benefits could be:
> * ~40% faster KDE4 builds vs KDE 3.5.6.
> * elimination of redundant and unnecessary generated files such as those
>   from autotools.
> * smaller cvs/svn/git/hg repositories.


Stop spreading this FUD. If you leave the auto-generated files from
autotools in the source control repositories, then it is your fault.
They are generated files and can always be regenerated. Hence putting
them under revision control makes no sense, so don't do it. And
certainly don't complain about it if you did.


Regards

Marcel