Re: Git and GCC

2007-12-06 Thread Jakub Narebski
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Thu, 6 Dec 2007, Jon Loeliger wrote:

>> I guess one question I posit is, would it be more accurate
>> to think of this as a "delta net" in a weighted graph rather
>> than a "delta chain"?
> 
> It's certainly not a simple chain, it's more of a set of acyclic directed 
> graphs in the object list. And yes, it's weighted by the size of the delta 
> between objects, and the optimization problem is kind of akin to finding 
> the smallest spanning tree (well, forest - since you do *not* want to 
> create one large graph, you also want to make the individual trees shallow 
> enough that you don't have excessive delta depth).
> 
> There are good algorithms for finding minimum spanning trees, but this one 
> is complicated by the fact that the biggest cost (by far!) is the 
> calculation of the weights itself. So rather than really worry about 
> finding the minimal tree/forest, the code needs to worry about not having 
> to even calculate all the weights!
> 
> (That, btw, is a common theme. A lot of git is about traversing graphs, 
> like the revision graph. And most of the trivial graph problems all assume 
> that you have the whole graph, but since the "whole graph" is the whole 
> history of the repository, those algorithms are totally worthless, since 
> they are fundamentally much too expensive - if we have to generate the 
> whole history, we're already screwed for a big project. So things like 
> revision graph calculation, the main performance issue is to avoid having 
> to even *look* at parts of the graph that we don't need to see!)

Hmmm...

I think that these two problems (finding a minimal spanning forest with
limited depth, and traversing a graph), with the additional constraint
of avoiding calculating the weights / avoiding materializing the whole
graph, would make a good problem to present in a CompSci course.
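
Something along the lines of the sketch below is what I have in mind:
only ever compute the expensive weight inside a small sliding window
over a suitable ordering of the objects, and refuse bases whose chain
is already too deep.  This is an untested illustration with made-up
names (struct obj, delta_size(), WINDOW, MAX_DEPTH), not git's actual
pack-objects code:

#include <stddef.h>

#define WINDOW     10  /* how many candidate bases we even look at */
#define MAX_DEPTH  50  /* cap on delta chain length */

struct obj {
    size_t size;          /* canonical (undeltified) size */
    struct obj *base;     /* chosen delta base, or NULL */
    unsigned depth;       /* length of the chain through 'base' */
    size_t cost;          /* delta size, or 'size' if stored whole */
};

/* Stand-in for the expensive part (computing an actual delta);
 * here just a fake cost so that the sketch compiles. */
static size_t delta_size(const struct obj *base, const struct obj *target)
{
    size_t a = base->size, b = target->size;
    return (a > b ? a - b : b - a) + 64;
}

void choose_deltas(struct obj **sorted, size_t n)
{
    /* 'sorted' is assumed ordered so that similar objects end up
     * close together (git sorts by type, name hash and size). */
    for (size_t i = 0; i < n; i++) {
        struct obj *target = sorted[i];
        size_t lo = i > WINDOW ? i - WINDOW : 0;

        target->base = NULL;
        target->depth = 0;
        target->cost = target->size;   /* default: store it whole */

        for (size_t j = i; j-- > lo; ) {
            struct obj *cand = sorted[j];
            if (cand->depth + 1 > MAX_DEPTH)
                continue;              /* keep the forest shallow */
            size_t d = delta_size(cand, target);  /* the costly weight */
            if (d < target->cost) {
                target->cost = d;
                target->base = cand;
                target->depth = cand->depth + 1;
            }
        }
    }
}

This way only O(n * WINDOW) of the O(n^2) possible weights ever get
computed, and the depth cap keeps the resulting forest shallow; the
price is that the forest is only approximately minimal.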

Just a thought...
-- 
Jakub Narebski
Poland
ShadeHawk on #git


Re: In future, to replace autotools by cmake like KDE4 did?

2007-12-07 Thread Jakub Narebski
"J.C. Pizarro" <[EMAIL PROTECTED]> writes:

> The autotools (automake + libtool + autoconf + ...) generate many big
> files, which slow down the build and bloat cvs/svn/git/hg repositories
> when those generated files are checked in.
[cut]

And this is relevant to this mailing list exactly how? Of the whole
autotools suite git uses only autoconf, and only as an optional way
to set Makefile configuration variables.

Generated files should not be put under version control, unless it is
purely for convenience and kept on a separate branch, the way the HTML
and manpage versions of the git documentation live on the 'html' and
'man' branches, respectively. The same could be done for the
./configure script.

Although there was some talk about whether git should use autotools,
or perhaps CMake, or a hand-written ./configure script like MPlayer's
IIRC, instead of its own hand-written Makefile...

-- 
Jakub Narebski
ShadeHawk on #git


Re: In future, to replace autotools by cmake like KDE4 did?

2007-12-07 Thread Jakub Narebski
Andreas Ericsson wrote:
> Jakub Narebski wrote:
> > 
> > Although there was some talk about whether git should use autotools,
> > or perhaps CMake, or a hand-written ./configure script like MPlayer's
> > IIRC, instead of its own hand-written Makefile...
> > 
> 
> To tell the truth, I'd be much happier if everything like that got
> put in a header file or some such. 95% of what we figure out by looking
> at "uname" output can already be learned by looking at the various
> pre-defined macros.
> 
> Fortunately, there's a project devoted solely to this, so most of
> the tedious research need not be done. It can be found at
> http://predef.sourceforge.net/

Code talks, bullsh*t walks.

Pre-defined macros cannot tell us whether a specific library is
installed, and they cannot tell us whether the formatted IO functions
support the C99 'size specifiers': the compiler may claim C99
compliance while the libc does not support them, or may not claim
compliance yet support them anyway, etc.
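
The latter is exactly the kind of check that has to compile and run
something; roughly like this (my own illustration of such a feature
test, not a verbatim copy of anything in git's configure.ac):

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buf[16];
    size_t x = 12345;

    /* Does the printf family honour the C99 "%zu" size specifier? */
    snprintf(buf, sizeof(buf), "%zu", x);
    if (strcmp(buf, "12345")) {
        puts("no");   /* git's Makefile knob for this is NO_C99_FORMAT */
        return 1;
    }
    puts("yes");
    return 0;
}

No amount of inspecting __STDC_VERSION__ will tell you what this
program prints, because it depends on the libc, not on the compiler.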

But perhaps the "uname"-based compile configuration could be replaced
by testing pre-defined macros... at least for the C code; and git is
not only C code.

-- 
Jakub Narebski
Poland


Re: Git and GCC

2007-12-07 Thread Jakub Narebski
Giovanni Bajo <[EMAIL PROTECTED]> writes:

> On 12/7/2007 6:23 PM, Linus Torvalds wrote:
> 
> >> Is SHA a significant portion of the compute during these repacks?
> >> I should run oprofile...
> > SHA1 is almost totally insignificant on x86. It hardly shows up. But
> > we have a good optimized version there.
> > zlib tends to be a lot more noticeable (especially the
> > *uncompression*: it may be faster than compression, but it's done _so_
> > much more that it totally dominates).
> 
> Have you considered alternatives, like:
> http://www.oberhumer.com/opensource/ucl/


From that page:

  As compared to LZO, the UCL algorithms achieve a better compression
  ratio but *decompression* is a little bit slower. See below for some
  rough timings.


It is uncompression speed that is the more important of the two,
because uncompression is done much more often.
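
A quick-and-dirty way to see the asymmetry for yourself; a toy
single-shot benchmark I just sketched (nothing from git; build with
"cc zbench.c -lz", error checking omitted):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <zlib.h>

int main(void)
{
    const uLong n = 1UL << 20;             /* 1 MiB of toy data */
    unsigned char *src  = malloc(n);
    uLongf clen = compressBound(n);
    unsigned char *comp = malloc(clen);
    unsigned char *back = malloc(n);
    uLongf blen = n;
    clock_t t;

    for (uLong i = 0; i < n; i++)          /* mildly compressible input */
        src[i] = (unsigned char)(i % 64);

    t = clock();
    compress(comp, &clen, src, n);
    printf("deflate: %.3fs -> %lu bytes\n",
           (double)(clock() - t) / CLOCKS_PER_SEC, (unsigned long)clen);

    t = clock();
    uncompress(back, &blen, comp, clen);
    printf("inflate: %.3fs\n", (double)(clock() - t) / CLOCKS_PER_SEC);

    free(src); free(comp); free(back);
    return 0;
}

Per call inflate is indeed cheaper than deflate, but as Linus says it
is done so much more often during repacking that the inflate side is
what ends up dominating.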

-- 
Jakub Narebski
ShadeHawk on #git


Re: Something is broken in repack

2007-12-13 Thread Jakub Narebski
Johannes Sixt wrote:
> Paolo Bonzini schrieb:
>> Nguyen Thai Ngoc Duy wrote:
>>>
>>> Is there an alternative to "git repack -a -d" that repacks everything
>>> but the first pack?
>> 
>> That would be a pretty good idea for big repositories.  If I were to
>> implement it, I would actually add a .git/config option like
>> pack.permanent so that more than one pack could be made permanent; then
>> to repack really really everything you'd need "git repack -a -a -d".
> 
> It's already there: If you have a pack .git/objects/pack/pack-foo.pack, then
> "touch .git/objects/pack/pack-foo.keep" marks the pack as precious.

Actually you can (and probably should) put a single line giving the
_reason_ the pack is to be kept into the *.keep file.
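
All git cares about is that the *.keep file exists next to the *.pack;
whatever text is inside it is read only by humans.  Roughly like this
(my own sketch, not git's code):

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* A pack is "precious" iff a sibling .keep file exists. */
static int pack_is_kept(const char *pack_path)
{
    struct stat st;
    size_t len = strlen(pack_path);
    char keep[4096];

    if (len < 5 || strcmp(pack_path + len - 5, ".pack") ||
        len + 1 > sizeof(keep))
        return 0;
    memcpy(keep, pack_path, len + 1);
    strcpy(keep + len - 5, ".keep");       /* s/\.pack$/.keep/ */
    return stat(keep, &st) == 0;
}

int main(int argc, char **argv)
{
    if (argc > 1)
        printf("%s: %s\n", argv[1],
               pack_is_kept(argv[1]) ? "kept" : "not kept");
    return 0;
}

So "git repack -a -d" and git-gc will leave such a pack alone, and the
reason line you put in there is free-form.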

Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
all things.
-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git


Re: Something is broken in repack

2007-12-14 Thread Jakub Narebski
"Nguyen Thai Ngoc Duy" <[EMAIL PROTECTED]> writes:

> On Dec 14, 2007 1:14 PM, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> > > Hmmm... it is even documented in git-gc(1)... and git-index-pack(1) of
> > > all things.
> >
> > I found that the .keep file is not transmitted over the network (at
> > least I tried with git+ssh:// and http:// protocols), however.
> 
> I'm thinking about "git clone --keep" to mark initial packs precious.
> But 'git clone' is being rewritten in C. Let's wait until the C
> rewrite is done.

But if you clone over the network using a "smart" transport, the pack
you get might be optimized for the network rather than for disk, at
least with current git, which AFAIK regenerates the pack on clone as
well.

-- 
Jakub Narebski
Poland
ShadeHawk on #git