On Thu, Oct 04, 2007 at 09:06:10PM +1000, Graham Williams wrote:

> I've not been able to repeat. For example:
> 
> $ time wajig new
> [27 packages]
> real  0m0.996s
> user  0m0.784s
> sys   0m0.336s
> $
> 
> Wajig is essentially doing the same as you suggest. I save the list of
> packages in ~/.wajig/Available and ~/.wajig/Available.prv. I extract
> (in python though) those in one and not in the other. Then list the
> results, getting the descriptions from dpkg and print them.
> 
> Do you repeatably get the 1 minute execution?

Yes, I do. If I run 'top' while 'wajig new' is running, it looks like all
of the CPU time is spent in a single egrep process. However, running
strace on it reveals that it is simply running

  egrep ^(Package|Status|Version):

which seems reasonable.  However, this is quite curious:

$ time egrep '^(Package|Status|Version)' \
  </var/lib/dpkg/available \
  >~/egrep.out
real    1m21.106s
user    1m19.765s
sys     0m0.244s

$ time perl -ne 'print if /^(Package|Status|Version)/' \
  </var/lib/dpkg/available \
  >perl.out
real    0m0.734s
user    0m0.628s
sys     0m0.096s

$ cmp perl.out egrep.out && echo same
same

IOW, egrep takes 110 times as long as perl to produce the same output.
So this looks like a serious performance regression in egrep.
Downgrading to 2.5.1.ds2-6 is much better (although it still takes 3.2
seconds to perl's 0.734!).
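For reference, the same line filter is easy to do in-process in Python, which wajig is written in. This is a minimal sketch, not wajig's actual code: the helper name `filter_fields` is hypothetical, and it uses the pattern from the timing test above (without the trailing colon that appears in the strace output). No timing is claimed for it.

```python
import re
from typing import Iterable, Iterator

# Pattern from the timing test above (the strace'd egrep also had a
# trailing colon): lines beginning with Package, Status or Version.
FIELD_RE = re.compile(r"^(Package|Status|Version)")


def filter_fields(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines matching FIELD_RE (hypothetical helper).

    Equivalent in intent to the egrep and perl one-liners: one regex
    test per line, lines passed through unchanged.
    """
    for line in lines:
        if FIELD_RE.match(line):
            yield line
```

Used as a filter, `sys.stdout.writelines(filter_fields(sys.stdin))` would read /var/lib/dpkg/available on stdin just like the commands above, without spawning an external egrep.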

So this is clearly an egrep bug, and it should be reassigned. Even so,
with the downgraded egrep 'wajig new' still takes 3.3 seconds, which is a
little sluggish. The data seems to pass through a mish-mash of shell
tools, which we can probably speed up a bit. Any interest in a patch?
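Doing the comparison Graham describes entirely inside wajig might look roughly like the sketch below. It assumes, hypothetically, that ~/.wajig/Available and ~/.wajig/Available.prv hold one package name per line (their real format isn't stated in this thread), and it swaps the per-package dpkg calls for a single pass over a dpkg-style stanza file to collect one-line descriptions. That swap is an assumption about what such a patch could do, not what wajig currently does; all helper names are made up for illustration.

```python
from typing import Dict, Iterable, List, Optional, Set


def read_names(lines: Iterable[str]) -> Set[str]:
    """Package names, ASSUMING one name per line (hypothetical format)."""
    return {line.strip() for line in lines if line.strip()}


def new_packages(current: Iterable[str], previous: Iterable[str]) -> List[str]:
    """Names in the current list but not the previous one, sorted."""
    return sorted(read_names(current) - read_names(previous))


def short_descriptions(lines: Iterable[str]) -> Dict[str, str]:
    """Map Package -> first line of Description from dpkg-style stanzas.

    Stanzas are separated by blank lines. Assumes "Package:" comes before
    "Description:" in each stanza, as dpkg writes it. Continuation lines
    (starting with whitespace) are ignored: only the summary line is kept.
    """
    descs: Dict[str, str] = {}
    name: Optional[str] = None
    for line in lines:
        if not line.strip():
            name = None  # blank line ends the current stanza
        elif line.startswith("Package:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("Description:") and name is not None:
            descs[name] = line.split(":", 1)[1].strip()
    return descs
```

A caller would pass open file objects, e.g. `new_packages(open(os.path.expanduser("~/.wajig/Available")), open(os.path.expanduser("~/.wajig/Available.prv")))`, then look each result up in `short_descriptions(open("/var/lib/dpkg/available"))`, so the whole job is one read of each file with no pipeline of shell tools.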

-Peff


