sf wrote:
> The point is that when you have 100,000s of records, this grep becomes
> really slow?
There are performance bugs with current versions of grep
and multibyte characters that are only getting addressed now.
To work around these do `export LANG=C` first.
In my experience grep doesn't scale for this, since matching every
line of one file against every pattern from the other is effectively
O(n^2); the sort-based method is O(n log n).
See below. Note that A and B are randomized copies of
/usr/share/dict/words, and therefore a worst case for the
sort method.
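For anyone unfamiliar with the `sort A B B | uniq -u` idiom: listing B
twice means every line of B occurs at least twice in the combined
input, so `uniq -u` (print only lines that occur exactly once) emits
precisely the lines of A that are not in B. A minimal sketch with
small illustrative files:

```shell
# Create two tiny test files (contents are illustrative)
printf 'apple\nbanana\ncherry\n' > A
printf 'banana\ncherry\ndate\n' > B

# Lines from B appear 2 or 3 times in the combined sorted stream,
# so only lines unique to A survive `uniq -u`.
sort A B B | uniq -u
# → apple
```

This assumes A itself contains no duplicate lines; a line duplicated
within A would also be suppressed by `uniq -u`.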
$ wc -l A B
  45427 A
  45427 B
  90854 total
$ export LANG=C
$ time grep -Fvf B A
real 0m0.437s
$ time sort A B B | uniq -u
real 0m0.262s
$ rpm -q grep coreutils
grep-2.5.1-16.1
coreutils-4.5.3-19
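As a quick sanity check (a sketch with small hypothetical inputs),
both commands report the same set difference when the inputs have no
internal duplicates:

```shell
printf 'x\ny\nz\n' > A
printf 'y\n' > B
grep -Fvf B A          # prints x and z
sort A B B | uniq -u   # prints x and z
```

One caveat: `grep -F` matches patterns as substrings, so add `-x`
(whole-line match) if lines of A may merely contain a line of B;
the sort method always compares whole lines.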
--
Pádraig Brady - http://www.pixelbeat.org