sf wrote:
> The point is that when you have 100,000s of records, this grep becomes
> really slow?
There are performance bugs with current versions of grep
and multibyte characters that are only getting addressed now.
To work around these do `export LANG=C` first.
In my experience grep doesn't scale for this, since matching every
line of one file against every pattern from the other is effectively
O(n^2); the sort-based method is O(n log n).
See below. Note that A and B are randomized copies of
/usr/share/dict/words, and therefore a worst case for the
sort method.
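For anyone unfamiliar with the `sort A B B | uniq -u` idiom: listing B
twice means every line of B occurs at least twice in the combined
input, so `uniq -u` (print only lines that occur exactly once) emits
precisely the lines of A that are not in B. A minimal sketch with
small illustrative files:

```shell
# Create two tiny test files (contents are illustrative)
printf 'apple\nbanana\ncherry\n' > A
printf 'banana\ncherry\ndate\n' > B

# Lines from B appear 2 or 3 times in the combined sorted stream,
# so only lines unique to A survive `uniq -u`.
sort A B B | uniq -u
# → apple
```

This assumes A itself contains no duplicate lines; a line duplicated
within A would also be suppressed by `uniq -u`.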
$ wc -l A B
  45427 A
  45427 B
  90854 total
$ export LANG=C
$ time grep -Fvf B A
real 0m0.437s
$ time sort A B B | uniq -u
real 0m0.262s
$ rpm -q grep coreutils
grep-2.5.1-16.1
coreutils-4.5.3-19
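As a quick sanity check (a sketch with small hypothetical inputs),
both commands report the same set difference when the inputs have no
internal duplicates:

```shell
printf 'x\ny\nz\n' > A
printf 'y\n' > B
grep -Fvf B A          # prints x and z
sort A B B | uniq -u   # prints x and z
```

One caveat: `grep -F` matches patterns as substrings, so add `-x`
(whole-line match) if lines of A may merely contain a line of B;
the sort method always compares whole lines.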
--
Pádraig Brady - http://www.pixelbeat.org