> > unfortunately, the last i checked, gnu grep mallocs
> > for each byte of input when using a utf-8 locale.
>
> that bug was fixed in gnu grep years ago,
> probably before you found and reported it.
> unfortunately, linux distributions were for
> many years not updating their copies of
> gnu grep to the latest version, so very few
> '/bin/grep's had the bug fix.

if i recall correctly, i found that in 2004 or 2005 and
fixed it directly from the gnu.org source. perhaps you
remember something i don't.

in any event, it's still not really fixed. utf-8 performance
still sucks:

; grep --version >[2=1] | sed 1q
GNU grep 2.5.4
; time grep missingstring1 mail.tar
0.00u 0.00s 0.01r 	 grep missingstring1 mail.tar  # status=1
; LANG=en_US.UTF-8 time grep missingstring1 mail.tar
0.44u 0.00s 0.53r 	 grep missingstring1 mail.tar  # status=1

- erik