bug#22357: grep -f not only huge memory usage, but also huge time cost

Paul Eggert Mon, 19 Dec 2016 15:39:50 -0800

On 12/19/2016 02:38 PM, Norihiro Tanaka wrote:

BTW, What case do you the worst? One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .

Well, perhaps I misunderstood it, but the old 'replace' called 'delete'up to N times, and 'delete' took O(log N) to find the deleted item plusO(N) to move later, non-deleted items.

Anyway, I verified that the change improved performance on the test casewith -f /usr/share/dict/words (not as much as your earlier patchimproved things! just 5% to 10%, if I recall correctly), so even if I'mwrong about the big-O improvement the change should still be a win.

1. dfa is not optimize parsed patterns.
     For example, dfa does not replace pattern 'a \|a \|a \| ... \|a \|b'
     to 'a \|b'.

     I tried it several times, have not succeeded it yet.


That sounds a bit like REduce:

Valgenti VC, Kim MS, Oh S-I, Lee I. REduce: removing redundancy fromregular expression matching in network security. ICCCN 2015.http://dx.doi.org/10.1109/ICCCN.2015.7288457


  2. grep matcher compiles regex always to check syntax of a pattern,
     even when it is not used in searching.

     I also tried it, have not succeeded it yet.

Yes, that sounds worthy too. Also, a lot of work has been done inregular-expression matches over the past several years, and if someonehas the time it'd be nice to see whether GNU grep can use this stuff onmodern architectures. For example:

Cameron RD, Medforth N, Lin D, Denis D, Sumner WN. Bitwise dataparallelism with LLVM: the icGrep case study. ICA3PP 2015. LNCS 9529.http://dx.doi.org/10.1007/978-3-319-27122-4_26

Valgenti VC, Chhugani J, Sun Y, Satish N, Kim MS, Kim C, Dubey P.GPP-grep: high-speed regular expression processing engine on generalpurpose processors. RAID 2012. LNCS 7462.http://dx.doi.org/10.1007/978-3-642-33338-5_17

bug#22357: grep -f not only huge memory usage, but also huge time cost

Reply via email to