On 12/19/2016 02:38 PM, Norihiro Tanaka wrote:
BTW, What case do you the worst? One more I think previous 'replace' is
not O(N*(N + log N)) but O(N + N log N) i.e. O(N log N) .
Well, perhaps I misunderstood it, but the old 'replace' called 'delete'
up to N times, and 'delete' took O(log N) to find the deleted item plus
O(N) to move later, non-deleted items.
Anyway, I verified that the change improved performance on the test case
with -f /usr/share/dict/words (not as much as your earlier patch
improved things! just 5% to 10%, if I recall correctly), so even if I'm
wrong about the big-O improvement the change should still be a win.
1. dfa is not optimize parsed patterns.
For example, dfa does not replace pattern 'a \|a \|a \| ... \|a \|b'
to 'a \|b'.
I tried it several times, have not succeeded it yet.
That sounds a bit like REduce:
Valgenti VC, Kim MS, Oh S-I, Lee I. REduce: removing redundancy from
regular expression matching in network security. ICCCN 2015.
http://dx.doi.org/10.1109/ICCCN.2015.7288457
2. grep matcher compiles regex always to check syntax of a pattern,
even when it is not used in searching.
I also tried it, have not succeeded it yet.
Yes, that sounds worthy too. Also, a lot of work has been done in
regular-expression matches over the past several years, and if someone
has the time it'd be nice to see whether GNU grep can use this stuff on
modern architectures. For example:
Cameron RD, Medforth N, Lin D, Denis D, Sumner WN. Bitwise data
parallelism with LLVM: the icGrep case study. ICA3PP 2015. LNCS 9529.
http://dx.doi.org/10.1007/978-3-319-27122-4_26
Valgenti VC, Chhugani J, Sun Y, Satish N, Kim MS, Kim C, Dubey P.
GPP-grep: high-speed regular expression processing engine on general
purpose processors. RAID 2012. LNCS 7462.
http://dx.doi.org/10.1007/978-3-642-33338-5_17