bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-19 Thread Paul Eggert
Norihiro Tanaka wrote: Sorry, I forgot to send the patch. We need the patch to optimize MERGE function to speed-up for some cases. Thanks, that improved the performance of the 'grep -vf linux.words linux.words' benchmark from (before the recent changes) real 8.06 user 6.20 sys 1.85 to (after

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-19 Thread Norihiro Tanaka
On Tue, 18 Sep 2018 22:13:38 -0700 Jim Meyering wrote: > Also, when I compared grep compiled at > 123620af88f55c3e0cc9f0aed7311c72f625bc82 (latest, including your > changes) and that compiled at the prior commit, > 9c11510507ebcd31671f10d9b88532f8e6657ad2, I find that the new version > takes ove

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-19 Thread Paul Eggert
Jim Meyering wrote: seq 1 | sed 's/$/ abcdefghijklmnopqrstuvwxyz/; s/$/./' >in time -p env LC_ALL=C grep -vf in in (before) real 63.43 user 61.67 system 1.65 (after) real 1.64 user 1.61 system 0.03 # If we do not add '.' at last in pattern, not dfa but kwset is used. grep

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Jim Meyering
On Mon, Sep 17, 2018 at 7:16 AM Norihiro Tanaka wrote: > Even when similer states exists in alternation, DFA treats them as > separate items. It may complicate the transition in NFA and cause > slowdown. This change assembles the states into one. > > For example, ab|ac is changed into a(b|c). >

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Paul Eggert
Norihiro Tanaka wrote: I thought of various things for the name of the function, but I could not think of a good name. Yes, that's a tough one. I eventually came up with maybe_disable_superset_dfa. Perhaps we can think of an ever better one. I installed your patches into Gnulib, along with t

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Eric Blake
On 9/18/18 11:15 AM, Paul Jackson wrote: Norihiro Tanaka wrote: It means "No use superset for utf8". Would a comment be useful, such as in the following, (which may have to be reworded to be correct) ? /* Optimize DFA, but don't use superset (noss) for utf8 */ static void -dfaoptimize (

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Paul Jackson
Norihiro Tanaka wrote: >> It means "No use superset for utf8". Would a comment be useful, such as in the following, (which may have to be reworded to be correct) ? /* Optimize DFA, but don't use superset (noss) for utf8 */ static void -dfaoptimize (struct dfa *d) +dfautf8noss (struct dfa *d)

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-18 Thread Norihiro Tanaka
Paul Eggert wrote: > Thanks for the patch. A quick question: what does the identifier > "dfautf8noss" stand for? I couldn't figure it out. It means "No use superset for utf8". I thought of various things for the name of the function, but I could not think of a good name.

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-17 Thread Paul Eggert
Thanks for the patch. A quick question: what does the identifier "dfautf8noss" stand for? I couldn't figure it out.

bug#32750: [PATCH 2/2] dfa: optmization of alternation in NFA

2018-09-17 Thread Norihiro Tanaka
Hi, Even when similer states exists in alternation, DFA treats them as separate items. It may complicate the transition in NFA and cause slowdown. This change assembles the states into one. For example, ab|ac is changed into a(b|c). This change speeds-up matching for many branched pattern. Fo