Hi, A case that dfa is slower than regex on bug-gawk mailing list is discussed 4 months ago.
http://lists.gnu.org/archive/html/bug-gawk/2017-07/msg00002.html Although gawk is fixed to not use dfa in some cases, it looks like a temporary treatment. (11d4ea518166ffbc0c2fe85d090723e8f299486c) -------- $ env LC_ALL=C ./gawk --version GNU Awk 4.2.0, API: 2.0 (GNU MPFR 3.1.5, GNU MP 4.3.2) Copyright (C) 1989, 1991-2017 Free Software Foundation. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/. $ time -p env LC_ALL=C ./gawk -f test.awk test.inp >/dev/null real 1.38 user 1.17 sys 0.11 $ time -p env LC_ALL=C GAWK_NO_DFA=1 ./gawk -f test.awk test.inp >/dev/null real 0.51 user 0.43 sys 0.07 -------- This problem seems to be due to dfa.c. In fact, grep is also slower than not to use dfa in the same cases. -------- $ env LC_ALL=C src/grep --version grep (GNU grep) 3.1 Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>. $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null real 1.06 user 0.97 sys 0.08 -------- # grep does not have control by GREP_NO_DFA environment, so I changed # as following and rebuilded, and run the same test as above. ======== diff --git a/lib/dfa.c b/lib/dfa.c index 2b9c80e..f234b88 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -3474,21 +3474,7 @@ dfacomp (char const *s, size_t len, struct dfa *d, bool searchflag) dfaparse (s, len, d); dfassbuild (d); - if (dfa_supported (d)) - { - dfaoptimize (d); - dfaanalyze (d, searchflag); - } - else - { - d->dfaexec = dfaexec_noop; - } - - if (d->superset) - { - d->fast = true; - dfaanalyze (d->superset, searchflag); - } + d->dfaexec = dfaexec_noop; } /* Free the storage held by the components of a dfa. */ ======== -------- $ time -p env LC_ALL=C src/grep -X gawk -f test.pat test.inp >/dev/null real 0.43 user 0.37 sys 0.05 -------- However, I do not still know why dfa is slower than regex in this case. Thanks, Norihiro