Sergey Poznyakoff wrote: > By the way, I am also experimenting with the idea of re-implementing > the exclude module using DFA, i.e. regarding the entire exclude list > as a set of regular language definitions and creating a DFA for each > of them (it is a *set* of definitions, because its parts can have > different EXCLUDE_* flags, so they should be processed differently). > This may constitute a significant speed-up.
Why does it not fit into two regexes / DFAs? I would think that - The EXCLUDE_INCLUDE flag is a negation on the result, so we can handle the "excluded" condition in one regex and the "included" condition in another regex. - The two other flags EXCLUDE_ANCHORED, EXCLUDE_WILDCARDS only affect the transformation from fnmatch pattern to regex: EXCLUDE_WILDCARDS on: 'a?b*' -> 'a.b.*' EXCLUDE_WILDCARDS off: 'a?b*' -> 'a[?]b[*]' EXCLUDE_ANCHORED on: 'a?b' -> '^a.b$' EXCLUDE_ANCHORED off: 'a?b' -> '^\([^/]*/\)*a.b$' - After the transformation from fnmatch pattern to regex, you apply alternation: \(regex1\|regex2\|...\) > using DFA The DFA code from grep and gawk is next to unmaintainable [1]. Aharon Robbins agrees that it would be good to rewrite it. Bruno [1] http://lists.gnu.org/archive/html/bug-gnulib/2009-07/msg00001.html