Sergey Poznyakoff wrote:
> By the way, I am also experimenting with the idea of re-implementing
> the exclude module using DFA, i.e. regarding the entire exclude list
> as a set of regular language definitions and creating a DFA for each
> of them (it is a *set* of definitions, because its parts can have
> different EXCLUDE_* flags, so they should be processed differently).
> This may constitute a significant speed-up. 

Why does it not fit into two regexes / DFAs? I would think that
  - The EXCLUDE_INCLUDE flag is a negation on the result, so we can
    handle the "excluded" condition in one regex and the "included"
    condition in another regex.
  - The two other flags EXCLUDE_ANCHORED, EXCLUDE_WILDCARDS only
    affect the transformation from fnmatch pattern to regex:
      EXCLUDE_WILDCARDS on:  'a?b*' -> 'a.b.*'
      EXCLUDE_WILDCARDS off: 'a?b*' -> 'a[?]b[*]'
      EXCLUDE_ANCHORED on:  'a?b' -> '^a.b$'
      EXCLUDE_ANCHORED off: 'a?b' -> '^\([^/]*/\)*a.b$'
  - After the transformation from fnmatch pattern to regex, you
    apply alternation: \(regex1\|regex2\|...\)

> using DFA

The DFA code from grep and gawk is next to unmaintainable [1]. Aharon Robbins
agrees that it would be good to rewrite it.

Bruno

[1] http://lists.gnu.org/archive/html/bug-gnulib/2009-07/msg00001.html


Reply via email to