RFC: extend "study" to produce a fast grep through many regexes

David L. Nicol Mon, 21 Aug 2000 14:39:07 -0700



title: study a list of regexes
David Nicol.
Aug 21
version 1
[EMAIL PROTECTED]


Sometimes I have a group of regexen, and I would like to know
which ones will match.

Current practice is to "study" $situation and then grep them:


example a:

        study $situation;
        @matches = @rules{ grep {$situation =~ m/$_/} keys %rules};


What if there were a faster way to do this, a way to C<study> a
group of regexes and have the computer determine which ones had
parts in common, so that if $situation =~ m/^foo/ is true, the
fifty rules that start ^bar don't waste any time.  At all.

example b:

        $matchcode = study @regexen;

will generate an anonymous subroutine (it's called $matchcode in
the example line) with a tree based on required parts of the
regexes, to minimize the number of match attempts to determine
which regexes will match.  The subroutine will take an array
argument and will return the subset of the rules (as stated in
the original array, either string or compiled qr// references)
that match on all the arguments.

The code in example a could then be replaced

        $matchcode = study keys %rules;
        @matches = @rules{ &$matchcode $situation }


This ability could speed "dirty matching" which currently cannot
take advantage of constant-time hash lookups without standardizing
the dirty parts.




-- 
                          David Nicol 816.235.1187 [EMAIL PROTECTED]
                      Does despair.com sell a discordian calendar?

RFC: extend "study" to produce a fast grep through many regexes

Reply via email to