En Wed, 31 Mar 2010 12:23:48 -0300, Paul McGuire <pt...@austin.rr.com>
escribió:
On Mar 31, 5:49 am, Nathan Harmston <iwanttobeabad...@googlemail.com>
wrote:

I have a slightly complicated/medium sized regular expression and I
want to generate all possible words that it can match (to compare
performance of regex against an acora based matcher).

The pyparsing wiki Examples page includes this regex inverter:
http://pyparsing.wikispaces.com/file/view/invRegex.py

From the module header:
# Supports:
# - {n} and {m,n} repetition, but not unbounded + or * repetition
# - ? optional elements
# - [] character ranges
# - () grouping
# - | alternation

I took the liberty of modifying your invRegex.py example, adding support
for infinite repeaters. It depends on two other modules:

mergeinf.py (from http://code.activestate.com/recipes/577041) provides the
infinite merge operation.

enumre.py provides the basic functions (merge, prod, repeat, closure)
necessary to enumerate the language generated by a given regular
expression, even if it contains unbounded repeaters like *,+.  The key is
to generate shorter strings before longer ones, so in 'a*|b*' it doesn't
get stuck generating infinite a's before any b.

By example, "(a|bc)*d" corresponds to this code:

      prod(
        closure(
          merge(
            'a',
             prod('b','c'))),
        'd')

which returns an infinite generator starting with:

d
ad
aad
bcd
aaad
abcd
bcad
aaaad
aabcd
abcad
bcaad
bcbcd
aaaaad
aaabcd
aabcad
...


I got the idea from
http://userweb.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf

Finally, invRegexInf.py is based on your original regex parser. I only
modified the generation part, taking advantage of the above
infrastructure; the parser itself remains almost the same. It essentially
saves oneself the very tedious work of converting a regular expression
into the equivalent sequence of function calls as shown above. (I hope I
got it right: I like pyparsing a lot and use it whenever I feel it's
appropriate, but not as often as to remember the details...)

--
Gabriel Genellina

Attachment: invRegexInf.py
Description: Binary data

Attachment: enumre.py
Description: Binary data

Attachment: mergeinf.py
Description: Binary data

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to