En Wed, 31 Mar 2010 12:23:48 -0300, Paul McGuire <pt...@austin.rr.com> escribió:
On Mar 31, 5:49 am, Nathan Harmston <iwanttobeabad...@googlemail.com> wrote:I have a slightly complicated/medium sized regular expression and I want to generate all possible words that it can match (to compare performance of regex against an acora based matcher).The pyparsing wiki Examples page includes this regex inverter: http://pyparsing.wikispaces.com/file/view/invRegex.pyFrom the module header:# Supports: # - {n} and {m,n} repetition, but not unbounded + or * repetition # - ? optional elements # - [] character ranges # - () grouping # - | alternation
I took the liberty of modifying your invRegex.py example, adding support for infinite repeaters. It depends on two other modules: mergeinf.py (from http://code.activestate.com/recipes/577041) provides the infinite merge operation. enumre.py provides the basic functions (merge, prod, repeat, closure) necessary to enumerate the language generated by a given regular expression, even if it contains unbounded repeaters like *,+. The key is to generate shorter strings before longer ones, so in 'a*|b*' it doesn't get stuck generating infinite a's before any b. By example, "(a|bc)*d" corresponds to this code: prod( closure( merge( 'a', prod('b','c'))), 'd') which returns an infinite generator starting with: d ad aad bcd aaad abcd bcad aaaad aabcd abcad bcaad bcbcd aaaaad aaabcd aabcad ... I got the idea from http://userweb.cs.utexas.edu/users/misra/Notes.dir/RegExp.pdf Finally, invRegexInf.py is based on your original regex parser. I only modified the generation part, taking advantage of the above infrastructure; the parser itself remains almost the same. It essentially saves oneself the very tedious work of converting a regular expression into the equivalent sequence of function calls as shown above. (I hope I got it right: I like pyparsing a lot and use it whenever I feel it's appropriate, but not as often as to remember the details...) -- Gabriel Genellina
invRegexInf.py
Description: Binary data
enumre.py
Description: Binary data
mergeinf.py
Description: Binary data
-- http://mail.python.org/mailman/listinfo/python-list