Nathan Harmston wrote:
I have a slightly complicated/medium sized regular expression
and I want to generate all possible words that it can match
(to compare performance of regex against an acora based
matcher). Using the regular expression as a grammar to
generate all words in its language. I was wondering if this
possible in Python or possible using anything. Google doesnt
seem to give any obvious answers.

Unless you limit your regexp to bounded regular expressions (you don't use the "*", "+" and unbounded-top-end "{...}" tokens), it has an infinite number of possible words that can match. As a simple example, the regexp "a*" matches "" (the empty string), "a", "aa", "aaa", "aaaa"...and so on up to the limit of a string length.

There was a discussion on this a while back:

http://mail.python.org/pipermail/python-list/2010-February/1235120.html

so you might chat with the person who started that thread. You basically have to write a regexp parser that generates code to generate the expressions, and even simple expressions such as "[a-z]{1,30}" can create an astronomical number of possible inputs.

-tkc



--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to