Nathan Harmston wrote:
I have a slightly complicated/medium sized regular expression
and I want to generate all possible words that it can match
(to compare performance of regex against an acora based
matcher). Using the regular expression as a grammar to
generate all words in its language. I was wondering if this
possible in Python or possible using anything. Google doesnt
seem to give any obvious answers.
Unless you limit your regexp to bounded regular expressions (you
don't use the "*", "+" and unbounded-top-end "{...}" tokens), it
has an infinite number of possible words that can match. As a
simple example, the regexp "a*" matches "" (the empty string),
"a", "aa", "aaa", "aaaa"...and so on up to the limit of a string
length.
There was a discussion on this a while back:
http://mail.python.org/pipermail/python-list/2010-February/1235120.html
so you might chat with the person who started that thread. You
basically have to write a regexp parser that generates code to
generate the expressions, and even simple expressions such as
"[a-z]{1,30}" can create an astronomical number of possible inputs.
-tkc
--
http://mail.python.org/mailman/listinfo/python-list