En Wed, 31 Mar 2010 07:49:14 -0300, Nathan Harmston <iwanttobeabad...@googlemail.com> escribió:

I have a slightly complicated/medium sized regular expression and I
want to generate all possible words that it can match (to compare
performance of regex against an acora based matcher). Using the
regular expression as a grammar to generate all words in its language.
I was wondering if this possible in Python or possible using anything.
Google doesnt seem to give any obvious answers.

I've done this some time ago.
This recipe http://code.activestate.com/recipes/577041-merge-multiple-potentially-infinite-sorted-inputs-/ provides an infinite merge operation, required for effectively enumerating a regular language (with shorter strings first, else 'a*b' would get stuck repeating "a"s and never yielding any "b"). The example at the end shows the basics of how to use it (by hand) in a very simple case. To enumerate all strings matching '(a|bc)*' one should invoke closure(product(['a'],['b','c'])), which gives:

'', 'a', 'aa', 'bc',
'aaa', 'abc', 'bca',
'aaaa', 'aabc', 'abca', 'bcaa', 'bcbc',
'aaaaa', 'aaabc', 'aabca', 'abcaa', 'abcbc', 'bcaaa', 'bcabc', 'bcbca',
...

Converting the former expression into the later function calls requires a parser (not shown -- and I won't be able to find it until Friday)

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to