On Sun, 07 Feb 2010 00:26:36 +0000, Steven D'Aprano wrote: >> So there isn't such a routine just because some of the regular >> expressions cannot be enumerated.
No. There isn't a routine because no-one has yet felt any need to write one. >> However, some of them can be >> enumerated. I guess I have to write a function myself. > > How do you expect to tell the ones that can be enumerated apart from > those that can't be? The ones which use the '+', '*' and '{m,}' operators match an infinite number of strings; the rest can only match a finite number (assuming POSIX REs; Python also has +? and *?). ["Enumerate" isn't the correct word here. You can *enumerate* an infinite set, in the sense that you could write a Python generator for which any member will eventually be generated.] The obvious implementation is to construct the NFA then "run" it. If you know that the RE can only match finite strings (i.e. the graph is acyclic), then you can enumerate them using depth-first traversal. If it can match infinite strings (i.e. if there are any cycles in the graph), then you would need to use either breadth-first traversal or incrementally-bounded depth-first traversal. > [Aside: Python regexes aren't Turing Complete. I'm not sure about Perl > regexes. Either way, this might actually be less difficult than the > Halting Problem, as in "amazingly difficult" rather than "impossible".] "Regular expressions" aren't Turing complete; this is implicit in the definition of "regular". The Chomsky hierarchy has four levels, with higher levels require a more capable system to decide whether a string is a member of the language defined by the grammar: grammar decidable by regular finite automaton context-free pushdown automaton[1] context-sensitive linear-bounded automaton[2] recursively-enumerable Turing machine However, any "regular expression" syntax which allows backreferences (including the POSIX specification) isn't actually "regular" in the formal sense (as it requires an infinite number of states), but context-free. [1] pushdown automaton = finite automaton with a stack [2] linear-bounded automaton = Turing machine, except that it's "tape" is finite and proportional to the size of the input. http://en.wikipedia.org/wiki/Chomsky_hierarchy -- http://mail.python.org/mailman/listinfo/python-list