On Sun, 07 Feb 2010 00:26:36 +0000, Steven D'Aprano wrote:

>> So there isn't such a routine just because some of the regular
>> expressions cannot be enumerated.
No. There isn't a routine because no-one has yet felt any need to write
one.

>> However, some of them can be
>> enumerated. I guess I have to write a function myself.
> 
> How do you expect to tell the ones that can be enumerated apart from 
> those that can't be?

The ones which use the '+', '*' and '{m,}' operators match an infinite
number of strings; the rest can only match a finite number (assuming POSIX
REs; Python also has +? and *?).

["Enumerate" isn't the correct word here. You can *enumerate* an
infinite set, in the sense that you could write a Python generator for
which any member will eventually be generated.]

The obvious implementation is to construct the NFA then "run" it.

If you know that the RE can only match finite strings (i.e. the graph is
acyclic), then you can enumerate them using depth-first traversal. If it
can match infinite strings (i.e. if there are any cycles in the graph),
then you would need to use either breadth-first traversal or
incrementally-bounded depth-first traversal.

> [Aside: Python regexes aren't Turing Complete. I'm not sure about Perl 
> regexes. Either way, this might actually be less difficult than the 
> Halting Problem, as in "amazingly difficult" rather than "impossible".]

"Regular expressions" aren't Turing complete; this is implicit in the
definition of "regular". The Chomsky hierarchy has four levels, with
higher levels require a more capable system to decide whether a string is
a member of the language defined by the grammar:

        grammar                 decidable by

        regular                 finite automaton
        context-free            pushdown automaton[1]
        context-sensitive       linear-bounded automaton[2]
        recursively-enumerable  Turing machine

However, any "regular expression" syntax which allows backreferences
(including the POSIX specification) isn't actually "regular" in the formal
sense (as it requires an infinite number of states), but context-free.

[1] pushdown automaton = finite automaton with a stack

[2] linear-bounded automaton = Turing machine, except that it's "tape" is
finite and proportional to the size of the input.

        http://en.wikipedia.org/wiki/Chomsky_hierarchy

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to