MRAB:
Implementing the regex module (http://pypi.python.org/pypi/regex) would have been more difficult if the internal representation had been UTF-8, because of the need to decode, and the implementation would also have been slower for that reason.
One way to build regex support for UTF-8 is to build a fixed-width version of the regex code and then interpose an object that decodes the UTF-8 representation into the fixed-width units that code expects.
The C++11 standard library contains a regex template that can be instantiated over a UTF-8 representation in this way.
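To make the idea concrete, here is a minimal sketch in Python rather than C++. It is not how the regex module or the C++11 library is implemented. The names next_cp, match_at and search are made up for illustration, the "pattern language" is just literals plus '.', and the input is assumed to be valid UTF-8. The matcher only ever sees whole code points; the decoding step sits between it and the bytes.

```python
def next_cp(buf, i):
    """Decode the code point starting at byte offset i of valid UTF-8.

    Returns (code_point, offset_of_next_code_point).
    """
    b0 = buf[i]
    if b0 < 0x80:                 # 1-byte sequence (ASCII)
        return b0, i + 1
    if b0 >> 5 == 0b110:          # 2-byte sequence
        n, cp = 2, b0 & 0x1F
    elif b0 >> 4 == 0b1110:       # 3-byte sequence
        n, cp = 3, b0 & 0x0F
    else:                         # 4-byte sequence (validity is assumed)
        n, cp = 4, b0 & 0x07
    for b in buf[i + 1:i + n]:    # fold in the continuation bytes
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + n


def match_at(pattern, buf, i):
    """Fixed-width matcher: compares code points, never raw bytes.

    pattern is a str in which '.' matches any single code point.
    Returns the byte offset just past the match, or None.
    """
    for p in pattern:
        if i >= len(buf):
            return None
        cp, i = next_cp(buf, i)   # the interposed decoding step
        if p != "." and ord(p) != cp:
            return None
    return i


def search(pattern, buf):
    """Return (start, end) byte offsets of the first match, or None."""
    i = 0
    while True:
        end = match_at(pattern, buf, i)
        if end is not None:
            return i, end
        if i >= len(buf):
            return None
        _, i = next_cp(buf, i)    # advance by one code point, not one byte
```

For example, search("caf.", "naïve café".encode("utf-8")) reports byte offsets into the encoded buffer, and slicing the buffer with them and decoding gives back "café". The cost mentioned above is visible here: every step of the matcher goes through next_cp.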
Neil