MRAB:

Implementing the regex module (http://pypi.python.org/pypi/regex) would
have been more difficult if the internal representation had been UTF-8,
because of the need to decode, and the implementation would also have
been slower for that reason.

One way to support UTF-8 in a regex engine is to write the matching code against a fixed-width representation (one unit per code point) and then interpose an adapter object that decodes the UTF-8 representation into that fixed-width form as the matcher reads it.

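The C++11 standard library's regex is a class template (std::basic_regex, parameterised on the character type and a traits class) whose matching functions take iterators, so it can in principle be instantiated over a UTF-8 representation in this way.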
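
As a rough illustration of that layering, here is a minimal Python sketch. It assumes well-formed UTF-8 input and does no validation. The names decode_utf8, match_fixed_width and match_utf8 are made up for this example and are not part of the regex module or of any C++ library. The "matcher" is a toy that only understands literals and '.', but it shows the point: the matching code sees one integer per code point and never deals with multi-byte sequences.

```python
DOT = ord(".")


def decode_utf8(data):
    """Lazily yield code points from well-formed UTF-8 bytes (no validation)."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:        # 0xxxxxxx: one-byte sequence (ASCII)
            cp, extra = lead, 0
        elif lead < 0xE0:      # 110xxxxx: two-byte sequence
            cp, extra = lead & 0x1F, 1
        elif lead < 0xF0:      # 1110xxxx: three-byte sequence
            cp, extra = lead & 0x0F, 2
        else:                  # 11110xxx: four-byte sequence
            cp, extra = lead & 0x07, 3
        # Each continuation byte (10xxxxxx) contributes six more bits.
        for k in range(1, extra + 1):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        yield cp
        i += extra + 1


def match_fixed_width(pattern, text):
    """Toy full-match over code-point lists: '.' matches any one code point."""
    return len(pattern) == len(text) and all(
        p == DOT or p == t for p, t in zip(pattern, text)
    )


def match_utf8(pattern, data):
    """Adapter: decode the UTF-8 bytes, then run the fixed-width matcher."""
    return match_fixed_width([ord(c) for c in pattern], list(decode_utf8(data)))
```

With this in place, match_utf8("caf.", "café".encode("utf-8")) treats the two-byte "é" as a single unit for '.', which is the behaviour a byte-oriented matcher would not give for free. A real engine would decode incrementally and support moving backwards, rather than building a list up front as this sketch does.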

   Neil

--
http://mail.python.org/mailman/listinfo/python-list