On 05/30/2012 10:46 AM, Terry Reedy wrote:
> On 5/30/2012 2:52 AM, ru...@yahoo.com wrote:
>> In python2, "\u" escapes are processed in raw unicode
>> strings. That is, ur'\u3000' is a string of length 1
>> consisting of the IDEOGRAPHIC SPACE unicode character.
>
> That surprised me until I rechecked the fine manual and found:
>
> "When an 'r' or 'R' prefix is present, a character following a backslash
> is included in the string without change, and all backslashes are left
> in the string."
>
> "When an 'r' or 'R' prefix is used in conjunction with a 'u' or 'U'
> prefix, then the \uXXXX and \UXXXXXXXX escape sequences are processed
> while all other backslashes are left in the string."
>
> When 'u' was removed in Python 3, a choice had to be made and the first
> must have seemed to be the obvious one, or perhaps the automatic one.
>
> In 3.3, 'u' is being restored. I have inquired on pydev list whether the
> difference above should also be restored, and mentioned this thread.

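For reference, a minimal sketch of the Python 3 side of the difference described above, assuming a Python 3 interpreter (the variable names are only illustrative):

```python
import unicodedata

# Python 3: the 'r' prefix leaves every backslash in place, including \u.
raw = r'\u3000'
cooked = '\u3000'

print(len(raw))                      # 6: backslash, 'u', '3', '0', '0', '0'
print(len(cooked))                   # 1: the single character U+3000
print(unicodedata.name(cooked))      # IDEOGRAPHIC SPACE
```

In Python 2, by contrast, ur'\u3000' gave the length-1 string, as described in the quoted text.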
As mentioned in a different message, another option might be to leave raw strings as they are (more consistent, since all backslashes are treated the same) and have the "re" module un-escape "\uxxxx" (and similar) literals in regex strings (also more consistent, since that's what it already does with '\\n', '\\t', etc.). I do realize, though, that this may have backward-compatibility problems that make it impossible to do.
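
A minimal sketch of the "re" behaviour referred to here. The first part shows the regex parser interpreting a backslash-n pair on its own; the second part assumes Python 3.3 or later, where the re documentation lists \u and \U among the escapes the pattern parser handles. Variable names are only illustrative.

```python
import re

# The pattern is two characters, backslash + 'n'; the regex parser itself
# interprets that pair as a newline.
pattern = r'\n'
print(len(pattern))                   # 2
newline_match = re.match(pattern, '\n')
print(newline_match is not None)      # True

# Assumption: Python 3.3+, where the pattern parser also handles \uXXXX,
# so a raw-string pattern can still name U+3000.
space_match = re.search(r'\u3000', 'a\u3000b')
print(space_match.start())            # 1
```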