Ezio Melotti <ezio.melo...@gmail.com> added the comment:

As long as you don't mix str and unicode everything works.

With strings:
>>> s = '与清新。阿德莱'
>>> re.split('。', s)
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
>>> s.split('。')
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']

With unicode:
>>> u = u'与清新。阿德莱'
>>> re.split(u'。', u)
[u'\u4e0e\u6e05\u65b0', u'\u963f\u5fb7\u83b1']
>>> u.split(u'。')
[u'\u4e0e\u6e05\u65b0', u'\u963f\u5fb7\u83b1']

Mixing str and unicode:
>>> re.split(u'。', s)
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0\xe3\x80\x82\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
>>> re.split('。', u)
[u'\u4e0e\u6e05\u65b0\u3002\u963f\u5fb7\u83b1']
>>>
>>> s.split(u'。')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)
>>> u.split('。')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)

The syntax error is raised for byte literals and can't be backported to 2.7.
Raising an error when str and unicode are mixed in re is not backward compatible, and re does work as long as both are ASCII only.
I'm therefore closing this as invalid.

----------
nosy: +mrabarnett
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14068>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
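The mixed cases in the comment either return the input unsplit (re.split) or raise UnicodeDecodeError (str.split). A minimal sketch of one way to keep the two types from mixing is to decode everything to text before splitting. This assumes the byte strings are UTF-8 encoded, which matches the byte values shown in the sessions but is still an assumption about the caller's data. The helper names to_text and split_text are hypothetical and are not part of the re module.

```python
# -*- coding: utf-8 -*-
import re


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings, pass text through unchanged.

    On Python 2, `bytes` is an alias for `str`, so this converts str to unicode.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def split_text(pattern, string, encoding='utf-8'):
    """Hypothetical helper: call re.split with both arguments as text."""
    return re.split(to_text(pattern, encoding), to_text(string, encoding))


# A byte string like `s` in the comment, and a text separator like u'。'.
raw = u'与清新。阿德莱'.encode('utf-8')
sep = u'\u3002'

# Both arguments are decoded first, so the separator is found
# and the result has two text pieces.
parts = split_text(sep, raw)
```

Decoding once at the boundary, rather than at every call site, keeps the rest of the code working on a single string type, which is the condition the comment gives for everything to work.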