On Fri, 17 Jan 2014 12:12:35 +0000, Robin Becker wrote:

> On 17/01/2014 11:41, Steven D'Aprano wrote:
>> def func(a):
>>     """
>>     >>> print(func(u'aaa'))
>>     aaa
>>     """
>>     return a
>
> I think this approach seems to work if I turn the docstring into unicode
>
> def func(a):
>     u"""
>     >>> print(func(u'aaa\u020b'))
>     aaa\u020b
>     """
>     return a

Good catch! Without the u-prefix, the \u... is not interpreted as an
escape sequence, but as a literal backslash-u.

> If I leave the u off the docstring it goes wrong in python 2.7. I also
> tried to put an encoding onto the file and use the actual utf8
> characters ie
>
> # -*- coding: utf-8 -*-
> def func(a):
>     """
>     >>> print(func(u'aaa\u020b'))
>     aaa╚ï
>     """
>     return a

There seems to be some mojibake in your post, which confuses issues.

You refer to \u020b, which is LATIN SMALL LETTER I WITH INVERTED BREVE.
At least, that's what it ought to be. But in your post, it shows up as
the two character mojibake, ╚ followed by ï (BOX DRAWINGS DOUBLE UP AND
RIGHT followed by LATIN SMALL LETTER I WITH DIAERESIS). It appears that
your posting software somehow got confused and inserted the two
characters which you would have got using cp-437 while claiming that
they are UTF-8. (Your post is correctly labelled as UTF-8.)

I'm confident that the problem isn't with my newsreader, Pan, because it
is pretty damn good at getting encodings right, but also because your
post shows the same mojibake in the email archive:

https://mail.python.org/pipermail/python-list/2014-January/664771.html

To clarify: you tried to show \u020B as a literal. As a literal, it
ought to be the single character ȋ which is a lower case I with curved
accent on top. The UTF-8 of that character is b'\xc8\x8b', which in the
cp-437 code page is two characters ╚ ï.

py> '\u020b'.encode('utf8').decode('cp437')
'╚ï'

Hence, mojibake.
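
To make the round trip concrete, here is a small sketch (assuming Python
3.3 or later, where the u prefix is accepted; the helper name `names` is
mine, purely for illustration). It encodes the character as UTF-8,
decodes the bytes with the wrong codec, and then reverses the mistake:

```python
import unicodedata

original = u'\u020b'                    # LATIN SMALL LETTER I WITH INVERTED BREVE
utf8_bytes = original.encode('utf-8')   # two bytes: 0xC8 0x8B
garbled = utf8_bytes.decode('cp437')    # wrong codec: each byte becomes one character


def names(text):
    """Return the Unicode name of every character in text (illustrative helper)."""
    return [unicodedata.name(ch) for ch in text]


print(names(original))
print(names(garbled))

# Undoing the wrong decode recovers the original character.
repaired = garbled.encode('cp437').decode('utf-8')
print(repaired == original)
```

Printing the names rather than the characters keeps the output
ASCII-only, so the terminal's own encoding cannot add further confusion.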

> def _doctest():
>     import doctest
>     doctest.testmod()
>
> and that works in python3, but fails in python 2 with this
>> (py27) C:\code\hg-repos>python tdt1.py
>> C:\python\Lib\doctest.py:1531: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal
>>   if got == want:
>> C:\python\Lib\doctest.py:1551: UnicodeWarning: Unicode equal comparison
>> failed to convert both arguments to Unicode - interpreting them as
>> being unequal

I cannot replicate this specific exception. I think it may be a
side-effect of you being on Windows. (I'm on Linux, and everything is
UTF-8.)

>>   if got == want:
>> **********************************************************************
>> File "tdt1.py", line 4, in __main__.func
>> Failed example:
>>     print(func(u'aaa\u020b'))
>> Expected:
>>     aaa╚ï
>> Got:
>>     aaa╚ï

The difficulty here is that it is damn near impossible to sort out
which, if any, bits are mojibake inserted by your posting software,
which by your editor, your terminal, which by Python, and which are
artifacts of the doctest system.

The usual way to debug these sorts of errors is to stick a call to
repr() just before the print.

print(repr(func(u'aaa\u020b')))

--
Steven
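
P.S. As a rough illustration of the repr() advice (a sketch, not output
from the original poster's machine): on Python 3, repr() leaves
printable non-ASCII characters as they are, so ascii() is the closer
equivalent of what repr() gives for a unicode string on Python 2. The
helper `show` and the hard-coded mojibake string below are hypothetical,
made up for this example.

```python
def func(a):
    return a


def show(value):
    """Describe a text string using ASCII only: escaped form plus length.

    Hypothetical helper. ascii() is a Python 3 builtin; on Python 2,
    repr() of a unicode string already gives an escaped form.
    """
    return '%s (len %d)' % (ascii(value), len(value))


intended = func(u'aaa\u020b')        # the string the doctest is meant to print
mojibake = func(u'aaa\u255a\xef')    # the same text after a UTF-8 -> cp437 mix-up

# The escaped forms differ even when a terminal renders both confusingly.
print(show(intended))
print(show(mojibake))
```

Because the escaped forms are plain ASCII, they survive editors,
terminals and mail software unchanged, which makes it easier to tell
where in the chain the text was damaged.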