On Wed, Jun 8, 2011 at 11:22 AM, G00gle and Python Lover <pythech...@gmail.com> wrote:
> Hello.
> I almost like everything in Python. Code shrinking, logic of processes,
> libraries, code design etc.
> But, we... - everybody knows that Python 2.x has lack of unicode support.
> In Python 3.x, this has been fixed :) And I like 3.x more than 2.x
> But, still major applications haven't been ported to 3.x like Django.
> Is there a way to make 2.x behave like 3.x in unicode support?
> Is it possible to use Unicode instead of Ascii or remove ascii?
> Python with ascii sucks :S
> I know:
>>
>> >>> lackOfUnicodeSupportAnnoys = u'Yeah I finally made it! Should be a magical thing! Unmögötich! İnanılmaz! Süper...'
>> >>> print lackOfUnicodeSupportAnnoys
>> Yeah I finally made it! Should be a magical thing! Unmögötich! Ýnanýlmaz! Süper...
>> >>> # You see the Turkish characters are not fully supported...
>> >>> print str(lackOfUnicodeSupportAnnoys)
>> Traceback (most recent call last):
>>   File "<pyshell#7>", line 1, in <module>
>>     print str(lackOfUnicodeSupportAnnoys)
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 54: ordinal not in range(128)
>> >>> # Encode decode really sucks...
>> >>> lackOfUnicodeSupportAnnoys = 'Yeah I finally made it! Should be a magical thing! Unmögötich! İnanılmaz! Süper...'
>> >>> # Look that I didn't use 'u'
>> >>> print lackOfUnicodeSupportAnnoys
>> Yeah I finally made it! Should be a magical thing! Unmögötich! İnanılmaz! Süper...
>> >>> # This time it worked, strange...
>> >>> lackOfUnicodeSupportAnnoys = unicode('Yeah I finally made it! Should be a magical thing! Unmögötich! İnanılmaz! Süper...')
>> Traceback (most recent call last):
>>   File "<pyshell#10>", line 1, in <module>
>>     lackOfUnicodeSupportAnnoys = unicode('Yeah I finally made it! Should be a magical thing! Unmögötich! İnanılmaz! Süper...')
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 54: ordinal not in range(128)
>> >>> # Some annoying error again
>> >>> lackOfUnicodeSupportAnnoys
>> 'Yeah I finally made it! Should be a magical thing! Unm\xf6g\xf6tich! \xddnan\xfdlmaz! S\xfcper...'
>> >>> # And finally, most annoying thing isn't it?
>
> Thanks...

I think you're misunderstanding what Unicode support means. Python 2 does have Unicode support, but it doesn't use Unicode by default. And a lack of Unicode by default does not mean ASCII either.

There are two ways of looking at strings: as a sequence of bytes and as a sequence of characters. In Python 2, a sequence of bytes is declared as "" and a sequence of characters is declared as u"". In Python 3, a sequence of bytes is declared as b"" and a sequence of characters is declared as "".

An encoding is a mapping between characters and bytes. It only matters when you convert from one to the other. That conversion is needed because you can't send characters out over a socket or write them to a file; you can only send bytes. When you want to convert from bytes to characters or vice versa, you need to specify an encoding. So instead of doing str(foo), you should do foo.encode(charset), where charset is the encoding that you need to use in your output.

Python will try to figure out the encoding your terminal uses if it can, but if it can't, it will fall back to ASCII (the lowest common denominator) rather than guess. That behavior has not changed between Python 2 and Python 3 (except that Python 3 is more aggressive in its attempts to figure out the console encoding).

Your first example didn't work because Python used one encoding to interpret the bytes when you declared the string as Unicode (perhaps a Western European encoding), and that encoding was different from the one your terminal uses. In a Python script, you can fix that by declaring the encoding of the source file using one of the methods specified in PEP 263 (implemented in Python 2.3). The second example worked because there was no conversion: you gave Python a sequence of bytes and it output that same sequence of bytes. Since your source and destination have the same encoding, it happens to work out.

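To make the bytes-versus-characters point concrete, here is a minimal sketch in Python 3 syntax (in Python 2 the text literal would be u"..." instead). The choice of cp1254 (the Windows Turkish code page) and latin-1 is my assumption about which encodings were involved; it is consistent with the '\xddnan\xfdlmaz' bytes and the "Ýnanýlmaz" output in your session, but I can't see your system settings.

```python
# Python 3: a plain literal is a sequence of characters.
text = "İnanılmaz"

# Encoding turns characters into bytes; the charset has to be named.
# cp1254 is an assumed terminal/source encoding, not a known fact.
raw = text.encode("cp1254")

# Decoding with the same charset gives the original characters back.
same = raw.decode("cp1254")

# Decoding the same bytes with a different charset (latin-1, assumed)
# silently produces different characters.
wrong = raw.decode("latin-1")

# ASCII has no representation for these characters at all, which is
# what the UnicodeEncodeError in the quoted session is reporting.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(raw)
print(same)
print(wrong)
print(ascii_ok)
```

For the source-file side in a Python 2 script, the PEP 263 declaration is a comment on the first or second line of the file, for example "# -*- coding: utf-8 -*-", naming whatever encoding the file is actually saved in.
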
Your last example does show something that has changed as a result of the Unicode switch. In Python 2, the repr() of a string was intentionally shown as ASCII, with escape sequences for non-ASCII characters, to help people on terminals that didn't support the full Unicode character set. Since the default string type is Unicode in Python 3, that's been switched to show the characters unless you explicitly escape the string, for example with ascii() or the "unicode_escape" codec (Python 2's "string-escape" codec no longer exists in Python 3).

The only other major thing that Python 3 added in addition to Unicode being the default is that you can have non-ASCII variable names in your source code.

> --
> http://mail.python.org/mailman/listinfo/python-list
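P.S. On the repr() point, a small Python 3 sketch of the difference. I'm assuming that the explicit escaping in question corresponds to Python 3's built-in ascii() and the "unicode_escape" codec; the variable names are only for illustration.

```python
text = "Süper"

# Python 3's repr() keeps printable non-ASCII characters as they are.
shown = repr(text)

# ascii() gives the ASCII-only form with backslash escapes, similar to
# what Python 2's repr() displayed.
escaped = ascii(text)

# The unicode_escape codec does comparable escaping but returns bytes
# and adds no surrounding quotes.
via_codec = text.encode("unicode_escape")

print(shown)
print(escaped)
print(via_codec)
```

Note that repr() and ascii() both return text (str), while encode() returns bytes, so only the last one is something you could write to a socket or a binary file directly.
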