Diez B. Roggisch <[EMAIL PROTECTED]> wrote:
> Please write the following program and meditate at least 30min in front of
> it:
>
> while True:
>     print "utf-8 is not unicode"

I hope you will have a better day today than yesterday!

Now, I did this:

    while True:
        print "¡ Python knows about encoding, but only sometimes !"

My terminal is set up in UTF-8, and... it did print correctly. I expected that by setting coding: utf-8, all the I/O functions would do the encoding for me. If they don't, then I, and everybody who writes a script, will need to subclass every single I/O class (OK, except for print!).

> Bytestrings are just that - a sequence of 8-bit-values.

It used to be that ints were 8 bits. We did not stay stuck in time, and ints are now typically longer. I expect a high-level language to let me set the encoding once and do simple I/O operations... without having to encode/decode.

> Now the real world of databases, network-connections and harddrives doesn't
> know about unicode. They only know bytes. So before you can write to them,
> you need to "encode" the unicode data to a byte-stream-representation.
> There are quite a few of these, e.g. latin1, or the aforementioned UTF-8,
> which has the property that it can render *all* unicode characters,
> potentially needing more than one byte per character.

Sure, if I write assembly, I'll make sure I get my bits, bytes and multi-byte chars right, but that's why I use a high-level language. Here's an example of an implementation that lets you write Unicode directly to a dbhash. I hoped there would be something similar in Python:
http://www.oracle.com/technology/documentation/berkeley-db/db/gsg/JAVA/DBEntry.html

> db = dbhash.open('dbfile.db')
> smiley = db[u'smiley'.encode('utf-8')].decode('utf-8')
> print smiley.encode('utf-8')
> The last encode is there to print out the smiley on a terminal - one of
> those pesky bytestream-eaters that don't know about unicode.

What are you talking about? I just copied this right from my terminal (LANG=en_CA.utf8):

    >>> print unichr(0x020ac)
    €
    >>>

Now, I have read that Python 2.6 has better support for Unicode.
Does it allow writing to files, bsddb, etc. without having to encode/decode every time? This is a big enough issue for me right now that I will manually install 2.6 if it does.

Thanks.

-- 
Yves.
http://www.sollers.ca/blog/2008/no_sound_PulseAudio
http://www.sollers.ca/blog/2008/PulseAudio_pas_de_son/.fr
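For comparison, the "set the encoding once" behaviour asked about above can be sketched as a thin wrapper that encodes on the way in and decodes on the way out. This is a minimal sketch in Python 3 syntax, not the Python 2 used in this thread. `UnicodeKeyValueStore`, `write_text` and `read_text` are hypothetical names, not part of any library. A plain dict stands in for the dbhash/bsddb handle, which only stores bytes. For files, the sketch relies on `io.open` and its `encoding` argument. The `io` module exists from Python 2.6 on, and in Python 3 `io.open` is the built-in `open`.

```python
# Minimal sketch (Python 3): encode/decode once, at the I/O boundary.
# UnicodeKeyValueStore, write_text and read_text are hypothetical helpers.
import io
from collections.abc import MutableMapping


class UnicodeKeyValueStore(MutableMapping):
    """Present a bytes->bytes mapping (e.g. a dbm handle, or a dict)
    as a str->str mapping, using one encoding chosen at construction."""

    def __init__(self, raw, encoding="utf-8"):
        self._raw = raw            # underlying store that only knows bytes
        self._encoding = encoding  # set once, applied to every key and value

    def _enc(self, text):
        return text.encode(self._encoding)

    def _dec(self, data):
        return data.decode(self._encoding)

    def __getitem__(self, key):
        return self._dec(self._raw[self._enc(key)])

    def __setitem__(self, key, value):
        self._raw[self._enc(key)] = self._enc(value)

    def __delitem__(self, key):
        del self._raw[self._enc(key)]

    def __iter__(self):
        # keys() works both for dicts and for dbm-style handles
        for raw_key in self._raw.keys():
            yield self._dec(raw_key)

    def __len__(self):
        return len(self._raw)


def write_text(path, text, encoding="utf-8"):
    # A text-mode stream opened with an encoding encodes on every write...
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


def read_text(path, encoding="utf-8"):
    # ...and decodes on every read.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


if __name__ == "__main__":
    raw = {}  # stand-in for a byte-oriented database handle
    db = UnicodeKeyValueStore(raw)
    db["euro"] = "\u20ac"            # the caller works with text only
    print(raw)                       # {b'euro': b'\xe2\x82\xac'}
    print(db["euro"] == "\u20ac")    # True
```

Only the wrapper and the file helpers know about UTF-8. Code that uses `db` or `read_text` deals purely in text. The trade-off is that every key and value must be text, so binary values would need a separate, non-decoding store.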