Ron Garret wrote:
>
> But what about this:
>
> >>> f2=open('foo','w')
> >>> f2.write(u'\xFF')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in
> position 0: ordinal not in range(128)
> >>>
>
> That should have nothing to do with my terminal, right?

Correct. But first try to answer this: given that you want to write the Unicode character value 255 to a file, how is that character to be represented in the file?

For example, one might think that one could just get a byte whose value is 255 and write that to a file, but what happens if one chooses a Unicode character whose value is greater than 255? One could use two bytes or three bytes or as many as one needs, but what if the lowest 8 bits of that value are all set? How would one know, if one reads a file back and gets a byte whose value is 255 whether it represents a character all by itself or is part of another character's representation? It gets complicated!

The solution is that you choose an encoding which allows you to store the characters in the file, thus answering indirectly the question above: encodings determine how the characters are represented in the file and allow you to read the file and get back the characters you put into it.

One of the most common encodings suitable for the storage of Unicode character values is UTF-8, which has been designed with the above complications in mind, but as long as you remember to choose an encoding, you don't have to think about it: Python takes care of the difficult stuff on your behalf. In the above code you haven't made that choice.

So, to answer the above question, you can either...

  * Use the encode method on Unicode objects to turn them into plain strings, then write them to a file - at that point, you are writing specific byte values.

  * Use the codecs.open function and other codecs module features to write Unicode objects directly to files and streams - here, the module's infrastructure deals with byte-level issues.

  * If you're using something like an XML library, you can often pass a normal file or stream object to some function or method whilst stating the output encoding.

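As a rough sketch of the first two options, here is how each might look. UTF-8, the temporary directory and the variable names are assumptions made purely for illustration; substitute whatever encoding and path suit your data. The Latin-1 line is only there to show that a different choice of encoding gives different bytes for the same character.

```python
import codecs
import os
import tempfile

text = u'\xff'  # the single Unicode character with value 255

# The same character becomes different bytes under different encodings.
as_utf8 = text.encode('utf-8')      # two bytes: 0xC3 0xBF
as_latin1 = text.encode('latin-1')  # one byte: 0xFF

# Hypothetical location for the example file (an assumption, not from the post).
path = os.path.join(tempfile.mkdtemp(), 'foo')

# Option 1: encode explicitly, then write the resulting byte values.
f = open(path, 'wb')
f.write(as_utf8)
f.close()

# Option 2: let codecs.open do the encoding as the text is written.
f2 = codecs.open(path, 'w', encoding='utf-8')
f2.write(text)
f2.close()

# Reading with the same encoding gives back the character that was written.
f3 = codecs.open(path, 'r', encoding='utf-8')
roundtrip = f3.read()
f3.close()
```

Either way, the encoding is stated explicitly somewhere, which is exactly what the failing code above never did.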
There is no universally correct answer to which encoding should be used when writing Unicode character values to files, contrary to some beliefs and opinions which, for example, lead to people pretending that everything is in UTF-8 in order to appease legacy applications with the minimum of tweaks necessary to stop them from breaking completely. Thus, Python doesn't make a decision for you here.

Paul

-- 
http://mail.python.org/mailman/listinfo/python-list