Sean McIlroy wrote:
> I recently found out that unicode("\347", "iso-8859-1") is the
> lowercase c-with-cedilla, so I set out to round up the unicode numbers
> of the extra characters you need for French, and I found them all just
> fine EXCEPT for the o-e ligature (oeuvre, etc). I examined the unicode
> characters from 0 to 900 without finding it; then I looked at
> www.unicode.org but the numbers I got there (0152 and 0153) didn't
> work. Can anybody put a help on me wrt this? (Do I need to give a
> different value for the second parameter, maybe?)

Characters that are in iso-8859-1 map directly into Unicode. That is, the first 256 characters of Unicode are identical to iso-8859-1. Consider this:

>>> c_cedilla = unicode("\347", "iso-8859-1")
>>> c_cedilla
u'\xe7'
>>> ord(c_cedilla)
231
>>> ord("\347")
231

What you did with c_cedilla "worked" because it was effectively doing nothing: the byte value and the Unicode code point are the same number. However, if you do unicode(char, encoding) where char is not in encoding, it won't "work". The o-e ligature is not in iso-8859-1, and the numbers listed at www.unicode.org (0152 and 0153) are hexadecimal code points, not byte values in that encoding.

As John Lenton has pointed out, if you find a character in the Unicode tables, you can just use it directly, for example as u"\u0153". There is no need in this circumstance to use unicode().

HTH,
John
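
A minimal sketch of the points above, with some assumptions: it is written in Python 3 syntax, whereas the session in this thread uses Python 2's unicode() built-in (there the rough equivalents would be u"\u0153" and unichr(0x153)). The variable names are illustrative only. The iso-8859-15 lines speak to the "different value for the second parameter" question: that encoding does have a slot for the ligature, though writing the code point directly is simpler.

```python
# Python 3 syntax (an assumption; the thread itself uses Python 2).

# A byte in iso-8859-1 decodes to the code point with the same number.
c_cedilla = b"\347".decode("iso-8859-1")  # octal 347 == decimal 231
print(ord(c_cedilla))  # 231

# The numbers listed at unicode.org are hexadecimal code points,
# so they can be written directly as escapes in a string literal.
oe_upper = "\u0152"  # LATIN CAPITAL LIGATURE OE
oe_lower = "\u0153"  # LATIN SMALL LIGATURE OE
print(ord(oe_lower))  # 339, i.e. 0x153

# Building the character from its code point gives the same result.
same_char = chr(0x153) == oe_lower

# The ligature has no slot in iso-8859-1, so encoding to it fails.
try:
    oe_lower.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# iso-8859-15 (Latin-9) does include it, at byte 0xBD.
from_latin9 = b"\xbd".decode("iso-8859-15")
```

Here in_latin1 ends up False and from_latin9 equals oe_lower, which is why no byte value passed with "iso-8859-1" could ever have produced the ligature.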