>> Just realize that once you start using 'ignore' you're going to also
>> ignore discrepancies that are real. For example, maybe your terminal is
>> actually something other than either latin-1 or utf-8.
>
> If you need to see such discrepancies, you can do
>
>   print src.decode("utf-8").encode("latin-1", "xmlcharrefreplace")
>
> That would produce something like:
>
>   processeurs Intel® Core&#8482; de 3ème génération av
>
> that is, the problem characters are displayed in &#...; notation.
> That is ugly, but sometimes it's the only way to see what character
> you really have.
>
> Notice that the number you get is in decimal, where the \u....
> notation uses hex:
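
For anyone reading this thread later, here is a minimal sketch of the xmlcharrefreplace trick quoted above, together with the decimal/hex relationship. It is written for Python 3 (the thread uses Python 2 syntax), and the `sample` string is an assumed stand-in for the poster's data, not the real `src`.

```python
# Assumed sample text; \u00ae is the registered sign, \u2122 the trademark sign.
sample = "Intel\u00ae Core\u2122 de 3\u00e8me g\u00e9n\u00e9ration"
src = sample.encode("utf-8")  # stand-in for the UTF-8 bytes called `src` in the thread

# Decode as UTF-8, then re-encode as latin-1. Characters that latin-1 cannot
# represent are replaced by &#NNNN; references instead of raising an error.
visible = src.decode("utf-8").encode("latin-1", "xmlcharrefreplace")
print(visible)  # bytes repr, safe to print on any terminal

# The number inside &#...; is decimal; the \u.... escape is the same value in hex.
code_point = ord("\u2122")
print(code_point, hex(code_point))  # 8482 0x2122
```

Only the trademark sign turns into a reference here, since the registered sign and the accented letters all exist in latin-1.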

Thanks guys, my issue is now solved. The problem came from my PuTTY
client: it was set to latin1 by default, and after changing it to utf-8
everything works.

-- 
http://mail.python.org/mailman/listinfo/python-list