Dan Bishop wrote:
> On Apr 12, 9:29 am, Carl Banks <[EMAIL PROTECTED]> wrote:
>> On Apr 12, 10:06 am, Kay Schluehr <[EMAIL PROTECTED]> wrote:
>>
>>> On 12 Apr., 14:44, Christian Heimes <[EMAIL PROTECTED]> wrote:
>>>> Gabriel Genellina wrote:
>>>>> On the last line, str(x), I would expect 'abc' - same as str(x, 'ascii')
>>>>> above. But I get the same as repr(x) - is this on purpose?
>>>> Yes, it's on purpose but it's a bug in your application to call str() on
>>>> a bytes object or to compare bytes and unicode directly. Several months
>>>> ago I added a bytes warning option to Python. Start Python as "python
>>>> -bb" and try it again. ;)
>>> And making an utf-8 encoding default is not possible without writing a
>>> new function?
>> I believe the Zen in effect here is, "In the face of ambiguity, refuse
>> the temptation to guess." How do you know if the bytes are utf-8
>> encoded?
>
> True, you can't KNOW that. Maybe the author of those bytes actually
> MEANT to say 'Â¿CÃ³mo estÃ¡s?' instead of '¿Cómo estás?'. However,
> it's statistically unlikely for a non-UTF-8-encoded string to just
> happen to be valid UTF-8.
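
For readers following along, the behaviour under discussion can be sketched roughly as below. This is only a minimal illustration, assuming Python 3 run without the -bb option; the helper name looks_like_utf8 is hypothetical and does not come from the thread. It shows str() on a bytes object giving the repr-style form, decoding with an explicit encoding, and a strict UTF-8 decode used as a validity check.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # error handling defaults to "strict"
    except UnicodeDecodeError:
        return False
    return True


x = b"abc"
print(str(x))           # repr-style form of the bytes object, not the text
print(str(x, "ascii"))  # explicit encoding: the decoded text

text = "¿Cómo estás?"
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(looks_like_utf8(utf8_bytes))    # these bytes are valid UTF-8
print(looks_like_utf8(latin1_bytes))  # the Latin-1 bytes are not

# Latin-1 maps every byte to a character, so decoding with it never fails;
# reading UTF-8 bytes as Latin-1 silently produces mojibake instead.
print(utf8_bytes.decode("latin-1"))
```

A successful strict decode only shows that the bytes are valid UTF-8; it cannot prove that UTF-8 is what the author intended.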

So you propose to perform a statistical analysis on your input to
determine whether it's UTF-8 or some other encoding?

regards
 Steve
--
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/