Arnaud Delobelle writes:
> Hallvard B Furuseth <h.b.furus...@usit.uio.no> writes:
>> I've been playing a bit with Python3.2a2, and frankly its charset
>> handling looks _less_ safe than in Python 2.
>> (...)
>> With 2.<late> conversion Unicode <-> string the equivalent operation did
>> not silently produce garbage: it raised UnicodeError instead.  With old
>> raw Python strings that was not a problem in applications which did not
>> need to convert any charsets, with python3 they can break.
>>
>> I really wish bytes.__str__ would at least by default fail.
>
> I think you misunderstand the purpose of str().  It is to provide a
> (unicode) string representation of an object and has nothing to do with
> converting it to unicode:

That's not the point - the point is that for 2.* code which _uses_ str
vs unicode, the equivalent 3.* code uses str vs bytes.  Yet not in the
same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str.  So
upgraded old code will have to expect both str and bytes.

In 2.*, str<->unicode conversion either failed or produced the
equivalent character/byte data.  Yes, there could be charset problems
if the defaults were set up wrong, but that's a smaller problem than in
3.*.  In 3.*, the bytes->str conversion done by str() always _silently_
produces garbage.

And lots of code uses both, and needs to convert back and forth.  In
particular 3.* code converted from 2.*, or using modules converted from
2.*.  There's a lot of such code, and there will be for a long time.

-- 
Hallvard
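
To make the bytes->str complaint concrete, here is a minimal Python 3
sketch.  The helper name to_text and the choice of UTF-8 are assumptions
made for illustration only; they come neither from the thread nor from
the standard library.  The sketch contrasts str() on a bytes object,
which returns the repr without complaint, with an explicit strict
decode, which raises on undecodable input.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: accept str or bytes, always return str.

    bytes are decoded strictly, so undecodable input raises
    UnicodeDecodeError instead of being turned into a repr.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)  # errors="strict" is the default
    if isinstance(value, str):
        return value
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


raw = b"caf\xe9"  # Latin-1 encoded data, not valid UTF-8

# str() on bytes does not decode; it returns the repr of the object.
silent = str(raw)

# A strict decode of the same data fails loudly instead.
try:
    to_text(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Code that may receive either type, as upgraded 2.* code often does,
could route such values through a helper like this rather than calling
str() directly.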