On 3/7/2012 6:18 PM, Ben Finney wrote:
> Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes:
>
>> On Thu, 08 Mar 2012 08:48:58 +1100, Ben Finney wrote:
>>> I think that's a Python bug. If the latter succeeds as a no-op, the
>>> former should also succeed as a no-op. Neither should ever get any
>>> errors when ‘s’ is a ‘unicode’ object already.
>>
>> No. The semantics of the unicode function (technically: a type
>> constructor) are well-defined, and there are two distinct behaviours:

   Right. The real problem is that Python 2.7 doesn't have distinct
"str" and "bytes" types.  type(bytes()) returns <type 'str'>.
"str" is assumed to be ASCII 0..127, but that's not enforced.
"bytes" and "str" should have been distinct types, but
that would have broken much old code.  If they were distinct, then
constructors could distinguish between string type conversion
(which requires no encoding information) and byte stream decoding.

   So it's possible to get junk characters in a "str", and they
won't convert to Unicode.  I've had this happen with databases which
were supposed to be ASCII, but occasionally a non-ASCII character
would slip through.
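
   A small sketch of that failure, written for Python 3 so it runs on a
current interpreter. The bytes literal stands in for a Python 2 "str"
pulled from such a database; the sample value and the helper name
decode_ascii_or_report are made up for illustration.

```python
# Hypothetical column value: Latin-1 text in a field assumed to be ASCII.
raw = b"caf\xe9"


def decode_ascii_or_report(data):
    """Return (text, None) on success, or (None, message) on failure."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that is not ASCII.
        bad = data[exc.start]
        return None, "byte 0x%02x at offset %d is not ASCII" % (bad, exc.start)


text, problem = decode_ascii_or_report(raw)
print(text, problem)
```

Reporting the offset rather than silently replacing the byte makes it
easier to find the row that let the stray character through.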

   This is all different in Python 3.x, where "str" is Unicode and
"bytes" really are a distinct type.
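
   As a rough illustration of the Python 3 behaviour (a sketch, not
taken from the thread; the variable names are arbitrary):

```python
# Python 3: bytes and str are separate types, and conversion is explicit.
data = b"abc"
text = data.decode("ascii")   # byte stream decoding needs an encoding

# Mixing the two types is an error instead of an implicit ASCII decode.
mixing_fails = False
try:
    data + text
except TypeError:
    mixing_fails = True

print(type(data).__name__, type(text).__name__, mixing_fails)
```

Going back the other way is just as explicit: text.encode("ascii")
returns the original bytes.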

                                John Nagle
--
http://mail.python.org/mailman/listinfo/python-list
