On Sun, Dec 09, 2018 at 09:23:59AM +1100, Cameron Simpson wrote:
> On 07Dec2018 21:20, Steven D'Aprano <st...@pearwood.info> wrote:
> > # Python 2
> > >>> txt = "abcπ"
> >
> > but it is a lie, because what we get isn't the string we typed, but the
> > interpreter's *bad guess* that we actually meant this:
> >
> > >>> txt
> > 'abc\xcf\x80'
>
> Wow. I did not know that! I imagined Python 2 would have simply rejected
> such a string (out of range characters -- ordinals >= 256 -- in a "byte"
> string).

Nope. Python 2 tries hard to make bytes and unicode text work together.
If your strings are pure ASCII, it "Just Works" and it seems great, but
in trickier cases it can lead to really confusing errors.

Behind the scenes, the interpreter is using some platform-specific
codec (ASCII, UTF-8, or similar) to automatically encode/decode from
bytes to text or vice versa. This sort of "Do What I Mean" processing
can work, up to the point that it doesn't; then it all goes pear-shaped
and you have silent failures and hard-to-diagnose errors.

That's why Python 3 takes a hard-line policy that you cannot mix text
and bytes (except, possibly, if one is the empty string) unless you
explicitly convert from one to the other.

--
Steve
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
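
A minimal Python 3 sketch of the explicit text/bytes conversion described in the message above. UTF-8 is an assumption here, chosen only because it reproduces the bytes '\xcf\x80' shown in the quoted Python 2 session; the variable names are illustrative and not from the original thread.

```python
# Python 3: str (text) and bytes are separate types.
text = "abcπ"

# Explicit encoding. UTF-8 is an assumed codec; it happens to match
# the bytes shown in the quoted Python 2 session.
data = text.encode("utf-8")
print(data)                    # b'abc\xcf\x80'
print(len(text), len(data))    # 4 5 (π takes two bytes in UTF-8)

# Mixing text and bytes is refused rather than guessed at.
mixing_error = None
try:
    text + data
except TypeError as err:
    mixing_error = err
    print("TypeError:", err)

# In CPython 3 even empty operands are refused for concatenation.
empty_mix_fails = False
try:
    "" + b""
except TypeError:
    empty_mix_fails = True

# Explicit decoding with the same codec recovers the original text.
round_trip = data.decode("utf-8")

# Decoding with a different codec does not fail, it silently yields
# different text: the kind of wrong guess Python 2 could make for you.
wrong = data.decode("latin-1")
print(len(wrong))              # 5 characters instead of 4
```

The point of the sketch is that every conversion names its codec, so a wrong guess has to be written down by the programmer instead of being made silently by the interpreter.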