On Saturday, November 22, 2014 8:14:15 PM UTC+5:30, Roy Smith wrote:
> Marko Rauhamaa wrote:
>
> > Steven D'Aprano:
> >
> > > You haven't given any good reason for objecting to calling Unicode
> > > strings by what they are. Maybe you think that it is an implementation
> > > detail, and that some version of Python might suddenly and without
> > > warning change to only supporting KOI8-R strings or GB2312 strings? If
> > > so, you are badly mistaken. The fact that Python strings are Unicode
> > > is not an implementation detail, it is part of the language semantics.
> >
> > To me, repeating the word Unicode everywhere is giving the (in and of
> > itself impressive) standard too primary a status. While understanding
> > how Unicode, IEEE-754, 2's complement, mark-and-sweep etc work is very
> > useful and occasionally can be taken explicit advantage of, those really
> > are mundane techniques to implement abstractions.
> >
> > Python's strings exist (primarily) so you can express utterances in a
> > human language, aka plain text. They don't exist to express Unicode code
> > points. That would be putting the cart before the horse.
> >
> > > "Rectangular door" makes perfect sense, and in a world where there are
> > > dozens of legacy non-rectangular doors, it would be very sensible to
> > > specify the kind of door.
> >
> > It makes sense, and yet, I've never heard anyone talk about rectangular
> > doors even though I use numerous doors every day. Why is it, then, that
> > people feel the constant need to add the "Unicode" epithet to Python's
> > strings, which -- according to its own specification -- are just
> > strings?
> >
> >
> > Marko
>
> There's a old joke to the effect that the fields of study which are
> confident that they're really doing science (i.e. chemistry, biology,
> physics, astronomy, etc) don't put the word "science" in their names.
> It's only the fields of study that are less confident about their status
> as sciences (computer science, behavioral science, political science,
> etc) that feel the need to explicitly say "science". As if repeating it
> enough times makes it true. I wonder if something of the same thing
> applies here? <ducking and running>
>
> Somewhat more seriously, the IEEE-754 point is quite apropos. Back when
> 754 first came out, there were lots of different floating point
> implementations. Machines that used 754 touted it in their sales
> literature and mentioned it all over their documentation. These days,
> 754 is so ubiquitous, nobody even thinks to mention it, in the same way
> nobody bothers to mention 2's complement integers. I suspect that some
> day, the same thing will happen with Unicode. For that matter, we will
> eventually get to the point where when people say, "just plain text",
> they will mean Unicode, in the same way that "just plain text" today
> really means ASCII (and the text/plain MIME type will become a
> historical curiosity).
Yes, this was my point too: encodings in general, and Unicode in particular, are a mess (as of 2014). Maybe in a few years the dust will settle, and then saying 'Unicode' will become redundant. But until then, while we have a rather leaky abstraction, having sealing liquid on our hands is preferable to sewage in the house.

-- 
https://mail.python.org/mailman/listinfo/python-list
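One concrete way the Unicode abstraction leaks, sketched in Python 3 using only the standard-library unicodedata module: the same visible word can be two different str values, and its size in bytes depends on which encoding is chosen when the text leaves the program. The variable names below are illustrative only.

```python
import unicodedata

# The same visible word, "café", spelled two ways in Unicode:
composed = "caf\u00e9"     # é as a single code point (U+00E9)
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT (U+0301)

# str equality and len() work on code points, so these two differ.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5

# Normalizing to NFC composes "e" + U+0301 into U+00E9, so they now match.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# The byte length depends on the encoding chosen at the boundary.
print(len(composed.encode("utf-8")))    # 5 (é takes two bytes in UTF-8)
print(len(composed.encode("latin-1")))  # 4 (é is one byte in Latin-1)

# And some spellings cannot be encoded at all in a legacy charset.
try:
    decomposed.encode("latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode U+0301:", exc.reason)
```

Normalizing text (for example to NFC) where it enters a program is one common way to contain this kind of leak, though which normalization form to use is an application-level choice.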