On Thu, Sep 12, 2013 at 10:25 AM, Mark Janssen <dreamingforw...@gmail.com> wrote:
>>> On Tue, 10 Sep 2013, Ben Finney wrote:
>>> > The sooner we replace the erroneous
>>> > “text is ASCII” in the common wisdom with “text is Unicode”, the
>>> > better.
>>>
>>> I'd actually argue that it's better to replace the common wisdom with
>>> "text is binary data, and we should normally look at that text through
>>> Unicode eyes". A little less catchy, but more accurate ;)
>>
>> No, that's inaccurate. A sequence of bytes is binary data. Unicode is
>> not binary data.
>
> Well now, this is an area that is not actually well-defined. I would
> say 16-bit Unicode is binary data if you're encoding in base 65,536,
> just as 8-bit ascii is binary data if you're encoding in base-256.
> Which is to say: there is no intervening data to suggest a TYPE.

Unicode is not 16-bit any more than ASCII is 8-bit. And you used the
word "encod[e]", which is the standard term for turning Unicode into
bytes anyway. No, a Unicode string is a series of codepoints - it's more
similar to a list of ints than to a stream of bytes.

ChrisA
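
To make the codepoints-versus-bytes distinction concrete, here is a minimal sketch, assuming Python 3.3 or later. The sample string is made up for illustration; it includes one codepoint above U+FFFF to show that a codepoint need not fit in 16 bits.

```python
# Hypothetical sample string, chosen only for illustration:
# "naive" with U+00EF, a space, and U+1F40D (outside the 16-bit range).
text = "na\u00efve \U0001F40D"

# A str behaves like a sequence of codepoints; ord() gives each one as an int.
codepoints = [ord(ch) for ch in text]

# Bytes only exist once an encoding is chosen, and the byte count
# depends on that choice, not on the string itself.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")

print(codepoints)   # [110, 97, 239, 118, 101, 32, 128013]
print(len(text))    # 7 codepoints
print(len(utf8))    # 11 bytes
print(len(utf16))   # 16 bytes (the last codepoint needs a surrogate pair)
```

The same seven codepoints come out as 11 bytes in one encoding and 16 in another, so the bytes are a property of the encoding rather than of the text.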