On Sun, 15 Jul 2018 09:07:17 +1000, Chris Angelico wrote:

> On Sun, Jul 15, 2018 at 8:15 AM, Marko Rauhamaa <ma...@pacujo.net>
> wrote:
>> Chris Angelico <ros...@gmail.com>:
>>
>>> On Sun, Jul 15, 2018 at 5:54 AM, Marko Rauhamaa <ma...@pacujo.net>
>>> wrote:
>>>> True enough. Modern-day protocols as well as Linux file formats and
>>>> commands intentionally blur the line between strings and bytes. The
>>>> software in question deals with all of the above. It is virtually
>>>> impossible to keep track of what is "really" text and what is
>>>> "really" binary.

Of course we have no idea what Marko's software is, or what it is doing,
but frankly that seems pretty implausible to me. On the face of it, it
seems as ridiculous as the claim that he can't tell which variables are
quote-unquote "really" lists of weights and which are lists of distances.

On the face of things, this really sounds more like an admission that
Marko is working with a shitty code base, not a fundamental problem with
Python. But dealing with shitty code bases is the reality.

>>>> In the end, the Gordian Knot was sliced by using
>>>> Python3's strings for everything and restricting oneself to Latin-1
>>>> codepoints (almost) everywhere.
[...]

I wonder whether Marko's Python 2.7 code base was ever actually tested
with non-Latin1 text. I suspect that if Marko had (let's say) Japanese
users expecting to use CJK characters in the application, his affection
for the 2.7 version would be a lot less.

[Marko]
>> What I'm saying is that I'm using Python3
>> strings as holders for bytes. Since every byte is a valid Unicode code
>> point, a Python3 string can hold any sequence of bytes.

[Chris]
> Since every byte is also a valid IEEE 754 64-bit binary floating point
> value, a sequence of floats can hold any sequence of bytes, too. Is it a
> good idea to use floats to represent bytes?

3.6e-322 1.6e-322 4.8e-322 5.1e-322 5.63e-322 5e-322 5e-322 1.63e-322

> Text strings and sequences of bytes *are different*.

At an implementation level, everything is bytes. People do so insist on
conflating implementation with interface, even when they don't need to...

(Sometimes I think people should be required to implement algorithms on
analogue computing devices before they're allowed to write code for
digital computers, just to drive home the point that neither bytes nor
bits are fundamental to computing, but are mere implementation details.)

At a semantic level, byte strings and text strings represent
fundamentally different things, as distinct as weights and lengths.
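
To make the "everything is bytes underneath, but the types mean
different things" point concrete, here is a minimal sketch in Python 3
using only the standard library. It reads the eight floats quoted above
back as bytes, then shows the Latin-1 trick Marko describes and where it
stops working. The helper name float_to_byte and the variable names are
mine, chosen for illustration only.

```python
import struct

# The eight floats quoted above are subnormal doubles; the raw 64-bit
# pattern of each one is a small integer that fits in a single byte.
floats = [3.6e-322, 1.6e-322, 4.8e-322, 5.1e-322,
          5.63e-322, 5e-322, 5e-322, 1.63e-322]


def float_to_byte(x):
    """Reinterpret a double's 8 bytes as a little-endian unsigned integer."""
    (n,) = struct.unpack("<Q", struct.pack("<d", x))
    return n


message = bytes(float_to_byte(x) for x in floats)
print(message)

# Latin-1 maps bytes 0..255 onto code points 0..255, so any byte string
# round-trips through str. This is the "str as a holder for bytes" trick.
raw = bytes(range(256))
as_text = raw.decode("latin-1")
assert as_text.encode("latin-1") == raw

# The types stay distinct all the same: Python 3 refuses to mix them.
try:
    "abc" + b"def"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# And text outside Latin-1 cannot be squeezed into that scheme at all.
try:
    "日本語".encode("latin-1")
    cjk_ok = True
except UnicodeEncodeError:
    cjk_ok = False

print(mixed_ok, cjk_ok)
```

Run as a script, this should print b'I agree!' followed by False False:
the floats really can carry bytes, which is exactly why "it can hold
the data" is not an argument for using a type.
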

Unfortunately, due to the long influence of ASCII in computing, a lot of
people have internalised that "byte 0x41 *really is* the letter A" when
that's just a mere encoding convention.

You wouldn't add 5kg to 5cm and expect to get a meaningful result, but
people expect to combine bytes and text and "just make it work". One
might as well say that bytes b'@=<\xed\x91hr\xb0' really is the number
29.238 and expect to multiply your name by 12.5 and get your height in
seconds.

[Marko]
>> Couldn't you use bytes objects everywhere for the same purpose?
>>
>> Yes and no.
>>
>> Yes, but it would be ugly as hell and would involve changing a large
>> percentage of the source code.

It would also require re-inventing the entire Unicode infrastructure
already provided -- unless you intended to just say No to 99% of human
languages in the world, including English, in favour of restricting
everyone, including English speakers, to an artificial subset of the
characters they use in real life.

(Even Latin1 doesn't cover all the English punctuation marks I expect to
be able to use in text.)

It's not 1970 any more. Under what circumstances is that acceptable?

>> No, as a large number of Python3 facilities require str objects as
>> arguments. Consider urllib.request.urlopen(), for example, which
>> requires a URL to be an str object.

That's because URLs are fundamentally text strings.

Quick quiz: which of the following are real URLs?

(a) http://правительство.рф

(b) http://παράδειγμα.δοκιμή

(c) http://실례.테스트

(d) All of the above.

https://uxmag.com/articles/a-url-in-any-language

> Well, duh. It also doesn't accept a list of floats, just because you
> COULD represent a text string that way.

--
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing it
everywhere." -- Jon Ronson

--
https://mail.python.org/mailman/listinfo/python-list
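
P.S. For anyone who wants to check the two examples in the message
above, a minimal sketch in Python 3 (standard library only). It reads
the eight bytes as a big-endian IEEE 754 double, and then treats one of
the quiz URLs as text, converting only the host name to its ASCII wire
form with Python's built-in "idna" codec. The variable names are mine,
and the codec implements the older IDNA 2003 rules, so take this as an
illustration rather than a complete URL-handling recipe.

```python
import struct
from urllib.parse import urlsplit

# The same eight bytes, interpreted as a big-endian 64-bit float.
data = b"@=<\xed\x91hr\xb0"
(value,) = struct.unpack(">d", data)
print(value)

# Interpreted as Latin-1 text instead, they are just eight characters.
as_text = data.decode("latin-1")
print(len(as_text))

# A URL is text. Only the on-the-wire form of the host name is ASCII.
url = "http://правительство.рф"
host = urlsplit(url).hostname
wire = host.encode("idna")
print(wire)

# Decoding the wire form gives back the original text host name.
assert wire.decode("idna") == host
```

The first print should show a value of about 29.238. Nothing in the
bytes themselves says whether they are a float, eight characters, or
something else entirely; that comes from the interpretation applied.
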