Re: Cult-like behaviour [was Re: Kindness]

Terry Reedy Sun, 15 Jul 2018 13:10:21 -0700

On 7/15/2018 7:37 AM, Marko Rauhamaa wrote:

One of the classic Unix and Internet tenets is that text is bytes is
text.

Tenets of a faith may be wrong ;-). An informatic paradigm from more than 45 years ago may be outdated and in need of revision.

On byte storage and on the Internet, **everything** is (encoded) bytes, so saying 'text is bytes' says nothing because it is trivially true. On the other hand, 'bytes is text' is wrong unless one uses a character encoding that assigns a visible character (including <space>) to every byte. I believe both PCs and Macs had 1 or more such encodings. (I am only uncertain as to whether b'\x00' was mapped.)
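A rough way to probe this in Python is to decode each of the 256 byte values on its own and count how many come out as printable characters. This is only a sketch: `printable_bytes` is a name made up for illustration, `str.isprintable()` stands in for 'visible character', and the codec tables shipped with Python map the low bytes to control characters rather than to the glyphs that old PC hardware displayed, so it does not settle what those machines actually showed.

```python
def printable_bytes(encoding: str) -> int:
    """Count the single byte values that decode to a printable character.

    Hypothetical helper for illustration; intended for single-byte codecs.
    """
    count = 0
    for value in range(256):
        try:
            ch = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            # This byte has no character assigned in the codec.
            continue
        if ch.isprintable():
            count += 1
    return count


# Compare a few single-byte codecs that ship with Python.
for name in ("ascii", "latin-1", "cp437", "mac_roman"):
    print(name, printable_bytes(name), "of 256")
```

Any codec that reports fewer than 256 leaves some bytes that are not 'text' in this sense.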

Images are bytes as much as text is. I suggest that 'bytes is image' is more true than 'bytes is text'. Every byte can be mapped, for instance, into an 8 x 1 or 1 x 8 pixel image after deciding which end gets the high and low bits. Bit mapping is likely older than Unix. Bar codes and QR codes are commonplace as international machine-readable images of bytes.
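The byte-to-image mapping can be sketched in a few lines of Python. The function names and the `msb_first` parameter are invented for this illustration; the only decision involved is the one mentioned above, namely which end of the row gets the high bit.

```python
def byte_to_pixels(value: int, msb_first: bool = True) -> list:
    """Map one byte to a row of 8 one-bit pixels (1 = on, 0 = off)."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    # Extract bits from the high end (bit 7) down to the low end (bit 0).
    bits = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    return bits if msb_first else bits[::-1]


def render(data: bytes) -> str:
    """Draw each byte as one text row, '#' for a set bit and '.' for a clear bit."""
    return "\n".join(
        "".join("#" if pixel else "." for pixel in byte_to_pixels(byte))
        for byte in data
    )


print(render(b"\x18\x3c\x7e\xff"))
```

Unlike decoding as text, this mapping never fails: every one of the 256 byte values yields a picture.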

In a context where 'everything is bytes', then 'bytes is everything' or 'bytes can be anything' are the proper reverses.

Of course, much of it was naïve, but UTF-8 has miraculously given
it a new life.

UTF-8 makes 'bytes is text' even less true. Not only are some leading bytes not text, but some byte sequences are illegal. Bytes are not UTF-8 text. As n increases, the probability that a string of n random bytes is valid UTF-8 text approaches 0, and it does so faster than the probability that the same bytes are valid Latin-1 text.
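For small n the UTF-8 claim can be checked exhaustively rather than by sampling. The sketch below uses Python's strict 'utf-8' codec as the definition of valid; the helper names are made up for illustration. It makes no attempt at the Latin-1 side of the comparison, because Python's 'latin-1' codec accepts every byte and so cannot tell text from non-text on its own.

```python
from itertools import product


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def count_valid_utf8(n: int) -> int:
    """Count the byte strings of length n that are valid UTF-8 (exhaustive)."""
    return sum(
        is_valid_utf8(bytes(seq)) for seq in product(range(256), repeat=n)
    )


# Only small n is practical: there are 256**n candidates to try.
for n in (1, 2):
    valid = count_valid_utf8(n)
    print(n, valid, valid / 256 ** n)
```

With n = 1 exactly half of the byte values are valid (the 128 ASCII bytes); with n = 2 the valid share already drops to 18304 of 65536, about 28%.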


--
Terry Jan Reedy


--
https://mail.python.org/mailman/listinfo/python-list
