Rustom Mody <rustompm...@gmail.com>:

> On Wednesday, May 25, 2016 at 4:18:02 PM UTC+5:30, Marko Rauhamaa wrote:
>> Christopher Reimer:
>>
>> > Back in the early 1980's, I grew up on 8-bit processors and latin-1 was
>> > all we had for ASCII.
>>
>> You really were very advanced. According to <URL:
>> https://en.wikipedia.org/wiki/ISO/IEC_8859-1#History>, ISO 8859-1 was
>> standardized in 1985. "Eight-bit-cleanness" became a thing in the early
>> 1990's.
>
> [...]
>
> Thanks to this (sub)thread Ive added a new section: "Lemma: 7=8"
> here http://blog.languager.org/2014/04/unicode-and-unix-assumption.html

A related anecdote from maybe 1990:

I worked in a project team. We had designed a data encoding format that
made use of 8-bit character strings (SunOS 4, Sparc, C).

One morning a coworker stated that the standard library's strcmp()
seemed to be buggy. He quickly solved the problem by writing his own
strcmp().

I found it surprising that a function as elementary as strcmp() could go
wrong, so I took a look at its disassembly. It turned out Sun engineers
had heavily optimized the function. In particular, if both strings were
32-bit-aligned, the loop was carried out using clever 32-bit integer
operations.

Only they had made a mistake. Their algorithm checked these bits of an
integer result:

   31            24              16              8               0
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ^             ^               ^               ^

While they *should* have checked these positions:

   31            24              16              8               0
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 ^               ^               ^               ^

As a result of their bug, every fourth position of the string had its
high-order bit ignored for strcmp. In particular, '\200' was treated as
an end-of-string marker.

The fix was obvious: check bit 32. However, 32-bit integers don't have a
bit 32, which explains the oversight. Luckily, the 33rd bit was readily
available in the CPU's carry flag, so the optimization could be salvaged
easily.

I sent a complimentary report to Sun Microsystems' customer service. I
got an email back stating we were out of support and they wouldn't be
talking to us. I thought, ok, their loss, and we went happily forward
with our naïve, two-line strcmp() replacement.

Some three months later, the same customer service rep sent another
email confirming the finding and thanking us for reporting it.

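To make the failure mode concrete, here is a minimal sketch in C of a carry-based "does this word contain a zero byte?" test of the kind described above. It is not Sun's code. The function names, the magic constants, and the use of a 64-bit sum to stand in for the CPU's carry flag are all assumptions made for illustration. The last function is likewise only a guess at what a naïve byte-at-a-time replacement might look like.

```c
#include <stdint.h>

/* Hypothetical reconstruction of the flawed test: adds a magic constant
 * and checks whether the "hole" bits 31, 24, 16 and 8 were left
 * unchanged by carries. Bit 31 is a data bit of the top byte, so a top
 * byte of 0x80 ('\200') looks exactly like a zero byte. */
int has_zero_byte_hole31(uint32_t w)
{
    const uint32_t magic = UINT32_C(0x7efefeff); /* holes at 31, 24, 16, 8 */
    const uint32_t holes = ~magic;               /* 0x81010100 */
    uint32_t sum = w + magic;                    /* wraps modulo 2^32 */
    return ((sum ^ ~w) & holes) != 0;
}

/* Sketch of the corrected test: the top byte's carry is observed in
 * bit 32, which a 32-bit register does not have. A 64-bit sum is used
 * here as a portable stand-in for the carry flag. */
int has_zero_byte_carry(uint32_t w)
{
    const uint64_t magic = UINT64_C(0xfefefeff);  /* holes at 24, 16, 8 */
    const uint64_t holes = UINT64_C(0x101010100); /* bits 32, 24, 16, 8 */
    uint64_t sum = (uint64_t)w + magic;           /* bit 32 is the carry out */
    return ((sum ^ ~(uint64_t)w) & holes) != 0;
}

/* A naive byte-at-a-time comparison (hypothetical, not the coworker's
 * actual code). Bytes are compared as unsigned char, so 8-bit data
 * such as '\200' orders correctly. */
int naive_strcmp(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) { a++; b++; }
    return (unsigned char)*a - (unsigned char)*b;
}
```

With the word 0x80414243, which contains no zero byte, the first function reports a terminator and the second does not. For a word with a genuine zero byte, such as 0x41004243, both agree.
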
Marko