On Sat, 28 May 2016 02:46 pm, Rustom Mody wrote:

[...]
> In idealized, simplified models like Turing models where
> 3 is 111
> 7 is 1111111
> 100, 8364 etc I wont try to write but you get the idea!
> its quite clear that bigger numbers cost more than smaller ones
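
For what it's worth, the two cost models can be put side by side. This is
only a rough sketch: the helper names unary_cost and binary_cost are made up
for illustration, and "cost" here means the number of symbols needed to write
the number down, not the memory any real machine would use.

```python
def unary_cost(n):
    """Marks needed to write n as a tally (base 1): one mark per unit."""
    return n


def binary_cost(n):
    """Bits needed to write n in base 2 (zero still takes one bit)."""
    return max(1, n.bit_length())


# The numbers from the quoted text, plus the two hardware word boundaries.
for n in (3, 7, 100, 8364, 2**32, 2**64):
    print(n, unary_cost(n), binary_cost(n))
```

In the tally model the cost grows linearly with the value, while in binary it
grows with the logarithm: 8364 needs 8364 marks but only 14 bits.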

I'm not sure that a tally (base-1, unary) is a good model for memory usage
in any computing system available today. And I thought that the Turing model
was based on binary: the machine could both mark a cell and erase the mark,
which corresponds to a bit.

> With current hardware it would seem to be a flat characteristic for
> everything < 2³² (or even 2⁶⁴)
>
> But thats only an optical illusion because after that the characteristic
> will rise jaggedly, slowly but monotonically, typically log-linearly
> [which AIUI is jmf's principal error]

Can you be more specific about what you are trying to say? You seem to think
that you're saying something profound here, but I don't know what it is.

> Which also means that if the Chinese were to have more say in the design
> of Unicode/ UTF-8 they would likely not waste swathes of prime real-estate
> for almost never used control characters just in the name of ASCII
> compliance

There is this meme going around that Unicode is a Western imperialistic
conspiracy against Asians. For example, there was a blog post a year or so
ago by somebody bitterly complaining that he could draw a pile of poo in
Unicode but not write his own name, blaming Westerners for this horrible
state of affairs.

But like most outrage on the Internet, his complaint was nonsense. He *can*
write his name -- he just has to use a combining character to add an
accent(?) to a base character. (Or possibly a better analogy is that of a
ligature.) His complaint came down to the fact that because his name
included a character which was unusual even in his own language (Bengali),
he had to use two Unicode code points rather than one to represent it. This
is, of course, the second worst[1] kind of discrimination.

https://news.ycombinator.com/item?id=9219162

Likewise the hoo-ha over CJK unification.
Some people believe that this is the evil Western imperialists forcing their
ignorant views on the Chinese, Japanese and Koreans, but the reality is that
the Unicode Consortium merely follows the decisions made by the Ideographic
Rapporteur Group (IRG), originally the CJK-JRG group. That is a
multinational group set up by the Chinese and Japanese, now including other
East Asians (both Koreas, Singapore, Vietnam) to decide on a common set of
Han characters.

Anyway, I digress. Given that there are tens of thousands of Han characters
(with unification), more than will fit in 16 bits, the 64 C0 and C1 control
characters in Unicode are not going to make any practical difference. In
some hypothetical world where Han speakers got to claim code points
U+0000-001F and U+0080-009F for ideographs, pushing the control characters
out into the astral planes, all they would gain is *sixty-four* code points.
They would still need multiple thousands of astral characters.

Besides, some level of ASCII compatibility is useful even for Han speakers.
Their own native-designed standard encodings like Big5 and Shift-JIS (which
predate Unicode) keep byte-compatibility with the 32 ASCII control
characters. (I'm not sure about the 32 "C1" control characters.)

Since the Chinese and Japanese national standards pre-dating Unicode chose
to keep compatibility with the ASCII control characters, I don't think that
there is any good reason to think they would have made a different decision
when it came to Unicode had they had more of a say than they already did.

Which was, and still is, considerable. Both China and Japan are very
influential in the Unicode Consortium, driving the addition of many new Han
characters and emoji. The idea that a bunch of Western corporations and
academics are pushing them around is laughable.


[1] The worst being that my US English keyboard doesn't have a proper curly
apostrophe, forcing me to use a straight ' mark in my name like some sort of
animal.
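
P.S. For anyone who wants to check the arithmetic, here is a quick sketch
using the standard unicodedata module. Two caveats: the variable names are
just mine, and the accented Latin letter is only a stand-in for the
combining-character case, not the Bengali character from the blog post.

```python
import unicodedata

# The two control ranges named above.
c0 = range(0x0000, 0x0020)   # U+0000-001F
c1 = range(0x0080, 0x00A0)   # U+0080-009F
reclaimable = len(c0) + len(c1)

# Every code point whose general category is Cc (control).
all_controls = [cp for cp in range(0x110000)
                if unicodedata.category(chr(cp)) == "Cc"]

print(reclaimable)         # 64
print(len(all_controls))   # 65: the 64 above plus U+007F (DEL)

# A base letter plus a combining mark is two code points, even though
# it is displayed as a single character.
decomposed = "e\u0301"     # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))   # 2 1
```

So evicting the C0 and C1 controls would free 64 code points, against tens
of thousands of Han characters, and needing two code points for one written
character is nothing unusual.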

--
Steven