On Sunday, May 29, 2016 at 11:07:51 AM UTC+5:30, Steven D'Aprano wrote:
> On Sat, 28 May 2016 02:46 pm, Rustom Mody wrote:
> > [...]
> > In idealized, simplified models like Turing machines, where
> >   3 is 111
> >   7 is 1111111
> > (100, 8364 etc. I won't try to write out, but you get the idea!)
> > it's quite clear that bigger numbers cost more than smaller ones.
>
> I'm not sure that a tally (base-1, unary) is a good model for memory usage
> in any computing system available today. And I thought that the Turing
> model was based on binary: the machine could both mark a cell and erase the
> mark, which corresponds to a bit.
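For concreteness, the tally-versus-binary cost gap is easy to see with a
quick Python sketch (unary here is just n repeated marks):

```python
# Tally (base-1, unary) model versus binary: how many "cells" each needs.

def unary(n):
    """Tally representation: the number n written as n marks."""
    return "1" * n

print(unary(3))                 # 111
print(unary(7))                 # 1111111
print(len(unary(8364)))         # 8364 cells for code point 8364 (the €)
print((8364).bit_length())      # versus 14 bits in binary
```

So in the tally model the cost grows linearly with the number itself, while
in binary it grows only logarithmically -- which is the shape of the
characteristic being argued about below.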
Well, you can take your pick. See unary here:
http://jeapostrophe.github.io/2013-10-29-tmadd-post.html

> > With current hardware it would seem to be a flat characteristic for
> > everything < 2³² (or even 2⁶⁴)
> >
> > But that's only an optical illusion, because after that the
> > characteristic will rise jaggedly, slowly but monotonically, typically
> > log-linearly [which AIUI is jmf's principal error]
>
> Can you be more specific about what you are trying to say? You seem to
> think that you're saying something profound here, but I don't know what
> it is.

I think that you seem to think that you know what I seem to think... but I
digress.

Big numbers are big, i.e. expensive. Small numbers are cheap. Easy so far??

Then there is technology, making arbitrary decisions, e.g. "a word is 32
bits". This just muddies the discussion but does not change the speed of
light -- aka, properties of the universe are invariant in the face of
committee decisions, even by international consortiums.

So it SEEMS (to people like jmf) that a million is no more costly than ten.
However, consider an 8-bit machine (e.g. the 8088): the natural size
- for fitting 25 is one byte
- for 1000 is two bytes
- for a million is three or four bytes, depending on what we mean by
  'natural'

In short, that a € costs more than a $ is a combination of two factors:
- a natural one -- there are a million chars to encode (let's assume that
  the million of Unicode is somehow God-given AS A SET)
- an artificial, political one -- out of the million-factorial permutations
  of that million, the one the Unicode consortium chose satisfies the
  equation: keep ASCII users undisturbed and happy

> > Which also means that if the Chinese were to have more say in the design
> > of Unicode/UTF-8, they would likely not waste swathes of prime
> > real estate on almost-never-used control characters just in the name of
> > ASCII compliance
>
> There is this meme going around that Unicode is a Western imperialistic
> conspiracy against Asians.
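The byte counts above are easy to check in Python (a rough sketch -- "cost"
here is just the minimal byte length of the integer, and the UTF-8 encoded
length of the character):

```python
# Minimal bytes needed to hold an integer on a byte-oriented machine:
for n in (25, 1000, 1_000_000):
    print(n, "->", (n.bit_length() + 7) // 8, "byte(s)")
# 25 -> 1, 1000 -> 2, 1000000 -> 3

# The same asymmetry for characters under UTF-8:
for ch in ("$", "\u20ac"):       # $ is U+0024; € is U+20AC, i.e. 8364
    print(ch, "->", len(ch.encode("utf-8")), "UTF-8 byte(s)")
# $ -> 1, € -> 3
```

The $ lands in the one-byte region precisely because of the "keep ASCII
users happy" choice; the € pays three bytes for arriving later.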
> For example, there was a blog post a year or so ago by somebody bitterly
> complaining that he could draw a pile of poo in Unicode but not write his
> own name, blaming Westerners for this horrible state of affairs.
>
> But like most outrage on the Internet, his complaint was nonsense. He *can*
> write his name -- he just has to use a combining character to add an
> accent(?) to a base character. (Or possibly a better analogy is that of a
> ligature.) His complaint came down to the fact that because his name
> included a character which was unusual even in his own language (Bengali),
> he had to use two Unicode code points rather than one to represent it. This
> is, of course, the second worst[1] kind of discrimination.
>
> https://news.ycombinator.com/item?id=9219162
>
> Likewise the hoo-har over CJK unification. Some people believe that this is
> the evil Western imperialists forcing their ignorant views on the Chinese,
> Japanese and Koreans, but the reality is that the Unicode Consortium merely
> follows the decisions made by the Ideographic Rapporteur Group (IRG),
> originally the CJK-JRG group. That is a multinational group set up by the
> Chinese and Japanese, now including other East Asians (both Koreas,
> Singapore, Vietnam), to decide on a common set of Han characters.
>
> Anyway, I digress.
>
> Given that there are tens of thousands of Han characters (with
> unification), more than will fit in 16 bits, the 64 control characters in
> Unicode are not going to make any practical difference. In some
> hypothetical world where Han speakers got to claim code points U+0000-001F
> and U+0080-009F for ideographs, pushing the control characters out into the
> astral planes, all they would gain is *sixty four* code points. They would
> still need multiple thousands of astral characters.
>
> Besides, some level of ASCII compatibility is useful even for Han speakers.
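Both of those points can be seen in miniature from Python. A sketch, using
é as a stand-in for the Bengali case (é happens to have a precomposed code
point, which his character does not -- that is exactly the complaint), and
Shift-JIS as the legacy CJK encoding:

```python
import unicodedata

# 1. Combining characters: "é" as one precomposed code point, or as a base
#    letter plus a combining accent. NFC normalization maps the two-code-
#    point spelling onto the single code point.
one = "\u00e9"        # é, LATIN SMALL LETTER E WITH ACUTE
two = "e\u0301"       # 'e' + COMBINING ACUTE ACCENT
print(len(one), len(two))                        # 1 2
print(unicodedata.normalize("NFC", two) == one)  # True: canonically equal

# 2. ASCII byte-compatibility of a pre-Unicode CJK encoding: Shift-JIS
#    maps ASCII letters and the C0 control characters to the same single
#    bytes as ASCII itself.
for s in ("A", "\n", "\x1b"):
    print(s.encode("shift_jis") == s.encode("ascii"))  # True each time
print(len("\u65e5".encode("shift_jis")))  # 2 -- Han chars take two bytes
```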
> Their own native-designed standard encodings like Big5 and Shift-JIS
> (which predate Unicode) keep byte-compatibility with the 32 ASCII control
> characters. (I'm not sure about the 32 "C1" control characters.) Since the
> Chinese and Japanese national standards pre-dating Unicode chose to keep
> compatibility with the ASCII control characters, I don't think that there
> is any good reason to think they would have made a different decision when
> it came to Unicode had they had more of a say than they already did.
>
> Which was, and still is, considerable. Both China and Japan are very
> influential in the Unicode Consortium, driving the addition of many new
> Han characters and emoji. The idea that a bunch of Western corporations
> and academics are pushing them around is laughable.
>
>
> [1] The worst being that my US English keyboard doesn't have a proper
> curly apostrophe, forcing me to use a straight ' mark in my name like
> some sort of animal.
>
> --
> Steven
--
https://mail.python.org/mailman/listinfo/python-list