On Wed, 8 Apr 2015 11:49 am, Chris Angelico wrote:

> You could use base 1,114,112 fairly readily in any decent modern
> programming language. That'll happily represent base one-million.
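To make the quoted suggestion concrete, here is a rough Python sketch. It is not code from the thread, and the function names are made up for this example. It treats every code point as one digit, using chr() to write a digit value and ord() to read it back. It is deliberately naive: it uses control characters, surrogates and unassigned code points as digits, so a result containing a lone surrogate cannot even be encoded to UTF-8 with the default error handler.

```python
# Hypothetical helpers: a naive "base 1,114,112" where digit value d is chr(d).
BASE = 0x110000  # 1,114,112 code points, U+0000 through U+10FFFF


def to_base_unicode(n: int) -> str:
    """Encode a non-negative int, most significant digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return chr(0)  # the zero digit is U+0000, a control character
    digits = []
    while n:
        n, d = divmod(n, BASE)
        digits.append(chr(d))  # may be a control, surrogate or unassigned code point
    return "".join(reversed(digits))


def from_base_unicode(s: str) -> int:
    """Decode a string produced by to_base_unicode back into an int."""
    n = 0
    for ch in s:
        n = n * BASE + ord(ch)
    return n


if __name__ == "__main__":
    x = 10**100
    s = to_base_unicode(x)
    assert from_base_unicode(s) == x
    print(len(str(x)), len(s))  # 101 17
```

So 10**100, which needs 101 decimal digits, needs only 17 digits in this base. The arithmetic is trivial; the hard part is whether those 17 characters are usable as digits by a human.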
Well, not really...

Here is the breakdown of Unicode code points by category, as of Python 3.3:

# Other
Cc: 65 (control characters)
Cf: 139 (format characters)
Cn: 864415 (unassigned)
Co: 137468 (private use)
Cs: 2048 (surrogates)

# Letters
Ll: 1751 (lowercase)
Lm: 237 (modifier)
Lo: 97553 (other)
Lt: 31 (titlecase)
Lu: 1441 (uppercase)

# Marks
Mc: 353 (spacing combining)
Me: 12 (enclosing)
Mn: 1280 (nonspacing)

# Numbers
Nd: 460 (decimal digit)
Nl: 224 (letter)
No: 464 (other)

# Punctuation
Pc: 10 (connector)
Pd: 23 (dash)
Pe: 71 (close)
Pf: 10 (final quote)
Pi: 12 (initial quote)
Po: 434 (other)
Ps: 72 (open)

# Symbols
Sc: 48 (currency)
Sk: 115 (modifier)
Sm: 952 (math)
So: 4404 (other)

# Separators
Zl: 1 (line)
Zp: 1 (paragraph)
Zs: 18 (space)

Clearly we shouldn't use control or format characters, surrogates, separators, marks, etc. (At least, I hope it is clear why you don't want, say, newlines to be used as digits.) Punctuation is borderline, as are symbols, since they won't interoperate well with anything else. How can you parse number+number if the numbers themselves might contain + signs? I wouldn't use unassigned code points, as that is all but guaranteed to lead to problems once those code points are assigned in a future Unicode version, but I might reluctantly allow private use. That leaves the following categories, which *may* be suitable:

Co: 137468 (private use)
Ll: 1751 (lowercase)
Lo: 97553 (other)
Lt: 31 (titlecase)
Lu: 1441 (uppercase)
Nd: 460 (decimal digit)
Nl: 224 (letter)
No: 464 (other)
Sc: 48 (currency)
Sm: 952 (math)
So: 4404 (other)

which comes to a total of 244796, far short of a million. Add in the 632 punctuation marks if you like, and we're still far short.

There are other problems too:

- Confusables. Can you tell the difference between AΑА versus АAΑ, or ВΒB versus BΒВ? Or even O versus 0?

- Lack of glyphs for the majority of those code points in most fonts. Most numbers will look like a sequence of boxes.

- Difficulty of data entry.

- Some people's digits will not have the value that they expect, e.g.
  the digit '1' might not have the numeric value 1. Only ten characters can
  have the values 0 to 9, so at least 450 of the 460 decimal digits in use
  (all but ten) will have some value other than the one their users expect.

- Realistically, who is going to use this? Even as an intellectual exercise, using huge bases for human input and output isn't very useful.

The idea of using massive implicit bases for the internal implementation of BigNums is quite reasonable, but for human input and output, it doesn't fly.

--
Steven

--
https://mail.python.org/mailman/listinfo/python-list
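The category breakdown above can be reproduced with the standard unicodedata module. The following is a sketch assuming a recent Python 3. The counts come from whatever Unicode version the interpreter bundles (reported by unicodedata.unidata_version), so most figures will differ from the Python 3.3 ones, although fixed ranges such as the surrogates (2048) and private use (137468) do not change. The sketch also totals the categories listed as possibly suitable, with and without punctuation, and counts how many Nd characters share each decimal value.

```python
import unicodedata
from collections import Counter

# Tally every code point U+0000..U+10FFFF by general category. The figures
# depend on the Unicode version bundled with this Python; the post used 3.3.
counts = Counter()
nd_values = Counter()  # how many Nd characters carry each decimal value 0-9
for cp in range(0x110000):
    ch = chr(cp)
    cat = unicodedata.category(ch)
    counts[cat] += 1
    if cat == "Nd":
        nd_values[unicodedata.decimal(ch)] += 1

# The categories the post says *may* be suitable as digits, plus punctuation.
CANDIDATES = {"Co", "Ll", "Lo", "Lt", "Lu", "Nd", "Nl", "No", "Sc", "Sm", "So"}
PUNCTUATION = {"Pc", "Pd", "Pe", "Pf", "Pi", "Po", "Ps"}
usable = sum(counts[c] for c in CANDIDATES)
with_punct = usable + sum(counts[c] for c in PUNCTUATION)

if __name__ == "__main__":
    print("Unicode", unicodedata.unidata_version)
    for cat in sorted(counts):
        print(f"{cat}: {counts[cat]}")
    print("candidate digits:", usable)
    print("with punctuation:", with_punct)
    # Each value 0-9 is claimed by many Nd characters, but a single large
    # base can give each value to only one character.
    print("Nd characters per decimal value:", dict(sorted(nd_values.items())))
```

On a current interpreter the candidate total is larger than 244796, because later Unicode versions assigned many more letters and symbols, but it is still nowhere near a million.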