On Sun, Aug 11, 2013 at 12:14 PM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote: > Consider a single character. It can have 0 to 5 accents, in any > combination. Order doesn't matter, and there are no duplicates, so there > are: > > 0 accent: take 0 from 5 = 1 combination; > 1 accent: take 1 from 5 = 5 combinations; > 2 accents: take 2 from 5 = 5!/(2!*3!) = 10 combinations; > 3 accents: take 3 from 5 = 5!/(3!*2!) = 10 combinations; > 4 accents: take 4 from 5 = 5 combinations; > 5 accents: take 5 from 5 = 1 combination > > giving a total of 32 combinations for a single character. Since there are > four characters in this hypothetical language that take accents, that > gives a total of 4*32 = 128 distinct code points needed.
There's an easy way to calculate it. Instead of the "take N from 5" notation, simply look at it as a set of independent bits - each of your accents may be either present or absent. So it's 1<<5 combinations for a single character, which is the same 32 figure you came up with, but easier to work with in the ridiculous case. > In reality, Unicode has currently code points U+0300 to U+036F (112 code > points) to combining characters. It's not really meaningful to combine > all 112 of them, or even most of 112 of them... If you *were* to use literally ANY combination, that would be 1<<112 which is... uhh... five billion yottacombinations. Don't bother working that one out by the "take N" method, it'll take you too long :) Oh, and that's 1<<112 possible combining character combinations, so you then need to multiply that by the number of base characters you could use.... ChrisA -- http://mail.python.org/mailman/listinfo/python-list