Ben Bacarisse <ben.use...@bsb.me.uk>:

> It's 21. The reason being (or at least part of the reason being) that
> 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx
> 10xxxxxx (3 + 3*6).

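To make the quoted arithmetic concrete, here is a minimal sketch that packs a code point into the 4-byte pattern by hand and compares it with Python's own encoder. The function name `utf8_encode_4byte` is made up for illustration and is not a library API.

```python
def utf8_encode_4byte(cp: int) -> bytes:
    """Pack a code point into 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx by hand.

    Hypothetical helper for illustration only; real code should just
    call str.encode("utf-8").
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("the 4-byte form covers U+10000..U+10FFFF only")
    return bytes([
        0b11110000 | (cp >> 18),           # lead byte: top 3 payload bits
        0b10000000 | ((cp >> 12) & 0x3F),  # continuation: next 6 bits
        0b10000000 | ((cp >> 6) & 0x3F),   # continuation: next 6 bits
        0b10000000 | (cp & 0x3F),          # continuation: low 6 bits
    ])


# 3 bits in the lead byte plus 6 bits in each of three continuation bytes.
payload_bits = 3 + 3 * 6

# Cross-check the hand-rolled packing against the built-in encoder.
sample = 0x1F600
matches_builtin = utf8_encode_4byte(sample) == chr(sample).encode("utf-8")
```

Note that the largest code point, U+10FFFF, needs 21 bits but does not use the whole 21-bit range, so the 4-byte pattern has room to spare.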
I bet the reason is UTF-16. Microsoft and Sun/Oracle would have insisted on a maximum of 4 bytes per character. UTF-16 can just barely squeeze 21 bits into the scheme (a surrogate pair carries 20 bits, which tops out at U+10FFFF), and only at the expense of creating an ugly hole (U+D800..U+DFFF) inside Unicode. Politics, politics.

Marko
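
For what it's worth, the UTF-16 side can be sketched the same way. This is only an illustration of the surrogate-pair arithmetic, with a made-up helper name (`utf16_surrogates`), not anything from the standard library.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into a UTF-16 surrogate pair.

    Hypothetical helper for illustration only; real code should just
    call str.encode("utf-16-be").
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("surrogate pairs cover U+10000..U+10FFFF only")
    v = cp - 0x10000                   # 20-bit value after the offset
    high = 0xD800 | (v >> 10)          # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 | (v & 0x3FF)         # low 10 bits -> U+DC00..U+DFFF
    return high, low


# The top of the range uses every one of the 20 payload bits.
top_pair = utf16_surrogates(0x10FFFF)

# The "hole": surrogate code points are reserved for UTF-16's use,
# so Python refuses to encode one as UTF-8.
try:
    chr(0xD800).encode("utf-8")
    hole_is_encodable = True
except UnicodeEncodeError:
    hole_is_encodable = False
```

Two 10-bit halves give 2**20 supplementary code points on top of the 2**16 in the base plane, which is where the U+10FFFF ceiling comes from.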