On Sun, Mar 20, 2016, at 10:55, Ben Bacarisse wrote:
> It's 21. The reason being (or at least part of the reason being) that
> 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx
> 10xxxxxx (3 + 3*6).
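A minimal Python sketch of the arithmetic in the quoted claim. It compares the raw capacity of the 4-byte UTF-8 bit pattern with the ceiling that UTF-16 surrogate pairs impose. The helpers `encode_4byte` and `strict_decode_ok` are hypothetical names for illustration only. `encode_4byte` checks just the 21-bit capacity, not the U+10FFFF ceiling, so it can produce byte sequences a conforming encoder would never emit.

```python
import sys

# Payload bits in the 4-byte UTF-8 pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
PAYLOAD_BITS = 3 + 3 * 6                   # 21
UTF8_4BYTE_MAX = (1 << PAYLOAD_BITS) - 1   # highest value the bit pattern can hold

# A UTF-16 surrogate pair carries 10 + 10 bits on top of an offset of 0x10000
UTF16_MAX = 0x10000 + (1 << 20) - 1


def encode_4byte(cp: int) -> bytes:
    """Pack cp into the raw 4-byte UTF-8 layout.

    Hypothetical helper: it only checks that cp fits in 21 bits and
    deliberately skips the U+10FFFF ceiling a conforming encoder applies.
    """
    if not 0 <= cp <= UTF8_4BYTE_MAX:
        raise ValueError("code point does not fit in 21 bits")
    return bytes([
        0xF0 | (cp >> 18),           # 11110xxx: top 3 bits
        0x80 | ((cp >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((cp >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (cp & 0x3F),          # 10xxxxxx: low 6 bits
    ])


def strict_decode_ok(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(hex(UTF8_4BYTE_MAX))   # 0x1fffff
    print(hex(UTF16_MAX))        # 0x10ffff
    print(hex(sys.maxunicode))   # 0x10ffff
    # The bit pattern has room past U+10FFFF, but the codec refuses it
    print(strict_decode_ok(encode_4byte(0x10FFFF)))  # True
    print(strict_decode_ok(encode_4byte(0x110000)))  # False
```

The gap between 0x10FFFF and 0x1FFFFF is the point of the reply: the 4-byte form has spare room, so the actual ceiling comes from UTF-16 rather than from UTF-8's byte layout.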
The reason is the UTF-16 limit: a surrogate pair can only reach U+10000 through U+10FFFF. Before UTF-8 was restricted to match it, UTF-8 had no such limit (it could encode up to 31 bits, in six bytes). And the 21-bit explanation doesn't account for the fact that four bytes can encode values up to U+1FFFFF, not just up to U+10FFFF.

-- 
https://mail.python.org/mailman/listinfo/python-list