On Thu, Nov 20, 2014, at 20:10, Chris Angelico wrote:
> 2) Languages which use a different alphabet (eg Cyrillic - Russian,
> Bulgarian). You could possibly cram them into an eight-bit encoding
> without tipping ASCII out, but I'm not sure. In Unicode, these
> languages are all easily supported by the BMP, as they don't use a
> huge number of characters each.

There are numerous eight-bit encodings that support Latin and one other alphabet. Remember, ASCII is a seven-bit encoding, so an eight-bit encoding basically has room for two seven-bit character sets: ASCII in the low half and another 128 characters in the high half.

The most difficult language to encode in eight bits (of those where it is still possible at all) is actually Vietnamese, which uses the Latin alphabet, because of the sheer number of accented letters it needs. Windows' encoding of it (along with some other lesser-used encodings, all for Vietnamese) is the only 8-bit encoding to use combining accents, in a way that is unfortunately incompatible with Unicode normalization if naively translated. VISCII, by contrast, sacrifices a handful of C0 control characters in addition to fully packing the high half with letters.

-- Random832
-- https://mail.python.org/mailman/listinfo/python-list
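For anyone who wants to poke at this from Python, here is a rough sketch of both points using only the standard library codecs. The helper names `keeps_ascii` and `encodable` are made up for illustration, and I am assuming that `cp1258` is the Windows encoding meant above. VISCII is left out because, as far as I know, the standard library has no codec for it.

```python
import unicodedata


def keeps_ascii(codec: str) -> bool:
    """True if bytes 0x00-0x7F decode to the same characters as ASCII."""
    low_half = bytes(range(128))
    return low_half.decode(codec) == low_half.decode("ascii")


def encodable(text: str, codec: str) -> bool:
    """True if every character of text has a byte in the given codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


# Cyrillic: ASCII stays in the low half, the letters live in the high half.
cyrillic_codecs = ("koi8_r", "cp1251", "iso8859_5")
for codec in cyrillic_codecs:
    print(codec, keeps_ascii(codec), "Привет".encode(codec))

# Vietnamese: LATIN SMALL LETTER E WITH CIRCUMFLEX AND ACUTE, three ways.
nfc = "\u1ebf"                           # fully precomposed
nfd = unicodedata.normalize("NFD", nfc)  # e + circumflex + acute
mixed = "\u00ea\u0301"                   # precomposed e-circumflex + combining acute

for label, text in (("NFC", nfc), ("NFD", nfd), ("mixed", mixed)):
    codepoints = [f"U+{ord(ch):04X}" for ch in text]
    print(label, codepoints, encodable(text, "cp1258"))
```

Only the mixed form goes through cp1258, and it is neither NFC nor NFD, so text has to be massaged into that shape before encoding and renormalized after decoding.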