Akihiro KAYAMA wrote:
> Sorry for my terrible English. I am living in Japan, and we have a
> large number of characters called Kanji. UTF-16(U+0000...U+10FFFF) is
> enough for practical use in this country also, but for academic
> purpose, I need a large codespace over 20-bits. I wish I could use
> unicode's private space (U+60000000...U+7FFFFFFF) in Python.
>
> -- kayama

As far as Unicode is concerned, I think the Kanji are part of the Han script, so you should check the CJK Unified Ideographs and CJK Unified Ideographs Extension A blocks. Not every character you need may be there, but the roughly 27,500 characters in these two blocks should be enough for most uses.

Oh, by the way, the Unicode code space only goes up to U+10FFFF. UCS-4's encoding allows code values up to and including 7FFFFFFF, but even UTF-32 doesn't go beyond 10FFFF. The upper Unicode private space is Plane Sixteen (U+100000–U+10FFFF); the other private spaces are part of the Basic Multilingual Plane (U+E000–U+F8FF) and Plane Fifteen (U+F0000–U+FFFFF).

Since the Dai Kan-Wa jiten "only" lists about 50,000 kanji (even though it probably isn't perfectly complete), it fits with ease in either Plane Fifteen or Plane Sixteen. Each plane has 65,536 code points, of which 65,534 are available for private use, because the last two code points of every plane are noncharacters.

-- 
http://mail.python.org/mailman/listinfo/python-list
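
P.S. For what it's worth, here is a minimal Python sketch of the ranges mentioned above. It assumes a current Python 3 (3.3 or later), where `sys.maxunicode` is 0x10FFFF; older builds may behave differently. The names `PRIVATE_USE_RANGES`, `is_private_use`, `private_use_capacity` and `fits_in_python_str` are made up for illustration and are not part of any library.

```python
import sys

# Private-use ranges of the Unicode code space (inclusive bounds).
# The last two code points of planes 15 and 16 are noncharacters,
# so the usable private-use ranges stop at ...FFFD.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # private-use area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Plane Fifteen
    (0x100000, 0x10FFFD),  # Plane Sixteen
]


def is_private_use(code_point):
    """Return True if code_point lies in one of the private-use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


def private_use_capacity():
    """Return the number of code points in each private-use range."""
    return [hi - lo + 1 for lo, hi in PRIVATE_USE_RANGES]


def fits_in_python_str(code_point):
    """Return True if chr(code_point) would be accepted by this interpreter."""
    return 0 <= code_point <= sys.maxunicode


if __name__ == "__main__":
    print(private_use_capacity())
    print(is_private_use(0xF0000), is_private_use(0x4E00))
    print(fits_in_python_str(0x10FFFF), fits_in_python_str(0x60000000))
```

A dictionary of about 50,000 entries could then be mapped onto consecutive code points starting at 0xF0000, whereas a value such as 0x60000000 is rejected by `chr()` because it lies outside the Unicode code space.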