New submission from Vlastimil Brom <vlastimil.b...@gmail.com>:

I just noticed an omission of some character names in the unicodedata module. These are some CJK ideographs:
龼 (0x9fbc) - 鿋 (0x9fcb) (CJK Unified Ideographs [19968-40959] [0x4e00-0x9fff])
𪜀 (0x2a700) - 𫜴 (0x2b734) (CJK Unified Ideographs Extension C [173824-177983] [0x2a700-0x2b73f])
𫝀 (0x2b740) - 𫠝 (0x2b81d) (CJK Unified Ideographs Extension D [177984-178207] [0x2b740-0x2b81f])

The names should probably be generated, e.g. CJK UNIFIED IDEOGRAPH-2A700 etc.

(Tested with a recompiled unicodedata using Unicode 6.0. With the module built into Python 2.7 (unidata_version: '5.2.0'), only the first two ranges are relevant, as CJK Unified Ideographs Extension D is an addition in Unicode 6.)

(There are also the unprintable ASCII controls, surrogates and private use areas, where the missing names are probably OK.)

I tested with the following rather clumsy code:

# # # # # # # # # # # # # # # #
import unicodedata

# wide_unichr = custom unichr emulating unicode ranges beyond FFFF on narrow python build

# Inclusive [first, last] ranges of code points without a name.
codepoints_missing_char_names = [[-2, -2]]  # dummy
for i in xrange(0x10FFFF + 1):
    # Skip category C* (controls, surrogates, private use, unassigned).
    if unicodedata.category(wide_unichr(i))[:1] != 'C' and unicodedata.name(wide_unichr(i), u"??noname??") == u"??noname??":
        if codepoints_missing_char_names[-1][1] == i - 1:
            # Extend the current range.
            codepoints_missing_char_names[-1][1] = i
        else:
            # Start a new range.
            codepoints_missing_char_names.append([i, i])

for first, last in codepoints_missing_char_names[1:]:
    print u"%s (%s) - %s (%s)" % (wide_unichr(first), hex(first), wide_unichr(last), hex(last))
# # # # # # # # # # # # # # # # # # # # # # # # # #

Unfortunately, I can't provide a fix, as unicodedata involves C code, where my knowledge is near zero.

vbr

----------
messages: 121521
nosy: vbr
priority: normal
severity: normal
status: open
title: missing character names in unicodedata (CJK...)

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10459>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
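
The suggestion in the report that the missing names be generated (e.g. CJK UNIFIED IDEOGRAPH-2A700) could look roughly like the Python 3 sketch below. It is only an illustration at the Python level, not the fix in the C code of unicodedata. The names REPORTED_RANGES, generated_cjk_name and name_with_fallback are hypothetical and do not exist in the module. The ranges are the three listed in the report, and the name format is assumed from the example given there.

```python
import unicodedata

# Inclusive code point ranges listed in the report as lacking names.
# (Assumption: these are copied from the report, not from the Unicode data files.)
REPORTED_RANGES = [
    (0x9FBC, 0x9FCB),    # end of CJK Unified Ideographs
    (0x2A700, 0x2B734),  # CJK Unified Ideographs Extension C
    (0x2B740, 0x2B81D),  # CJK Unified Ideographs Extension D
]


def generated_cjk_name(codepoint):
    """Hypothetical helper: build a name such as CJK UNIFIED IDEOGRAPH-2A700.

    Returns None for code points outside the reported ranges.
    """
    for first, last in REPORTED_RANGES:
        if first <= codepoint <= last:
            # Uppercase hexadecimal, following the example in the report.
            return "CJK UNIFIED IDEOGRAPH-%X" % codepoint
    return None


def name_with_fallback(codepoint):
    """Hypothetical helper: ask unicodedata first, then fall back to the generated name."""
    name = unicodedata.name(chr(codepoint), None)
    if name is None:
        name = generated_cjk_name(codepoint)
    return name


if __name__ == "__main__":
    for cp in (0x9FBC, 0x2A700, 0x2B740):
        print(hex(cp), name_with_fallback(cp))
```

With a unicodedata that already knows these code points, the fallback is never reached and the lookup simply returns the name stored in the module.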