Hi,

[Originally posted this to the dev list, but the moderator advised posting here first]

I'm looking into implementing the unicodedata module for Jython, and I'm trying to understand the contracts promised by the various methods. Please bear in mind that this means I'm probably targeting the CPython implementation as of 2.3, although I would obviously be quite happy if my implementation doesn't need too much extra to fit the 2.5 functionality!

As someone has previously posted [1], the documentation is a little thin, and they were pointed at the Unicode specification [2]. I've done a little reading there, and have a little knowledge now, which is always dangerous. There are still gaps, and I was hoping someone here might be able to point out what I'm missing.

My problem is described here [3], but I'll summarise and add a little to it.

    2468;CIRCLED DIGIT NINE;No;0;EN; 0039;;9;9;N;;;;;
    (UnicodeData.txt [4] for Unicode 3.2.0 [5], entry for code point 0x2468)

    verify(unicodedata.decimal(u'\u2468',None) is None)
    verify(unicodedata.digit(u'\u2468') == 9)
    verify(unicodedata.numeric(u'\u2468') == 9.0)

That works fine, and I can see in the UnicodeData.txt file that the decimal property isn't defined, the digit property is 9 and the numeric property is also 9. (The mirrored property N towards the end is a fine marker; go back three fields and then start working forward from there.)

However, this next bit is what confuses me:

    325F;CIRCLED NUMBER THIRTY FIVE;No;0;ON; 0033 0035;;;35;N;;;;;
    (UnicodeData.txt for Unicode 3.2.0, entry for code point 0x325F)

    verify(unicodedata.decimal(u'\u325F',None) is None)
    verify(unicodedata.digit(u'\u325F', None) is None)
    verify(unicodedata.numeric(u'\u325F') == 35.0)

The last one fails with "ValueError: not a numeric character". Again looking at the UnicodeData.txt entry and the mirrored N property, working back three fields and going forward from there shows that the decimal property isn't set, the digit property isn't set and the numeric property appears to be 35.

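To make the field counting above concrete, here is a minimal sketch that splits a UnicodeData.txt record on semicolons and reads the decimal, digit and numeric fields directly. The helper name numeric_fields and the index constants are made up for illustration, and it assumes a modern Python with the fractions module (so not the 2.3 being targeted). It only reads the data file; it says nothing about how CPython's unicodedata actually builds its tables.

```python
from fractions import Fraction

# 0-based field positions in a UnicodeData.txt record. MIRRORED is the
# "N" marker mentioned above; the three numeric fields sit just before it.
DECIMAL, DIGIT, NUMERIC, MIRRORED = 6, 7, 8, 9


def numeric_fields(record):
    """Return (decimal, digit, numeric) for one UnicodeData.txt line.

    Empty fields come back as None. The numeric field is converted via
    Fraction because it may hold a value such as "1/2".
    (Hypothetical helper, not part of the unicodedata module.)
    """
    fields = record.strip().split(";")
    decimal = int(fields[DECIMAL]) if fields[DECIMAL] else None
    digit = int(fields[DIGIT]) if fields[DIGIT] else None
    numeric = float(Fraction(fields[NUMERIC])) if fields[NUMERIC] else None
    return decimal, digit, numeric


# The two records quoted in this post, copied as they appear above.
CIRCLED_NINE = "2468;CIRCLED DIGIT NINE;No;0;EN; 0039;;9;9;N;;;;;"
CIRCLED_35 = "325F;CIRCLED NUMBER THIRTY FIVE;No;0;ON; 0033 0035;;;35;N;;;;;"

print(numeric_fields(CIRCLED_NINE))  # (None, 9, 9.0)
print(numeric_fields(CIRCLED_35))    # (None, None, 35.0)
```

Read this way, the data file itself gives 0x325F a numeric value of 35 with no decimal or digit value, which is the expectation the failing verify() call encodes.
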
So from my understanding of the Unicode (3.2.0) spec, the code point 0x325F has a numeric property with a value of 35, but the Python (2.3 and 2.4 - I haven't put 2.5 onto my box yet) implementation of unicodedata disagrees, presumably for good reason. I can't see where I'm going wrong.

Cheers,
James

[1] http://groups.google.com/group/comp.lang.python/browse_frm/thread/39a894325686f329/7dbdda27be118836?lnk=st&q=unicodedata&rnum=10#7dbdda27be118836
[2] http://www.unicode.org/
[3] http://eternusuk.blogspot.com/2007/02/jython-unicodedata-initial-overview.html
[4] http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.txt
[5] http://www.unicode.org/Public/3.2-Update/UnicodeData-3.2.0.html