Bugs item #1450212, was opened at 2006-03-15 09:05
Message generated for change (Settings changed) made by peufeu
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1450212&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Python Interpreter Core
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Pierre-Frédéric Caillaud (peufeu)
Assigned to: Nobody/Anonymous (nobody)
>Summary: int() and isdigit() accept non-digit unicode numbers

Initial Comment:
I hit a very surprising bug this morning in a Python script which extracts numeric information from human-entered text.

The problem is the following: many Unicode characters, in unicode strings, are considered to be digits. For instance, the character "²" (does it appear on your screen? It's u'\xb2'). The output of the following command is pretty interesting:

    print ''.join([x for x in map( unichr, xrange( 65536 )) if x.isdigit()])

Then, int() will happily parse the string:

    int( u"٥٦٧٨٩۰۱۲" )
    56789012

(I really hope this bug system supports unicode.) However, I can't write a=٥٦٧٨٩۰۱۲ in source code, for instance.

Philosophically, Python is right: these characters are probably all digits, and it's pretty cool to be able to parse numbers written in ARABIC-INDIC DIGITs or something (as unicodedata.name says). However, from a practical point of view, I guess most parsing done with Python isn't on OCR'd cuneiform stone tablets, but rather on modern computer documents...

Whenever a surface area (in m²) was near a phone number in my human-entered text, the "²" would be absorbed as part of the phone number, because u"²".isdigit() is True. Then bogus phone numbers would appear on the website. Any number followed by a little footnote number will get the footnote number embedded...

I had to replace all the .isdigit() calls with re.compile( ur"^\d+$" ).match(). Interestingly, for re, even on unicode strings, \d is 0-9 and nothing else (at least when the re.UNICODE flag is not set).

At the very least, it would be normal for int() to raise an exception when fed this type of data. Please.
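The workaround described above can be sketched roughly as follows. This is only an illustration, not the submitter's actual code: it is written for Python 3 (the snippets in the report are Python 2), and the helper names is_ascii_digits and strict_int are hypothetical. Because \d on Python 3 str patterns matches any Unicode decimal digit by default, the sketch spells out [0-9] explicitly.

```python
import re

# Hypothetical helpers for illustration; they are not part of the standard library.
_ASCII_DIGITS = re.compile(r"[0-9]+")


def is_ascii_digits(text):
    """Return True only if text is a non-empty run of the characters 0-9."""
    # fullmatch() requires the whole string to match, so a trailing
    # newline or a superscript such as U+00B2 is rejected.
    return _ASCII_DIGITS.fullmatch(text) is not None


def strict_int(text):
    """Parse text as a base-10 integer, rejecting anything but ASCII digits."""
    if not is_ascii_digits(text):
        raise ValueError("not an ASCII digit string: %r" % (text,))
    return int(text)


if __name__ == "__main__":
    print("\u00b2".isdigit())          # built-in check accepts SUPERSCRIPT TWO
    print(is_ascii_digits("\u00b2"))   # the strict check does not
    print(strict_int("56789012"))
```

Using fullmatch() rather than a "^...$" pattern is a deliberate choice in this sketch: "$" also matches just before a trailing newline, so "123\n" would otherwise slip through.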