On Fri, 22 Jun 2018 11:14:59 +0100, Ben Bacarisse wrote:

>>> The code page remark is curious. Will some "code pages" have digits
>>> that are not ASCII digits?
>>
>> Good question. I have no idea.
>
> It's much more of an open question than I thought.

Nah, Python already solves that for you:

py> import unicodedata
py> s = "১২৩৪৫.৬৭৮৯০"
py> for c in s:
...     print(unicodedata.name(c))
...
BENGALI DIGIT ONE
BENGALI DIGIT TWO
BENGALI DIGIT THREE
BENGALI DIGIT FOUR
BENGALI DIGIT FIVE
FULL STOP
BENGALI DIGIT SIX
BENGALI DIGIT SEVEN
BENGALI DIGIT EIGHT
BENGALI DIGIT NINE
BENGALI DIGIT ZERO
py> float(s)
12345.6789

Further to my earlier post, if you call:

    for sep in ",\u00B7\u066B":
        mystring = mystring.replace(sep, '.')

before passing it to float, that ought to cover just about anything you
will find in real-world data regardless of language. If Ethan finds
something that isn't covered by those three cases (comma, middle dot and
Arabic decimal separator) he'll likely need to consult an expert on that
language.

That is, provided Ethan doesn't have to deal with thousands separators as
well. Then it gets complicated.

-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list
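
For anyone who wants to try the separator replacement described above, here is a minimal sketch. The function name to_float and the constant DECIMAL_SEPARATORS are made up for illustration; the three separators are the ones named in the post (comma, middle dot, Arabic decimal separator). It assumes the input contains no thousands separators.

```python
# Hypothetical helper, not part of any library: the names below are
# assumptions chosen for this example.
DECIMAL_SEPARATORS = ",\u00B7\u066B"  # comma, middle dot, Arabic decimal separator


def to_float(text):
    """Replace each known decimal separator with '.' and call float().

    Assumes the string has no thousands separators; if it does, the
    result will usually be a ValueError from float().
    """
    for sep in DECIMAL_SEPARATORS:
        text = text.replace(sep, ".")
    return float(text)


if __name__ == "__main__":
    print(to_float("3,14"))                      # comma as decimal separator
    print(to_float("3\u00B714"))                 # middle dot
    print(to_float("\u0663\u066B\u0661\u0664"))  # Arabic-Indic digits and separator
```

Note that a string such as "1,234.5" ends up as "1.234.5" after the replacement and float() rejects it, which is the thousands-separator complication mentioned above.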