Re: translating foreign data

Steven D'Aprano Fri, 22 Jun 2018 03:51:10 -0700

On Fri, 22 Jun 2018 11:14:59 +0100, Ben Bacarisse wrote:

>>> The code page remark is curious.  Will some "code pages" have digits
>>> that are not ASCII digits?
>>
>> Good question.  I have no idea.
> 
> It's much more of an open question than I thought.


Nah, Python already solves that for you:


py> import unicodedata
py> s = "১২৩৪৫.৬৭৮৯০"
py> for c in s:
...     print(unicodedata.name(c))
...
BENGALI DIGIT ONE
BENGALI DIGIT TWO
BENGALI DIGIT THREE
BENGALI DIGIT FOUR
BENGALI DIGIT FIVE
FULL STOP
BENGALI DIGIT SIX
BENGALI DIGIT SEVEN
BENGALI DIGIT EIGHT
BENGALI DIGIT NINE
BENGALI DIGIT ZERO
py> float(s)
12345.6789
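
A minimal sketch of the same point for a few other scripts, assuming the data uses characters in the Unicode decimal digit category (Nd). The sample strings and the `samples` name are illustrative only, not taken from Ethan's data:

```python
import unicodedata

# Illustrative samples: the digits one, two, three in three scripts.
samples = {
    "Bengali": "\u09e7\u09e8\u09e9",
    "Arabic-Indic": "\u0661\u0662\u0663",
    "Thai": "\u0e51\u0e52\u0e53",
}

for script, text in samples.items():
    # Every character here is a Unicode decimal digit (category Nd) ...
    assert all(unicodedata.category(c) == "Nd" for c in text)
    # ... so int() and float() accept the string directly.
    print(script, int(text), float(text))
```

Characters outside the Nd category, such as a language-specific decimal separator, are not converted this way.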



Further to my earlier post, if you call:

for sep in ",\u00B7\u066B":
    mystring = mystring.replace(sep, '.')

before passing it to float, that ought to cover just about anything you 
will find in real-world data regardless of language. If Ethan finds 
something that isn't covered by those three cases (comma, middle dot and 
Arabic decimal separator) he'll likely need to consult an expert on that 
language.

Provided Ethan doesn't have to deal with thousands separators as well. 
Then it gets complicated.
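
The replace loop can be wrapped up as a small helper. This is only a sketch: the names `to_float` and `DECIMAL_SEPARATORS` are made up for illustration, and it assumes the input contains no thousands separators:

```python
# Hypothetical helper; the names are illustrative, not from any library.
# Comma, middle dot (U+00B7) and Arabic decimal separator (U+066B).
DECIMAL_SEPARATORS = ",\u00B7\u066B"


def to_float(text):
    """Replace the known decimal separators with '.' and call float()."""
    for sep in DECIMAL_SEPARATORS:
        text = text.replace(sep, ".")
    return float(text)


print(to_float("3,14"))
print(to_float("3\u00B714"))
# Arabic-Indic digits three, one, four around an Arabic decimal separator.
print(to_float("\u0663\u066B\u0661\u0664"))

# A thousands separator breaks the assumption: "1,234.56" becomes
# "1.234.56", which float() rejects with ValueError.
try:
    to_float("1,234.56")
except ValueError as err:
    print("cannot convert:", err)
```

The ValueError in the last case is at least loud. Something like "1,234" is worse, since it quietly comes back as 1.234.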


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson

-- 
https://mail.python.org/mailman/listinfo/python-list
