Re: Handle foreign character web input

Richard Damon Sat, 29 Jun 2019 10:29:06 -0700

On 6/29/19 3:19 AM, Thomas Jollans wrote:
> On 28/06/2019 22:25, Tobiah wrote:
>> A guy comes in and enters his last name as RÖnngren.
> With a capital Ö in the middle? That's unusual.
>>
>> So what did the browser really give me; is it encoded
>> in some way, like latin-1?  Does it depend on whether
>> the name was cut and pasted from a Word doc, etc.?
>> Should I handle these internally as Unicode?  Right
>> now my database tables are latin-1 and things seem
>> to usually work, but not always.
>
>
> If your database is using latin-1, German and French names will work,
> but Croatian and Polish names often won't. Not to mention people using
> other writing systems.
>
> So Günther and François are ok, but Bolesław turns into Boles?aw and
> don't even think about anybody called Владимир or محمد.
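To make the latin-1 point concrete, here is a small sketch using the names from the thread. It round-trips each name through latin-1 with errors="replace", which substitutes '?' for any character latin-1 cannot represent. This only approximates what a lossy latin-1 column does; the real behaviour depends on the database, driver and connection settings. The helper name latin1_lossy is made up for this example.

```python
def latin1_lossy(name: str) -> str:
    """Hypothetical helper: encode to latin-1, replacing unencodable
    characters with '?', then decode back to a str."""
    return name.encode("latin-1", errors="replace").decode("latin-1")


names = ["Günther", "François", "Bolesław", "Владимир", "محمد"]

for name in names:
    stored = latin1_lossy(name)
    # A name survives only if every character exists in latin-1
    status = "ok" if stored == name else "damaged"
    print(name, "->", stored, f"({status})")
```

Günther and François come back unchanged; Bolesław becomes Boles?aw, and the Cyrillic and Arabic names turn into nothing but question marks.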


I would say that currently the only real reason to use an encoding
other than a Unicode one (normally UTF-8) is historical inertia. A
field that will only ever hold plain ASCII characters could perhaps use
ASCII, but such a field would never contain real natural-language words,
only computer-generated codes. The various 'codepages' were useful in
their day, when machines were less capable and Unicode either hadn't
been invented, wasn't well supported, or was too expensive to use.
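A quick sketch of both halves of that argument: UTF-8 can encode any Python string and decode it back without loss, while an ASCII-only field is really only suitable for machine-generated codes. The function names and the sample order code "ORD-2019-0042" are hypothetical; str.isascii() needs Python 3.7 or later.

```python
def utf8_round_trips(text: str) -> bool:
    """True if encoding to UTF-8 and decoding back gives the same str."""
    return text.encode("utf-8").decode("utf-8") == text


def is_plain_code(value: str) -> bool:
    """Hypothetical check for a computer-generated code field kept ASCII-only."""
    return value.isascii()


names = ["RÖnngren", "Bolesław", "Владимир", "محمد"]
for name in names:
    # Non-ASCII characters take 2 or more bytes each in UTF-8
    print(name, len(name), "characters ->", len(name.encode("utf-8")), "bytes")

print(all(utf8_round_trips(n) for n in names))  # True
print(is_plain_code("ORD-2019-0042"))           # True
print(is_plain_code("RÖnngren"))                # False
```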

Now (as I understand it), all Python 3 strings ('str') are sequences of
Unicode code points; if you need something in a particular encoding, it
has to be 'bytes'.
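A minimal sketch of that str/bytes split, applied to the original question. It assumes the form page is served as UTF-8, in which case browsers generally submit the field as UTF-8 bytes; most web frameworks decode these for you. Decoding the bytes with the right codec gives back the name, while decoding them as latin-1 raises no error and silently produces mojibake.

```python
# Bytes as a UTF-8 form submission would carry them (assumption: page is UTF-8)
raw = "RÖnngren".encode("utf-8")
print(type(raw), raw)        # <class 'bytes'> b'R\xc3\x96nngren'

name = raw.decode("utf-8")   # right codec: back to the original str
wrong = raw.decode("latin-1")  # wrong codec: every byte maps to some char

print(name)         # RÖnngren
print(repr(wrong))  # 'RÃ\x96nngren'
```

The latin-1 case is the dangerous one: since every byte value is a valid latin-1 character, a wrong guess never raises an exception, which may explain why things "usually work, but not always".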

-- 
Richard Damon

-- 
https://mail.python.org/mailman/listinfo/python-list
