On 6/29/19 3:19 AM, Thomas Jollans wrote:
> On 28/06/2019 22:25, Tobiah wrote:
>> A guy comes in and enters his last name as RÖnngren.
> With a capital Ö in the middle? That's unusual.
>>
>> So what did the browser really give me; is it encoded
>> in some way, like latin-1? Does it depend on whether
>> the name was cut and pasted from a Word doc. etc?
>> Should I handle these internally as unicode? Right
>> now my database tables are latin-1 and things seem
>> to usually work, but not always.
>
>
> If your database is using latin-1, German and French names will work,
> but Croatian and Polish names often won't. Not to mention people using
> other writing systems.
>
> So Günther and François are ok, but Bolesław turns into Boles?aw and
> don't even think about anybody called Владимир or محمد.
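
As a rough illustration of the latin-1 point quoted above, here is a small Python 3 sketch. The helper names (fits_latin1, lossy_latin1) are made up for this example, and errors="replace" is only my assumption about how a '?' could end up in the stored value; a real database driver may behave differently.

```python
# Sketch only: checks which of the names quoted above survive latin-1.
# fits_latin1 and lossy_latin1 are hypothetical helper names.

NAMES = ["Günther", "François", "Bolesław", "Владимир", "محمد"]


def fits_latin1(text):
    """Return True if every character in text can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def lossy_latin1(text):
    """Encode to latin-1, replacing unencodable characters with '?',
    then decode again to show what would come back out."""
    return text.encode("latin-1", errors="replace").decode("latin-1")


def utf8_roundtrip(text):
    """Encode to UTF-8 bytes and decode back to str."""
    return text.encode("utf-8").decode("utf-8")


if __name__ == "__main__":
    for name in NAMES:
        print(name, fits_latin1(name), lossy_latin1(name),
              utf8_roundtrip(name) == name)
```

With UTF-8 every one of these names round-trips unchanged; with latin-1 only the first two do.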

I would say that currently the only real reason to use anything other
than a Unicode encoding (normally UTF-8) is historical inertia.
Maybe a field that will only ever hold plain ASCII characters could use
ASCII (such a field would never contain real natural-language words,
only computer-generated codes).

All the various 'codepages' were useful in their day, when machines were
less capable and Unicode hadn't been invented, wasn't well supported, or
was too expensive to use.

Now (as I understand it), all Python 3 strings (str) are Unicode
internally; if you need text in a particular encoding, it has to be held
as bytes.

-- 
Richard Damon

-- 
https://mail.python.org/mailman/listinfo/python-list