On Sat, Mar 24, 2018 at 11:11 AM, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Fri, 23 Mar 2018 07:46:16 -0700, Tobiah wrote:
>
>> If I changed my database tables to all be UTF-8 would this work cleanly
>> without any decoding?
>
> Not reliably or safely. It will appear to work so long as you have only
> pure ASCII strings from the database, and then crash when you don't:
>
> py> text_from_database = u"hello wörld".encode('latin1')
> py> print text_from_database
> hello w�rld
> py> json.dumps(text_from_database)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/local/lib/python2.7/json/__init__.py", line 231, in dumps
>     return _default_encoder.encode(obj)
>   File "/usr/local/lib/python2.7/json/encoder.py", line 195, in encode
>     return encode_basestring_ascii(o)
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xf6 in position 7:
> invalid start byte
If the database has been configured to use UTF-8 (as mentioned, that's "utf8mb4" in MySQL), you won't get that byte sequence back. You'll get back valid UTF-8. At least, if ever you don't, that's a MySQL bug, and not your fault.

So yes, it WILL work cleanly. Reliably and safely.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
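To illustrate the point being made: the quoted failure comes from handing json.dumps latin-1 bytes, which it tries (and fails) to decode as UTF-8. If the database connection really does return UTF-8, decoding succeeds and the dump works. A minimal Python 3 sketch (the `text_from_database` value is a stand-in for what a UTF-8-configured driver would return):

```python
import json

# Simulate a row coming back from a UTF-8-configured database:
# the driver hands us valid UTF-8 bytes.
text_from_database = u"hello wörld".encode('utf-8')

# Decoding as UTF-8 succeeds, because the bytes really are UTF-8.
text = text_from_database.decode('utf-8')

# json.dumps now works cleanly (non-ASCII escaped by default).
print(json.dumps(text))
```

Had `text_from_database` been encoded as latin-1 instead, the decode step would raise the same UnicodeDecodeError shown in the traceback above.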