On Sun, Sep 17, 2017 at 9:13 AM, Peter Otten <__pete...@web.de> wrote:
> Leam Hall wrote:
>
> > On 09/17/2017 08:30 AM, Chris Angelico wrote:
> >> On Sun, Sep 17, 2017 at 9:38 PM, Leam Hall <leamh...@gmail.com> wrote:
> >>> Still trying to keep this Py2 and Py3 compatible.
> >>>
> >>> The Py2 error is:
> >>> UnicodeEncodeError: 'ascii' codec can't encode character
> >>> u'\xf6' in position 8: ordinal not in range(128)
> >>>
> >>> even when the string is manually converted:
> >>> name = unicode(self.name)
> >>>
> >>> Same sort of issue with:
> >>> name = self.name.decode('utf-8')
> >>>
> >>>
> >>> Py3 doesn't like either version.
> >>
> >> You got a Unicode *EN*code error when you tried to *DE*code. That's a
> >> quirk of Py2's coercion behaviours, so the error's a bit obscure, but
> >> it means that you (most likely) actually have a Unicode string
> >> already. Check what type(self.name) is, and see if the problem is
> >> actually somewhere else.
> >>
> >> (It's hard to give more specific advice based on this tiny snippet,
> >> sorry.)
> >>
> >> ChrisA
> >>
> >
> > Chris, thanks! I see what you mean.
>
> I don't think so. You get a unicode from the database,
>
> $ python
> Python 2.7.6 (default, Oct 26 2016, 20:30:19)
> [GCC 4.8.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import sqlite3
> >>> db = sqlite3.connect(":memory:")
> >>> cs = db.cursor()
> >>> cs.execute("select 'foo';").fetchone()
> (u'foo',)
> >>>
>
> and when you try to decode it (which is superfluous as you already have
> unicode!) Python does what you ask for. But to be able to decode it has to
> encode first and by default it uses the ascii codec for that attempt. For an
> all-ascii string
>
> u"foo".encode("ascii") --> "foo"
>
> and thus
>
> u"foo".decode("utf-8")
>
> implemented as
>
> u"foo".encode("ascii").decode("utf-8") --> u"foo"
>
> is basically a noop. However
>
> u"äöü".encode("ascii") --> raises UnicodeENCODEError
>
> and thus
>
> u"äöü".decode("utf-8")
>
> fails with that.
> Unfortunately nobody realizes that the encoding failed and
> thus will unsuccessfully try and specify other encodings for the decoding
> step
>
> u"äöü".decode("latin1") # also fails
>
> Solution: if you already have unicode, leave it alone.
>

Doesn't seem to work. The failing code takes the strings as is from the
database. It will occasionally fail when a name comes up that uses a
non-ASCII character.

Lines 44, 60, 66, 67.

https://github.com/makhidkarun/py_tools/blob/master/lib/character.py

Leam
--
https://mail.python.org/mailman/listinfo/python-list
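
For reference, a minimal sketch of the "leave unicode alone" advice quoted above, written to run on both Py2 and Py3. The helper name to_text and the sample name are made up for illustration and are not taken from character.py; the sketch only assumes that sqlite3 hands back text (unicode on Py2, str on Py3) for TEXT values, as the interpreter session in the quote shows.

```python
# -*- coding: utf-8 -*-
import sqlite3

# The text type is `unicode` on Py2 and `str` on Py3.
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text, decoding only byte strings."""
    if isinstance(value, text_type):
        # Already unicode: leave it alone. Calling .decode() here is what
        # triggers the implicit ascii encode on Py2.
        return value
    if isinstance(value, bytes):
        # A real byte string: decoding is the right operation.
        return value.decode(encoding)
    # Anything else (int, None, ...) is converted to its text form.
    return text_type(value)


db = sqlite3.connect(":memory:")
# Illustrative value containing the same character (u'\xf6') as the traceback.
row = db.execute(u"select 'J\u00f6rg'").fetchone()
name = to_text(row[0])
```

With this, text coming from the database passes through unchanged, and only genuine byte strings are decoded. It does not address errors raised later, for example when the text is written to an output stream that uses the ascii codec.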