On May 26, 9:29 am, Pet <petshm...@googlemail.com> wrote:
> On May 25, 6:07 pm, Paul Boddie <p...@boddie.org.uk> wrote:
>
> > On 25 May, 17:39, someone <petshm...@googlemail.com> wrote:
>
> > > Hi,
>
> > > reading content of webpage (encoded in utf-8) with urllib2, I can't
> > > get parsed data into DB
>
> > > Exception:
>
> > > File "/usr/lib/python2.5/site-packages/pyPgSQL/PgSQL.py", line 3111,
> > > in execute
> > > raise OperationalError, msg
> > > libpq.OperationalError: ERROR: invalid UTF-8 byte sequence detected
> > > near byte 0xe4
>
> > > I've already checked several python unicode tutorials, but I have no
> > > idea how to solve my problem.
>
> > With pyPgSQL, there are a few tricks that you have to take into
> > account:
>
> > 1. With PostgreSQL, it would appear advantageous to create databases
> > using the "-E unicode" option.
>
> Hi,
>
> DB is in UTF8
>
> > 2. When connecting, use the client_encoding and unicode_results
> > arguments for the connect function call:
>
> > connection = PgSQL.connect(client_encoding="utf-8",
> > unicode_results=1)
>
> If I do unicode_results=1, then there are exceptions in other places,
> e.g. urllib.urlencode(values)
> can't encode values
>
> > 3. After connecting, it appears necessary to set the client encoding
> > explicitly:
>
> > connection.cursor().execute("set client_encoding to unicode")
>
> I've tried this as well, but still have exceptions
>
> > I'd appreciate any suggestions which improve on the above, but what
> > this should allow you to do is to present Unicode objects to the
> > database and to receive such objects from queries. Whether you can
> > relax this and pass UTF-8-encoded strings instead of Unicode objects
> > is not something I can guarantee, but it's usually recommended that
> > you manipulate Unicode objects in your program where possible, and
> > here you should be able to let pyPgSQL deal with the encodings
> > preferred by the database.
>
> Thanks for your suggestions! Sadly, I can't solve my problem...
>
> Pet
>
> > Paul
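Paul's advice to keep Unicode objects inside the program (decode once on input, encode only on output) might look like this in Python 3, where `str` plays the role of Python 2's `unicode`. This is a sketch under assumptions: the helper names `to_text` and `build_query` are hypothetical, and the sample bytes stand in for a fetched page. It also shows that `urlencode()` accepts `str` values directly when given an encoding, which is the call that raised errors for Pet under Python 2 with `unicode_results=1`.

```python
from urllib.parse import urlencode


def to_text(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode network bytes exactly once, at the input edge."""
    return raw.decode(charset)


def build_query(values: dict) -> str:
    """Hypothetical helper: encode str values only when they leave the program.

    urlencode() percent-encodes str values using the given encoding,
    so no manual .encode() calls are needed beforehand.
    """
    return urlencode(values, encoding="utf-8")


# Assumed input: bytes as they might arrive from a Latin-9 encoded page.
raw = "Käse".encode("iso-8859-15")
text = to_text(raw, "iso-8859-15")  # work with str inside the program
print(text)                         # Käse
print(build_query({"q": text}))     # q=K%C3%A4se
```

The database side follows the same pattern: hand the driver `str` objects and let the connection's client encoding decide the bytes on the wire.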
After some more experimenting, I tried converting the result with unicode(result, 'ISO-8859-15'), and that was it :) I had assumed the page was already UTF-8 because of the charset declared in the <meta> tag of the webpage I'm fetching.

Pet
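Why that works: byte 0xe4 is "ä" in ISO-8859-15 (as in ISO-8859-1), but in UTF-8 it can only start a three-byte sequence, so a lone 0xe4 followed by plain ASCII is exactly the "invalid UTF-8 byte sequence" PostgreSQL reported. Below is a minimal Python 3 sketch of the same fix, where `bytes.decode()` plays the role of Python 2's `unicode(result, ...)`. It rests on a few assumptions: the HTTP Content-Type header is trusted before anything in the page, UTF-8 is the default when no charset is declared, and ISO-8859-15 is an acceptable last-resort fallback. The helper names are hypothetical.

```python
from email.message import Message


def charset_from_header(content_type):
    """Hypothetical helper: return the charset in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, e.g. 'iso-8859-15'


def decode_page(raw, content_type, fallback="iso-8859-15"):
    """Hypothetical helper: decode fetched bytes, trusting the HTTP header first.

    Assumption: if the declared (or default UTF-8) decoding fails, fall back
    to a single-byte charset. That never raises, but it may mis-decode text
    that really is in some other encoding.
    """
    declared = charset_from_header(content_type) or "utf-8"
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        return raw.decode(fallback)


page = b"K\xe4se"  # assumed sample: 0xe4 is 'ä' in ISO-8859-15
print(decode_page(page, "text/html"))                       # Käse (via fallback)
print(decode_page(page, "text/html; charset=ISO-8859-15"))  # Käse (declared)
```

A silent single-byte fallback hides mislabeled pages rather than reporting them, so logging when it kicks in is probably worthwhile.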