@ALL: Wouldn't it be possible, and wise, to include an (optional) encoding
converter in pgsql?

We're importing a lot of data from text files that are not UTF-8, so we
always have to convert the encoding in another tool before using COPY.
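
For anyone stuck with the same pre-COPY step, here is a minimal sketch of
doing the conversion in Python 3 instead of a separate tool. It assumes you
already know the source encoding (latin-1 below is only a placeholder), and
the function name and parameters are made up for illustration:

```python
import shutil


def reencode_file(src_path, dst_path, src_encoding="latin-1", dst_encoding="utf-8"):
    """Copy src_path to dst_path, converting the text encoding on the way.

    Hypothetical helper: src_encoding must be the *real* encoding of the
    input file; "latin-1" is only an assumed default.
    """
    # newline="" leaves the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # Streams the file in chunks, so large imports do not need to fit in memory.
        shutil.copyfileobj(src, dst)
```

The output file can then be fed to COPY with client_encoding left at utf8.
Note that latin-1 accepts every byte value, so a wrong guess will not raise
an error; it will just produce wrong characters.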



2011/2/28 Craig Ringer <cr...@postnewspapers.com.au>

> On 27/02/11 20:47, AI Rumman wrote:
> > I am getting an error in PostgreSQL 9.0.1.
> >
> > update import_details_test
> > set data_row = '["4","1 Monor JoÃ\u083ão S. AntÃ\u0083ão
>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Because your email client may have transformed the text encoding, I
> can't draw any firm conclusions about what you're actually sending to
> the database, but it's highly likely that you're sending latin-1 encoded
> text to the database while your client_encoding is set to 'utf8'.
>
> The marked text is most likely the problem... but I think there's more
> wrong with it than just being latin-1 encoded. That kind of mangling
> often comes about when utf-8 text has been incorrectly interpreted as
> latin-1 and modified, or when something has incorrectly tried to do
> utf8<->latin-1 conversions more than once. You really need to figure out
> what encoding your input is in, convert it to a known encoding like
> utf-8 *once*, and keep it that way.
>
> If you're using Python, which I suspect you might be, the decode()
> method of byte strings is useful. For example, I can convert a latin-1
> encoded byte string to a Python Unicode string with:
>
>   b"somelatin1string".decode("latin-1")
>
> Sometimes you can get away with just "SET client_encoding = 'LATIN1'",
> but in this case your string data looks like it's been mangled by more
> than a single encoding misinterpretation, so you'd probably just
> silently insert corrupt data by doing that. Don't. Fix your code so it
> knows what the text encoding of the input is.
>
> If you are, in fact, using Python, it's a really good idea to always
> decode() all your inputs so your internal processing is done on
> Unicode strings. Similarly, Qt programmers should convert
> everything to Unicode QString as soon as possible and use that for all
> internal manipulation. It'll save a lot of pain.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
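
To illustrate the kind of mangling described in the quoted reply, here is a
small Python 3 sketch. The sample word is made up and is not the original
poster's actual data; it only shows how UTF-8 text misread as latin-1 and
then re-encoded ends up with "Ã" sequences, and how a single such
misreading can be undone:

```python
# Decode at the boundary: raw latin-1 bytes become a Unicode string once.
raw_latin1 = b"Jo\xe3o"
text = raw_latin1.decode("latin-1")          # 'João'

# How the damage happens: correct UTF-8 bytes are misread as latin-1 ...
utf8_bytes = text.encode("utf-8")            # b'Jo\xc3\xa3o'
misread = utf8_bytes.decode("latin-1")       # 'JoÃ£o'

# ... and then written out as UTF-8 again (double encoding).
double_encoded = misread.encode("utf-8")     # b'Jo\xc3\x83\xc2\xa3o'

# Undoing exactly one round of that mistake (an assumption about the damage):
repaired = (
    double_encoded.decode("utf-8")           # back to the mangled string
    .encode("latin-1")                       # recover the original UTF-8 bytes
    .decode("utf-8")                         # decode them correctly this time
)
```

This repair only works if the data went through exactly this one
misinterpretation. If it was converted more than once, or differently, the
last step can fail or give more garbage, which is why finding out the real
input encoding comes first.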
