Tom Lane writes:
> (BTW, I should think that iconv or some related tool would have a
> solution for fixing this miscoding; it's not an uncommon problem.)
I guess recode is handling that.
http://recode.progiciels-bpi.ca/manual/Universal.html#Universal
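
For anyone already doing the cleanup in Python, here is a minimal sketch of one possible repair. It assumes the "miscoding" is the one discussed elsewhere in this thread: UTF-16 surrogates written out as separate 3-byte sequences instead of a single 4-byte UTF8 sequence. The function name repair_surrogate_pairs is made up for illustration, and this has not been checked against what recode or iconv actually do.

```python
def repair_surrogate_pairs(data: bytes) -> bytes:
    """Re-encode surrogate pairs that were written as two 3-byte sequences.

    Assumption: apart from the surrogates, the input is well-formed UTF-8;
    any other invalid byte still raises UnicodeDecodeError here.
    """
    # Let surrogate code points through instead of raising on them.
    text = data.decode("utf-8", "surrogatepass")
    # Round-trip through UTF-16 so that a high+low surrogate pair merges into
    # one code point; an unpaired surrogate is replaced with U+FFFD.
    text = text.encode("utf-16-le", "surrogatepass").decode("utf-16-le", "replace")
    return text.encode("utf-8")


# U+1F600 miscoded as the surrogates D83D and DE00, three bytes each.
miscoded = b"\xed\xa0\xbd\xed\xb8\x80"
repaired = repair_surrogate_pairs(miscoded)
```

An unpaired surrogate cannot be repaired this way, since there is no real character to recover; the sketch substitutes the replacement character rather than guessing.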
Regards,
--
dim
--
Sent via pgsql-bugs
Mike Lewis writes:
> I've run into a fair amount of unicode errors when trying to copy in log
> files. Would you recommend using bytea or another data type instead of text
> or varchar... or at least copying to a staging table with bytea's and
> filtering out invalid rows when moving it to the ma
>
> It is not valid. See http://tools.ietf.org/html/rfc3629 --- a sequence
> beginning with ED must have a second byte in the range 80-9F to be
> legal, and this doesn't. The example you give would decode as U+DF2D,
> ie part of a surrogate pair, which is specifically disallowed in UTF8
> ---
"Michael Lewis" writes:
> I'm using Python to sanitize my logs from invalid UTF8 characters before
> COPYing them into postgres. I came across this one sequence that seems to
> be valid UTF8 (in the extended range I believe).
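It is not valid. See http://tools.ietf.org/html/rfc3629 --- a sequence
beginning with ED must have a second byte in the range 80-9F to be
legal, and this doesn't. The example you give would decode as U+DF2D,
ie part of a surrogate pair, which is specifically disallowed in UTF8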
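
The rule can be checked from Python with a small sketch like the one below. The helper names are hypothetical, and the sample bytes ED BC AD are not taken from the original report; they are simply what U+DF2D would look like if it were encoded directly as a 3-byte sequence. The check relies on Python 3's strict "utf-8" codec, which rejects surrogates.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Strict check: Python 3's utf-8 codec refuses surrogate code points."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def starts_surrogate(data: bytes, i: int = 0) -> bool:
    """True if the bytes at position i are ED followed by A0..BF.

    After ED, only 80..9F is legal; A0..BF would land in U+D800..U+DFFF.
    """
    return (
        i + 1 < len(data)
        and data[i] == 0xED
        and 0xA0 <= data[i + 1] <= 0xBF
    )


sample = bytes([0xED, 0xBC, 0xAD])    # direct encoding of U+DF2D (derived, see above)
boundary = bytes([0xED, 0x9F, 0xBF])  # U+D7FF, the last code point before the surrogates

print(is_valid_utf8(sample), starts_surrogate(sample))      # False True
print(is_valid_utf8(boundary), starts_surrogate(boundary))  # True False
```

Decoding the sample with the "surrogatepass" error handler shows which code point the bytes stand for, which can help when deciding whether a rejected row is a surrogate problem or some other damage.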
The following bug has been logged online:
Bug reference: 5532
Logged by: Michael Lewis
Email address: mikelikes...@gmail.com
PostgreSQL version: 9.0 trunk
Operating system: OS X
Description: Valid UTF8 sequence errors as invalid
Details:
I'm using Python to sanitize