Re: [GENERAL] Trouble with UTF-8 data

Janine Sisk Fri, 18 Jan 2008 10:11:36 -0800

On Jan 18, 2008, at 12:00 AM, Albe Laurenz wrote:

0xEDA7A1 (UTF-8) corresponds to Unicode code point 0xD9E1, which,
when interpreted as a high surrogate and followed by a low surrogate,
would correspond to the UTF-16 encoding of a code point
between 0x88400 and 0x887FF (depending on the value of the low surrogate).


These code points do not correspond to any valid character.
So - unless there is a flaw in my reasoning - there's something
fishy with these data anyway.
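
The arithmetic above can be checked with a short Python sketch. The function names are illustrative and not from this thread; the decoding is done by hand because Python's own UTF-8 decoder rejects surrogate code points.

```python
from typing import Tuple


def decode_utf8_3byte(data: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence by hand, without rejecting surrogates."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = data
    # A 3-byte sequence has the bit pattern 1110xxxx 10xxxxxx 10xxxxxx.
    if b0 & 0xF0 != 0xE0 or b1 & 0xC0 != 0x80 or b2 & 0xC0 != 0x80:
        raise ValueError("not a 3-byte UTF-8 sequence")
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def high_surrogate_range(high: int) -> Tuple[int, int]:
    """Return the lowest and highest code points a high surrogate can lead to."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate")
    # The high surrogate carries the top 10 bits of (code point - 0x10000);
    # the low surrogate supplies the bottom 10 bits (0x000 to 0x3FF).
    base = 0x10000 + ((high - 0xD800) << 10)
    return base, base + 0x3FF


code_point = decode_utf8_3byte(bytes.fromhex("EDA7A1"))
first, last = high_surrogate_range(code_point)
print(f"0x{code_point:04X}")           # 0xD9E1
print(f"0x{first:05X} - 0x{last:05X}")  # 0x88400 - 0x887FF
```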

Janine, could you give us a hex dump of that line from the copy statement?

Certainly. Do you want to see it as it came from the old database, or after I ran it through iconv? Although iconv wasn't able to solve this problem it did fix others in other tables; unfortunately I have no way of knowing if it also mangled some data at the same time.
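
For anyone following along, one way to produce such a hex dump is sketched below in Python. This is only an illustration: the function name and the sample line are made up, and the real input would be the offending line from the dump file, read in binary mode.

```python
def find_invalid_utf8(line: bytes, context: int = 3):
    """Return (offset, hex of the bytes at that offset) for the first
    sequence that is not valid UTF-8, or None if the line decodes cleanly."""
    try:
        line.decode("utf-8")
    except UnicodeDecodeError as err:
        offending = line[err.start:err.start + context]
        return err.start, " ".join(f"{b:02x}" for b in offending)
    return None


# Hypothetical sample: ASCII text with the 0xEDA7A1 sequence in the middle.
sample = b"abc\xed\xa7\xa1def"
print(find_invalid_utf8(sample))
```

Running this on both the original dump and the iconv output would show whether the same bytes are present in each.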

The version of iconv I have does know about UTF16 so I tried using that as the "from" encoding instead of UTF8, but the result had new errors in places where the original data was good, so that was obviously a step backwards.

BTW, in case it matters I found out I misidentified the version of PG this data came from - it's actually 7.3.6.


thanks,

janine

