Re: [GENERAL] another seemingly simple encoding question

John D. Burger Fri, 24 Mar 2006 06:47:58 -0800

This doesn't sound like your problem, but I'll explain the normalization issue using Korean as an example, since that seems to be your data: Unicode has codepoints both for precomposed Hangul syllables and for the individual Jamo, so a Hangul syllable can be represented either with the single corresponding codepoint or as a sequence of two or three Jamo codepoints. A Unicode font would display these two alternatives identically. However, in any Unicode encoding, including UTF8, the two strings would not be byte-for-byte identical. The four Unicode normalization forms (NFC, NFD, NFKC and NFKD) are algorithms for normalizing strings in such a way that they do compare equal.
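To make that concrete, here is a minimal sketch using Python's standard unicodedata module, not anything in PostgreSQL, purely as an illustration. The syllable chosen (HAN, U+D55C, which decomposes to the Jamo U+1112 U+1161 U+11AB) is my own example, not one from this thread.

```python
import unicodedata

# Illustrative example (not from the thread): the Hangul syllable HAN
# as a single precomposed codepoint, U+D55C.
precomposed = "\uD55C"
# The same syllable spelled as three conjoining Jamo:
# choseong hieuh (U+1112), jungseong a (U+1161), jongseong nieun (U+11AB).
decomposed = "\u1112\u1161\u11AB"

def utf8(s):
    """Return the UTF-8 bytes of a string."""
    return s.encode("utf-8")

# The strings render the same but are not byte-for-byte identical,
# so a plain comparison says they differ.
print(precomposed == decomposed)                      # False
print(len(utf8(precomposed)), len(utf8(decomposed)))  # 3 9

# Normalizing both strings to the same form makes them compare equal.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = (unicodedata.normalize(form, precomposed)
            == unicodedata.normalize(form, decomposed))
    print(form, same)                                 # True for each form
```

NFC composes the Jamo sequence back into the single syllable codepoint, while NFD splits the syllable into its Jamo; either way both sides end up as the same sequence of codepoints.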

Anyway, it sounds like you have the opposite problem: two strings that are comparing equal when you think they shouldn't. I don't know that anyone can help you unless you post an actual example of two such strings.


- John D. Burger
  MITRE


