On Monday 13 April 2009 22:39:58 Andrew Dunstan wrote:
> Umm, but isn't that because your encoding is using one code point?
>
> See the OP's explanation w.r.t. canonical equivalence.
>
> This isn't about the number of bytes, but about whether or not we should
> count characters encoded as two or more combined code points as a single
> char or not.

Here is a test case that shows the problem (if your terminal can display
combining characters; xterm appears to work):

SELECT U&'\00E9', char_length(U&'\00E9');
 ?column? | char_length
----------+-------------
 é        |           1
(1 row)

SELECT U&'\0065\0301', char_length(U&'\0065\0301');
 ?column? | char_length
----------+-------------
 é        |           2
(1 row)
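
For comparison, the same two strings can be examined outside the database. The following is only a rough sketch in Python (not PostgreSQL code, and not a proposal for how char_length should behave); the helper name nfc_length is made up for illustration. It counts code points before and after NFC normalization, which composes the canonically equivalent sequence U+0065 U+0301 into U+00E9:

```python
import unicodedata

# The two canonically equivalent spellings from the test case above.
precomposed = "\u00e9"        # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"   # 'e' followed by COMBINING ACUTE ACCENT


def nfc_length(text):
    """Hypothetical helper: count code points after NFC normalization."""
    return len(unicodedata.normalize("NFC", text))


# Raw code point counts differ, matching the char_length results above.
print(len(precomposed), len(decomposed))                # 1 2

# After NFC normalization both spellings count as one code point.
print(nfc_length(precomposed), nfc_length(decomposed))  # 1 1
```

Note that normalization is not the same as counting user-perceived characters: a base letter plus a combining mark that has no precomposed form would still count as two code points after NFC.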