On 2017-09-12 16:53:49 +0900, Tatsuo Ishii wrote:
> > read side. I think that should work because all *server side* encodings
> > store character lengths in the *first* byte of a multibyte character
>
> What do you mean? I don't see such data in a multibyte string.

Check the information the pg_*_mblen functions use, and how the relevant
encodings work. It will be something like:

    int
    pg_utf_mblen(const unsigned char *s)
    {
        int         len;

        if ((*s & 0x80) == 0)
            len = 1;
        else if ((*s & 0xe0) == 0xc0)
            len = 2;
        else if ((*s & 0xf0) == 0xe0)
            len = 3;
        else if ((*s & 0xf8) == 0xf0)
            len = 4;
    #ifdef NOT_USED
        else if ((*s & 0xfc) == 0xf8)
            len = 5;
        else if ((*s & 0xfe) == 0xfc)
            len = 6;
    #endif
        else
            len = 1;
        return len;
    }

As you can see, only the first byte (*s) is accessed to determine the
length of the multibyte character. That's afaict the case for all
server-side encodings.

Greetings,

Andres Freund
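
To illustrate why that matters on the read side, here is a minimal sketch (not PostgreSQL code) of walking a UTF-8 buffer while inspecting only the lead byte of each character. The names first_byte_len and count_chars are hypothetical. The sketch assumes well-formed UTF-8 input whose characters are not truncated at the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the branches of pg_utf_mblen quoted above:
 * the byte length of a UTF-8 character is derived from its first byte only.
 */
int
first_byte_len(unsigned char c)
{
    if ((c & 0x80) == 0)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((c & 0xe0) == 0xc0)
        return 2;               /* 110xxxxx */
    if ((c & 0xf0) == 0xe0)
        return 3;               /* 1110xxxx */
    if ((c & 0xf8) == 0xf0)
        return 4;               /* 11110xxx */
    return 1;                   /* not a valid lead byte: step one byte */
}

/*
 * Count the characters in buf[0..nbytes). Only lead bytes are examined;
 * continuation bytes are skipped without being read.
 */
size_t
count_chars(const unsigned char *buf, size_t nbytes)
{
    size_t      pos = 0;
    size_t      nchars = 0;

    assert(buf != NULL);
    while (pos < nbytes)
    {
        pos += (size_t) first_byte_len(buf[pos]);
        nchars++;
    }
    return nchars;
}
```

A reader that has received the first byte of a character can therefore tell how many more bytes belong to it before any of them arrive.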