On 2017-09-12 16:53:49 +0900, Tatsuo Ishii wrote:
> > read side.  I think that should work because all *server side* encodings
> > store character lengths in the *first* byte of a multibyte character
> 
> What do you mean? I don't see such data in a multibyte string.

Check the information the pg_*_mblen functions use / how the relevant
encodings work. For UTF-8, for example, it is something like:
int
pg_utf_mblen(const unsigned char *s)
{
        int                     len;

        if ((*s & 0x80) == 0)
                len = 1;
        else if ((*s & 0xe0) == 0xc0)
                len = 2;
        else if ((*s & 0xf0) == 0xe0)
                len = 3;
        else if ((*s & 0xf8) == 0xf0)
                len = 4;
#ifdef NOT_USED
        else if ((*s & 0xfc) == 0xf8)
                len = 5;
        else if ((*s & 0xfe) == 0xfc)
                len = 6;
#endif
        else
                len = 1;
        return len;
}

As you can see, only the first byte (*s) is accessed to determine the
length in bytes of the multibyte character.  That's afaict the case for
all server-side encodings.
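
To illustrate why that property matters on the read side, here is a
minimal sketch (not PostgreSQL code) of walking a buffer character by
character while only ever inspecting lead bytes.  The names
sketch_utf8_mblen and sketch_count_chars are hypothetical; the first is a
simplified stand-in for pg_utf_mblen, and the truncation handling is an
assumption about what a reader would want, not something taken from the
server.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_utf_mblen: byte length from the lead byte only. */
static int
sketch_utf8_mblen(const unsigned char *s)
{
    if ((*s & 0x80) == 0)
        return 1;
    if ((*s & 0xe0) == 0xc0)
        return 2;
    if ((*s & 0xf0) == 0xe0)
        return 3;
    if ((*s & 0xf8) == 0xf0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/*
 * Count the complete characters in the first nbytes of s.  Only the lead
 * byte of each character is examined; a character whose announced length
 * runs past the end of the buffer is treated as truncated and not counted.
 */
static size_t
sketch_count_chars(const unsigned char *s, size_t nbytes)
{
    size_t      pos = 0;
    size_t      nchars = 0;

    assert(s != NULL);

    while (pos < nbytes)
    {
        int         len = sketch_utf8_mblen(s + pos);

        if ((size_t) len > nbytes - pos)
            break;              /* trailing character is incomplete */
        pos += (size_t) len;
        nchars++;
    }
    return nchars;
}
```

Because the length comes from the lead byte alone, a reader that sees
only part of a character can tell how many more bytes to wait for
without looking at any of the continuation bytes.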

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
