Re: Unicode grapheme clusters

Greg Stark Thu, 19 Jan 2023 16:38:13 -0800

This is how we've always documented it. Postgres treats code points as
"characters" not graphemes.


You don't need to go to anything as esoteric as emojis to see this either.
Accented characters like é have canonical forms that are multiple code
points, and in some character sets some accented characters can only be
represented that way.
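
To make that concrete, here is a small sketch in Python (not Postgres; the
variable names are just for illustration) that uses the standard
unicodedata module to show é as one code point and as two:

```python
import unicodedata

# Precomposed form: a single code point, U+00E9.
composed = "\u00e9"

# Canonical decomposition (NFD): "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Python's len() counts code points, not graphemes.
print(len(composed))     # 1
print(len(decomposed))   # 2

# The two strings display the same but do not compare equal
# until they are brought to the same normalization form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Both strings are a single grapheme to a reader, yet anything that counts
code points sees them differently.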

But I don't think there's any reason to consider changing the existing
functions. They have to be consistent with substr and the other string
manipulation functions.

We could add new functions to work with graphemes but it might bring more
pain keeping it up to date....
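
As a rough illustration of what a grapheme-aware function would have to do,
here is a Python sketch. The function name approx_grapheme_count is
hypothetical, and it is only an approximation: it skips combining marks and
does not implement the full Unicode segmentation rules, so it will miscount
things like emoji joined with zero-width joiners.

```python
import unicodedata

def approx_grapheme_count(s: str) -> int:
    """Count code points that are not combining marks.

    Hypothetical helper for illustration only. It approximates grapheme
    clusters by treating each combining mark as part of the preceding
    character; it is not the full Unicode text segmentation algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)

two_code_points = "e\u0301"   # "e" + combining acute accent

print(len(two_code_points))                    # 2 code points
print(approx_grapheme_count(two_code_points))  # 1 approximate grapheme

# Slicing by code point, as substr-style functions do, splits the pair
# and leaves a bare "e" without its accent.
print(two_code_points[:1] == "e")              # True
```

Even this toy version depends on Unicode character data that changes between
releases, which is the maintenance cost in question.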
