both lc_type and lc_collate are en_US.UTF-8. Sorry for missing them in the
original post.

I understand that collate has impact on sorting order, but the fact that
char_length() is not returning the correct length in char for certain
characters (non-BMP) is an indication that unicode is not fully supported.
If char_length() is not working properly, I'd expect that substring() won't
work either.

The PostgreSQL I am using is an AWS PostgreSQL RDS. I can check with AWS,
but presumably that they are running PostgreSQL RDS on some flavor of lunix.

My client is PgAdmin 4 running on a Windows 7 machine. I understand that
some client tools may not be able to display all unicode chars, but I do
expect that the function ascii() return correct values  of the stored
chars.

For me the primary requirement is storing and retrieving all unicode
characters as they are, and char_length() returns the correct values for
all supported unicode chars. Correct sorting is nice-to-have.

Any help to get unicode chars, particularly the mojos (0x1F478, 0x1F479), in
and out of pg correctly is much appreciated. Thank you!

James


On Tue, Dec 20, 2016 at 9:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> James Zhou <ja...@360data.ca> writes:
> >       - *But their sorting order seems to be undefined. Can anyone
> comment
> >       the sorting rules?*
>
> Well, it would depend on lc_collate, which you have not told us, and
> it would also depend on how well your platform's strcoll() function
> implements that collation; but you have not told us what platform this
> is running on.
>
> Most of the other behaviors you mention are also partly or wholly
> dependent on which software you use with Postgres and whether you've
> correctly configured that software.  So it's pretty hard to answer
> this usefully with only this much info.
>
>                         regards, tom lane
>

Reply via email to