both lc_type and lc_collate are en_US.UTF-8. Sorry for missing them in the original post.
I understand that collate has impact on sorting order, but the fact that char_length() is not returning the correct length in char for certain characters (non-BMP) is an indication that unicode is not fully supported. If char_length() is not working properly, I'd expect that substring() won't work either. The PostgreSQL I am using is an AWS PostgreSQL RDS. I can check with AWS, but presumably that they are running PostgreSQL RDS on some flavor of lunix. My client is PgAdmin 4 running on a Windows 7 machine. I understand that some client tools may not be able to display all unicode chars, but I do expect that the function ascii() return correct values of the stored chars. For me the primary requirement is storing and retrieving all unicode characters as they are, and char_length() returns the correct values for all supported unicode chars. Correct sorting is nice-to-have. Any help to get unicode chars, particularly the mojos (0x1F478, 0x1F479), in and out of pg correctly is much appreciated. Thank you! James On Tue, Dec 20, 2016 at 9:24 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > James Zhou <ja...@360data.ca> writes: > > - *But their sorting order seems to be undefined. Can anyone > comment > > the sorting rules?* > > Well, it would depend on lc_collate, which you have not told us, and > it would also depend on how well your platform's strcoll() function > implements that collation; but you have not told us what platform this > is running on. > > Most of the other behaviors you mention are also partly or wholly > dependent on which software you use with Postgres and whether you've > correctly configured that software. So it's pretty hard to answer > this usefully with only this much info. > > regards, tom lane >