Jeff Davis wrote:

> >   For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long time with the
> > assumption that C.UTF-8 has the same behavior as C, and this only
> > recently came up.

Currently, neither lc_collate_is_c() nor lookup_collation_cache()
considers C.UTF-8 to be a C collation, since both perform this kind of
test:

                if (strcmp(localeptr, "C") == 0)
                        result = true;
                else if (strcmp(localeptr, "POSIX") == 0)
                        result = true;
                else
                        result = false;
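
To make the consequence concrete, here is a minimal standalone sketch of
the same exact-match test. The function name locale_name_is_c() is
hypothetical and is not a PostgreSQL function; it only mirrors the
excerpt above.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): applies the same
 * exact-match test as the excerpt above to a locale name.
 */
static bool
locale_name_is_c(const char *localeptr)
{
    if (strcmp(localeptr, "C") == 0)
        return true;
    if (strcmp(localeptr, "POSIX") == 0)
        return true;
    /* Anything else, including "C.UTF-8", is not treated as C. */
    return false;
}
```

With this test, "C" and "POSIX" are accepted and "C.UTF-8" is rejected,
because the comparison is on the whole string rather than on a prefix.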

What is relatively new (v15) is that we compute a version for libc
collations in get_collation_actual_version(), with code that assumes
that C.* does not need a version, implying that it's immune to
Unicode changes. What came up in this thread is that this assumption
is not true for at least one major platform: Debian/Ubuntu for
releases occurring before 2022 (glibc < 2.35).
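
As a rough illustration of that assumption, the versioning decision can
be sketched as a prefix test. This is not the actual code of
get_collation_actual_version(): the function name
libc_locale_needs_version() is hypothetical, and the case-insensitive
matching and the handling of "POSIX" are assumptions made for the
sketch.

```c
#include <stdbool.h>
#include <strings.h>    /* strcasecmp, strncasecmp (POSIX) */

/*
 * Hypothetical sketch: decide whether a libc locale name should get a
 * collation version.  Assumption: "C", "POSIX" and anything starting
 * with "C." are considered stable and therefore unversioned.
 */
static bool
libc_locale_needs_version(const char *collcollate)
{
    if (strcasecmp(collcollate, "C") == 0)
        return false;
    if (strcasecmp(collcollate, "POSIX") == 0)
        return false;
    if (strncasecmp(collcollate, "C.", 2) == 0)
        return false;           /* matches C.UTF-8 and other C.* names */
    return true;
}
```

Unlike the strict strcmp() test quoted earlier, a prefix test of this
kind puts C.UTF-8 in the "no version needed" group, which is where the
two code paths disagree.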


> We can avoid this risk by converting C.anything or POSIX.anything to
> plain "C" or "POSIX", respectively, for new collations before storing
> the string in the catalog. For upgraded collations, we can preserve the
> existing locale name. When opening the locale, we would still only
> recognize plain "C" and "POSIX" as the C locale.


Then Postgres would not sort the same way as the operating system for
the same locale, at least on some operating systems. As for glibc, in
a few years glibc < 2.35 will be obsolete, and C.UTF-8 will sort like
C without any change on our side.
In the meantime, I personally don't see why Postgres should start
forcing C.UTF-8 to sort differently in the database than in the OS.
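
For anyone who wants to check how their OS behaves, here is a small
sketch that compares strcoll() under a given locale with plain
strcmp(). The function name collation_differs_from_c() is
hypothetical, and the result depends entirely on the libc version and
on the strings tried.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Hypothetical helper: returns 1 if strcoll() under `locale` orders
 * a and b differently from strcmp(), 0 if the order is the same, and
 * -1 if the locale cannot be set (or memory runs out).
 */
static int
collation_differs_from_c(const char *locale, const char *a, const char *b)
{
    const char *cur = setlocale(LC_COLLATE, NULL);
    char       *saved = NULL;
    int         differs;

    /* Remember the current LC_COLLATE so it can be restored. */
    if (cur != NULL)
    {
        saved = malloc(strlen(cur) + 1);
        if (saved == NULL)
            return -1;
        strcpy(saved, cur);
    }

    if (setlocale(LC_COLLATE, locale) == NULL)
    {
        free(saved);
        return -1;              /* locale not available on this system */
    }

    differs = (sign(strcoll(a, b)) != sign(strcmp(a, b)));

    if (saved != NULL)
    {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return differs;
}
```

Calling it with "C.UTF-8" and a pair of UTF-8 strings of interest shows
whether the installed libc sorts that pair like C; a return value of -1
means the locale is not installed.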


Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite

