Jeff Davis wrote:
> > For libc: this change may affect any user who happened to have
> > LANG=C.UTF-8 in their environment at initdb time, which is probably a
> > lot of users, and some buildfarm members. However, the average risk
> > seems to be much lower, because we've gone a long time with the
> > assumption that C.UTF-8 has the same behavior as C, and this only
> > recently came up.
Currently, neither lc_collate_is_c() nor lookup_collation_cache()
considers C.UTF-8 to be a C collation, since they do this kind of test:

    if (strcmp(localeptr, "C") == 0)
        result = true;
    else if (strcmp(localeptr, "POSIX") == 0)
        result = true;
    else
        result = false;

What is relatively new (v15) is that we compute a version for libc
collations in get_collation_actual_version(), with code that assumes
that C.* does not need a version, implying that it's immune to Unicode
changes. What came up in this thread is that this assumption is not
true for at least one major platform: Debian/Ubuntu releases from
before 2022 (glibc < 2.35).

> We can avoid this risk by converting C.anything or POSIX.anything to
> plain "C" or "POSIX", respectively, for new collations before storing
> the string in the catalog. For upgraded collations, we can preserve the
> existing locale name. When opening the locale, we would still only
> recognize plain "C" and "POSIX" as the C locale.

Then Postgres would not sort the same way as the operating system with
the same locale, at least on some OSes.

Concerning glibc, in a few years glibc < 2.35 will be obsolete, and
C.UTF-8 sorting like C will happen by itself. But in the meantime, I
personally don't quite see why Postgres should start forcing C.UTF-8 to
sort differently in the database than in the OS.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
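For illustration, a minimal C sketch (not PostgreSQL source) contrasting the strict "C"/"POSIX" test quoted above with the normalization proposed in the quoted text. The function names locale_is_c_strict() and normalize_c_locale_name() are hypothetical, and the sketch deliberately ignores catalog storage and the handling of upgraded collations.

```c
#include <stdbool.h>
#include <string.h>

/* Same logic as the quoted test: only the exact names "C" and "POSIX"
 * are treated as the C locale, so "C.UTF-8" is not. */
static bool
locale_is_c_strict(const char *localeptr)
{
    if (strcmp(localeptr, "C") == 0)
        return true;
    else if (strcmp(localeptr, "POSIX") == 0)
        return true;
    else
        return false;
}

/* Hypothetical helper for the proposal: map "C.anything" to "C" and
 * "POSIX.anything" to "POSIX" before the name would be stored; any
 * other locale name is returned unchanged. */
static const char *
normalize_c_locale_name(const char *name)
{
    if (strcmp(name, "C") == 0 || strncmp(name, "C.", 2) == 0)
        return "C";
    if (strcmp(name, "POSIX") == 0 || strncmp(name, "POSIX.", 6) == 0)
        return "POSIX";
    return name;
}
```

Under this sketch, normalize_c_locale_name("C.UTF-8") yields "C", which the strict test then accepts. That is precisely the situation objected to above: on platforms where C.UTF-8 does not sort like C (glibc < 2.35), the database would no longer sort the same way as the OS.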