On Wed, Mar 20, 2024 at 05:13:26PM -0700, Jeff Davis wrote:
> On Tue, 2024-03-19 at 13:41 +0100, Peter Eisentraut wrote:
> > * v25-0002-Support-C.UTF-8-locale-in-the-new-builtin-collat.patch
> >
> > Looks ok.
>
> Committed.

>     <varlistentry>
> +    <term><literal>pg_c_utf8</literal></term>
> +    <listitem>
> +     <para>
> +      This collation sorts by Unicode code point values rather than natural
> +      language order.  For the functions <function>lower</function>,
> +      <function>initcap</function>, and <function>upper</function>, it uses
> +      Unicode simple case mapping.  For pattern matching (including regular
> +      expressions), it uses the POSIX Compatible variant of Unicode <ulink
> +      url="https://www.unicode.org/reports/tr18/#Compatibility_Properties">Compatibility
> +      Properties</ulink>.  Behavior is efficient and stable within a
> +      <productname>Postgres</productname> major version.  This collation is
> +      only available for encoding <literal>UTF8</literal>.
> +     </para>
> +    </listitem>
> +   </varlistentry>

lower(), initcap(), upper(), and regexp_matches() are PROVOLATILE_IMMUTABLE.
Until now, we've delegated that responsibility to the user.  The user is
supposed to somehow never update libc or ICU in a way that changes outcomes
from these functions.  Now that postgresql.org is taking that responsibility
for builtin C.UTF-8, how should we govern it?  I think the above text and [1]
convey that we'll update the Unicode data between major versions, making
functions like lower() effectively STABLE.  Is that right?

(This thread had some discussion[2] that datcollversion/collversion won't
necessarily change when a major version changes lower() behavior.)

[1] https://postgr.es/m/7089acb3ebac0c1682a79c8bc16803cf06896fb9.ca...@j-davis.com
[2] https://postgr.es/m/5a1ecc40539f36cac5b27a62739a45a49785ca54.ca...@j-davis.com
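
As a rough illustration of the IMMUTABLE concern raised above, here is a minimal Python sketch, not PostgreSQL code. It models an expression index on lower(col) as a dictionary built under one case-mapping table and then probed under a newer one. The tables, the make_lower helper, and the private-use code points U+E000/U+E001 are all invented for the example; they stand in for a mapping that a later Unicode data update might add.

```python
def make_lower(table):
    """Return a simple (one-to-one) lowercasing function driven by `table`."""
    def lower(s):
        return "".join(table.get(ch, ch) for ch in s)
    return lower

# Hypothetical mapping tables for two major versions. The U+E000 -> U+E001
# pair is invented (private-use code points), not real Unicode data.
lower_old = make_lower({"A": "a", "B": "b"})
lower_new = make_lower({"A": "a", "B": "b", "\ue000": "\ue001"})

rows = ["AB", "\ue000B"]

# Stand-in for an expression index on lower(col), built under the old table.
index = {lower_old(r): r for r in rows}

def lookup(value, lower):
    """Probe the index using whichever lower() the running version provides."""
    return index.get(lower(value))

found_before = lookup("\ue000B", lower_old)  # key matches what the index stored
found_after = lookup("\ue000B", lower_new)   # key differs, so the row is missed
unaffected = lookup("AB", lower_new)         # rows without the new mapping still match
```

Under these assumptions, a row whose lowercased form changed between versions is silently missed until the index is rebuilt, which is the kind of outcome that marking the function immutable is meant to rule out.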