On Wed, 2024-02-07 at 10:53 +0100, Peter Eisentraut wrote:
> Various comments are updated to include the term "character class". I
> don't recognize that as an official Unicode term. There are categories
> and properties. Let's check this.

It's based on
https://www.unicode.org/reports/tr18/#Compatibility_Properties
so I suppose the right name is "properties".

> Is it potentially confusing that only some pg_u_prop_* have a posix
> variant? Would it be better for a consistent interface to have a
> "posix" argument for each and just ignore it if not used? Not sure.

I thought about it but didn't see a clear advantage one way or another.

> About this initdb --builtin-locale option and analogous options
> elsewhere: Maybe we should flip this around and provide a
> --libc-locale option, and have all the other providers just use the
> --locale option. This would be more consistent with the fact that
> it's libc that is special in this context.

Would --libc-locale affect all the environment variables or just
LC_CTYPE/LC_COLLATE? How do we avoid breakage?

I like the general direction here but we might need to phase in the
option or come up with a new name. Suggestions welcome.

> Do we even need the "C" locale? We have established that "C.UTF-8" is
> useful, but if that is easily available, who would need "C"?

I don't think we should encourage its use generally but I also don't
think it will disappear any time soon. Some people will want it on
simplicity grounds. I hope fewer people will use "C" when we have a
better builtin option.

> Some changes in this patch appear to be just straight renamings, like
> in src/backend/utils/init/postinit.c and
> src/bin/pg_upgrade/t/002_pg_upgrade.pl. Maybe those should be put
> into the previous patch instead.
>
> On the collation naming: My expectation would have been that the
> "C.UTF-8" locale would be exposed as the UCS_BASIC collation.

I'd like that. We have to sort out a couple things first, though:

1. The SQL spec mentions the capitalization of "ß" as "SS"
specifically. Should UCS_BASIC use the unconditional mappings in
SpecialCasing.txt? I already have some code to do that (not posted
yet).

2. Should UCS_BASIC use the "POSIX" or "Standard" variant of regex
properties? (The main difference seems to be whether symbols get
treated as punctuation or not.)

3. What do we do about potential breakage for existing users of
UCS_BASIC who might be expecting C-like behavior?

Regards,
	Jeff Davis