Peter Eisentraut wrote:
> Another issue is that we'd need to carefully divide up the role of the
> "default" collation and the "default" provider. The default collation
> is the collation defined for the database, the default provider means to
> use the libc non-locale_t enabled API functions. Right now these are
> always the same, but if the database-global locale is ICU, then the
> default collation would use the ICU provider.

I think one related issue, which the patch works around by using a libc locale as a proxy, is knowing what to put into libc's LC_CTYPE and LC_COLLATE. In fact, I've been wondering whether that's the main reason for the interface implemented by the patch. Otherwise, how should these locale categories be initialized for ICU databases?

For instance, in the existing FTS code, lowerstr_with_len() in tsearch/ts_locale.c calls tolower() or towlower() to fold a string to lower case when normalizing lexemes. This requires LC_CTYPE to be set, at the very least, to something compatible with the database encoding. Even if that code looks like it might need to be changed for ICU anyway (or just made collation-aware, as the TODO marks suggest?), what about comparable calls in extensions?

If we don't touch libc's LC_COLLATE/LC_CTYPE in backends, would extension code simply inherit them from the postmaster? Does that sound acceptable? If not, maybe ICU databases should have these as settable options, in addition to their ICU locale?

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite