Re: Order changes in PG16 since ICU introduction

Jeff Davis Fri, 21 Apr 2023 12:25:21 -0700

On Fri, 2023-04-21 at 14:23 -0400, Tom Lane wrote:
> postgres=# CREATE DATABASE test1 TEMPLATE=template0 ENCODING = 'UTF8'
> LOCALE = 'C';


...

>  test1     | postgres | UTF8     | icu             | C          |
> C          | en-US      |           | 
> (4 rows)
> 
> Looks like the "pick en-US even when told not to" problem exists here
> too.

Both provider (ICU) and the icu locale (en-US) are inherited from
template0. The LOCALE parameter to CREATE DATABASE doesn't affect
either of those things, because there's a separate parameter
ICU_LOCALE.

This happens the same way in v15, and although it matches the
documentation technically, it is not a great user experience.

I have a couple ideas:

1. Introduce a "none" provider to separate the concept of C/POSIX
locales from the libc provider. It's not really using a provider
anyway, it's just using memcmp(), and I think it causes confusion to
combine them. Saying "LOCALE_PROVIDER=none" is less error-prone than
"LOCALE_PROVIDER=libc LOCALE='C'".

2. Change the CREATE DATABASE syntax to catch these errors better at
the possible expense of backwards compatibility.

I am also having second thoughts about accepting "C" or "POSIX" as an
ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
terribly useful (why not just use memcmp()?), it's not fast in my
measurements (en-US is faster), so maybe it's better to just throw an
error and tell the user to use C (or provider=none as I suggest
above)? 

Obviously the user could manually type "en-US-u-va-posix" if that's the
locale they want. Throwing an error would be a backwards-compatibility
issue, but in v15 an ICU locale of "C" just gives the root locale
anyway, which is probably not what they want.

Regards,
        Jeff Davis

Re: Order changes in PG16 since ICU introduction

Reply via email to