At Thu, 15 Sep 2022 18:41:31 +0300, Marina Polyakova <m.polyak...@postgrespro.ru> wrote in
> P.S. While working on the patch, I discovered that UTF8 encoding is
> always used for the ICU provider in initdb unless it is explicitly
> specified by the user:
>
> if (!encoding && locale_provider == COLLPROVIDER_ICU)
>     encodingid = PG_UTF8;
>
> IMO this creates additional errors for locales with other encodings:
>
> $ initdb --locale de_DE.iso885915@euro --locale-provider icu
> --icu-locale de-DE
> ...
> initdb: error: encoding mismatch
> initdb: detail: The encoding you selected (UTF8) and the encoding that
> the selected locale uses (LATIN9) do not match. This would lead to
> misbehavior in various character string processing functions.
> initdb: hint: Rerun initdb and either do not specify an encoding
> explicitly, or choose a matching combination.
>
> And ICU supports many encodings, see the contents of pg_enc2icu_tbl in
> encnames.c...
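
For illustration only, the behavior described in the quoted text can be
modeled roughly as below. This is a simplified Python sketch, not initdb's
actual code: the function names (choose_encoding, check_encoding), the
plain-string encoding names, and the strict equality check are all
assumptions made for the example. The real check may accept more
combinations than this model does.

```python
from typing import Optional


def choose_encoding(explicit_encoding: Optional[str],
                    provider: str,
                    locale_encoding: str) -> str:
    """Hypothetical model of how the database encoding gets picked."""
    if explicit_encoding is not None:
        # The user passed --encoding, so that value is used as given.
        return explicit_encoding
    if provider == "icu":
        # Mirrors the quoted snippet: no --encoding plus the ICU provider
        # means UTF8, regardless of what the libc locale implies.
        return "UTF8"
    # Assumption: otherwise the encoding is derived from the libc locale.
    return locale_encoding


def check_encoding(chosen: str, locale_encoding: str) -> None:
    """Hypothetical, stricter-than-real model of the mismatch check."""
    if chosen != locale_encoding:
        raise ValueError(
            f"encoding mismatch: selected {chosen}, "
            f"locale uses {locale_encoding}"
        )


if __name__ == "__main__":
    # de_DE.iso885915@euro implies LATIN9; without --encoding the model
    # picks UTF8 for ICU, which then fails the check.
    chosen = choose_encoding(None, "icu", "LATIN9")
    try:
        check_encoding(chosen, "LATIN9")
    except ValueError as err:
        print(err)

    # With an explicit matching encoding the check passes.
    check_encoding(choose_encoding("LATIN9", "icu", "LATIN9"), "LATIN9")
```

In this model the first case fails only because the encoding was defaulted
rather than chosen, which is the situation the quoted error message reports.
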

UTF8 seems to me the best default, since it fits almost all cases that
use ICU locales. So, for a locale with another encoding, we need to
specify the encoding explicitly:

$ initdb --encoding iso-8859-15 --locale de_DE.iso885915@euro --locale-provider icu --icu-locale de-DE

However, I think that is hardly understandable from the documentation.

(I checked this using euc-jp [1], so it might be wrong..)

[1] initdb --encoding euc-jp --locale ja_JP.eucjp --locale-provider icu --icu-locale ja-x-icu

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center