On Sun, Mar 8, 2020 at 2:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:

> I wrote:
> > James Coleman <jtc...@gmail.com> writes:
> >> I'm still interested in understanding why we're using the ISO locale
> >> instead of the utf8 one in a utf8-labeled test though.
>
> > We are not.  My understanding of the rules about this is that the
> > active LC_CTYPE setting determines the encoding that libc uses,
> > period.  The encoding suffix on the locale name only makes a
> > difference when LC_CTYPE is being specified (or derived from LANG or
> > LC_ALL), not any other LC_XXX setting --- although for consistency
> > they'll let you include it in any LC_XXX value.
>
> Oh wait --- I'm wrong about that.  Looking at the code in pg_locale.c,
> what actually happens is that we get data in the codeset implied by
> the LC_TIME setting and then translate it to the database encoding
> (cf commit 7ad1cd31b).  So if bare "tr_TR" is taken as implying
> iso-8859-9, which seems likely (it appears to work that way here,
> anyway) then this test is exercising the codeset translation path.
> We could change the test to say 'tr_TR.utf8' but that would give us
> less test coverage.
>
Just to confirm I understand: the issue is solely that only the utf8 tr_TR
locale is installed by default on this machine, while the test has a hard
requirement on the iso-8859-9 one (that is, the test explicitly exercises
a codepath that generates utf8 results from a non-utf8 source)?
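
For what it's worth, here is how I picture that codepath, as a minimal
Python sketch rather than the actual pg_locale.c logic. The function name
to_database_encoding and the sample month name are my own illustration; the
only things taken from the thread are that the bytes arrive in the codeset
implied by LC_TIME (assumed to be iso-8859-9 here) and are then translated
to the database encoding (assumed to be utf8 here).

```python
def to_database_encoding(raw: bytes, source_codeset: str,
                         db_encoding: str = "utf-8") -> bytes:
    """Re-encode bytes from the locale's codeset into the database encoding.

    Hypothetical helper for illustration only; it mimics the idea of the
    translation step, not PostgreSQL's implementation.
    """
    # Decode using the codeset implied by the locale name...
    text = raw.decode(source_codeset)
    # ...then encode into the encoding the database expects.
    return text.encode(db_encoding)


# "Şubat" (February) as a libc using ISO-8859-9 would hand it back:
# the capital S-cedilla is the single byte 0xDE in that codeset.
latin5_bytes = b"\xdeubat"

utf8_bytes = to_database_encoding(latin5_bytes, "iso-8859-9")
print(utf8_bytes)                  # b'\xc5\x9eubat'
print(utf8_bytes.decode("utf-8"))  # Şubat
```

If the locale already returned utf8 (as 'tr_TR.utf8' would), the decode and
encode steps would cancel out, which is why that variant exercises less of
the translation path.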

If so, I'm going to try a bare Ubuntu install on a VM and see what locales
are installed by default for Turkish.
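
The check I have in mind is roughly the following, sketched in Python under
the assumption of a POSIX system where locale.nl_langinfo is available.
probe_locales is a hypothetical helper of mine, and the locale names are
just the ones discussed in this thread; the codeset reported for each one
depends entirely on what the machine has installed.

```python
import locale


def probe_locales(names):
    """Map each locale name to the codeset libc reports for it, or None
    if the locale cannot be activated (typically: it is not installed)."""
    saved = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
    result = {}
    try:
        for name in names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                result[name] = None  # locale not available on this machine
            else:
                # The codeset follows the active LC_CTYPE setting.
                result[name] = locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore
    return result


if __name__ == "__main__":
    for name, codeset in probe_locales(["tr_TR", "tr_TR.utf8"]).items():
        print(f"{name}: {codeset if codeset else 'not installed'}")
```

On a machine where bare tr_TR comes back as "not installed" while
tr_TR.utf8 reports a codeset, the situation would match what I'm seeing.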

If in fact Ubuntu doesn't install this locale by default, then is this a
caveat we should add to developer docs somewhere? It seems odd to me that
I'd be the only one encountering it, but OTOH I would have thought this a
fairly vanilla install too...

James
