On Sun, Mar 8, 2020 at 2:19 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I wrote:
> > James Coleman <jtc...@gmail.com> writes:
> >> I'm still interested in understanding why we're using the ISO locale
> >> instead of the utf8 one in a utf8-labeled test though.
> >
> > We are not.  My understanding of the rules about this is that the
> > active LC_CTYPE setting determines the encoding that libc uses,
> > period.  The encoding suffix on the locale name only makes a
> > difference when LC_CTYPE is being specified (or derived from LANG or
> > LC_ALL), not any other LC_XXX setting --- although for consistency
> > they'll let you include it in any LC_XXX value.
>
> Oh wait --- I'm wrong about that.  Looking at the code in pg_locale.c,
> what actually happens is that we get data in the codeset implied by
> the LC_TIME setting and then translate it to the database encoding
> (cf commit 7ad1cd31b).  So if bare "tr_TR" is taken as implying
> iso-8859-9, which seems likely (it appears to work that way here,
> anyway) then this test is exercising the codeset translation path.
> We could change the test to say 'tr_TR.utf8' but that would give us
> less test coverage.
>

So just to confirm I understand, that implies that the issue is solely
that only the utf8 tr_TR set is installed by default on this machine,
and the iso-8859-9 set is a hard requirement (that is, the test is
explicitly testing a codepath that generates utf8 results from a
non-utf8 source)?

If so, I'm going to try a bare Ubuntu install on a VM and see what
locales are installed by default for Turkish.

If in fact Ubuntu doesn't install this locale by default, then is this
a caveat we should add to developer docs somewhere? It seems odd to me
that I'd be the only one encountering it, but OTOH I would have
thought this a fairly vanilla install too...

James
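
For anyone wanting to check their own machine, here is a rough sketch of how one might probe which Turkish locale variants libc will accept and which codeset each one implies. It is only an illustration, not what pg_locale.c does: the helper name probe_codeset is made up, the locale spellings (in particular "tr_TR.iso88599") are assumptions that vary by platform, and it switches LC_CTYPE temporarily because nl_langinfo(CODESET) reports the codeset of the active LC_CTYPE. It assumes a POSIX system where Python's locale.nl_langinfo is available.

```python
import locale


def probe_codeset(name):
    """Return the codeset libc reports for locale `name`, or None if
    setlocale() rejects it (typically because it is not installed).

    `probe_codeset` is a hypothetical helper for illustration only.
    """
    # Calling setlocale() with no value just queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        # CODESET reflects the encoding of the now-active LC_CTYPE.
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        # Always restore the original LC_CTYPE.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Candidate spellings are assumptions; `locale -a` shows the real names.
    for name in ("tr_TR", "tr_TR.iso88599", "tr_TR.utf8"):
        codeset = probe_codeset(name)
        print(f"{name}: {codeset if codeset else 'not available'}")
```

If bare "tr_TR" comes back as not available while "tr_TR.utf8" reports a codeset, that would be consistent with only the utf8 variant being installed.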