Re: TM format can mix encodings in to_char()

2019-06-29 Thread Tom Lane
Juanjo Santamaria Flecha writes: > It looks as if no work is left for this patch, so maybe updating the author > to Tom Lane (I'm just a reporter at this point, which is fine) and the > status to ready for committer would better reflect its current status. Does > anyone think otherwise? Yeah,

Re: TM format can mix encodings in to_char()

2019-06-28 Thread Juanjo Santamaria Flecha
It looks as if no work is left for this patch, so maybe updating the author to Tom Lane (I'm just a reporter at this point, which is fine) and the status to ready for committer would better reflect its current status. Does anyone think otherwise? Regards, Juan José Santamaría Flecha

Re: TM format can mix encodings in to_char()

2019-04-23 Thread Tom Lane
Andrew Dunstan writes: > Test above works as expected with the patch, see attached.  This is from > jacana. Great, thanks for checking! > LMK if you want more tests run before I blow the test instance away Can't think of anything else. It'd be nice if we could cover stuff like this in the regr

Re: TM format can mix encodings in to_char()

2019-04-23 Thread Andrew Dunstan
On 4/21/19 10:21 AM, Tom Lane wrote: > Andrew Dunstan writes: >> On 4/21/19 12:28 AM, Tom Lane wrote: >>> I don't have any way to test this on Windows, so could somebody >>> do that? Manually running the Turkish test cases ought to be enough. >> How does one do that? Just set a Turkish locale? >

Re: TM format can mix encodings in to_char()

2019-04-22 Thread Juan José Santamaría Flecha
Actually, I tried to show my findings with the tr_TR regression test, but you can reproduce the same issue with other locales and non-ASCII characters, as Tom has pointed out. For example: de_DE ISO-8859-1: March; es_ES ISO-8859-1: Wednesday; fr_FR ISO-8859-1: February. Regards, Juan José Santamarí

Re: TM format can mix encodings in to_char()

2019-04-22 Thread Tom Lane
Peter Geoghegan writes: > On Sun, Apr 21, 2019 at 6:26 AM Andrew Dunstan > wrote: >> How does one do that? Just set a Turkish locale? > tr_TR is, in a sense, special among locales: > http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html > The Turkish dotless i has apparently

Re: TM format can mix encodings in to_char()

2019-04-22 Thread Peter Geoghegan
On Sun, Apr 21, 2019 at 6:26 AM Andrew Dunstan wrote: > How does one do that? Just set a Turkish locale? tr_TR is, in a sense, special among locales: http://blog.thetaphi.de/2012/07/default-locales-default-charsets-and.html The Turkish dotless i has apparently been implicated in all kinds of bu

Re: TM format can mix encodings in to_char()

2019-04-21 Thread Tom Lane
Andrew Dunstan writes: > On 4/21/19 12:28 AM, Tom Lane wrote: >> I don't have any way to test this on Windows, so could somebody >> do that? Manually running the Turkish test cases ought to be enough. > How does one do that? Just set a Turkish locale? Try variants of the original test case. Fo

Re: TM format can mix encodings in to_char()

2019-04-21 Thread Andrew Dunstan
On 4/21/19 12:28 AM, Tom Lane wrote: > I wrote: >> [ fix-encoding-and-error-recovery-in-cache-locale-time.patch ] > On closer inspection, I'm pretty sure either version of this patch > will break things on Windows, because that platform already had code > to convert the result of wcsftime() to th

Re: TM format can mix encodings in to_char()

2019-04-20 Thread Tom Lane
I wrote: > [ fix-encoding-and-error-recovery-in-cache-locale-time.patch ] On closer inspection, I'm pretty sure either version of this patch will break things on Windows, because that platform already had code to convert the result of wcsftime() to the database encoding; we were adding code to do

Re: TM format can mix encodings in to_char()

2019-04-20 Thread Tom Lane
I wrote: > This is a little bit off, now that I look at it, because it's > failing to account for the possibility of getting -1 from > pg_get_encoding_from_locale. It should probably do what > pg_bind_textdomain_codeset does: > if (encoding < 0) > encoding = PG_SQL_ASCII; Actu
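
The fallback quoted above (treat a failed encoding lookup as SQL_ASCII) can be sketched roughly as follows. This is only an illustration in Python, not the patch itself: the constants, the lookup table, and both function names are hypothetical stand-ins, and the real pg_get_encoding_from_locale does considerably more than split a locale name.

```python
# Hypothetical stand-ins for encoding identifiers; the values are arbitrary.
ENC_SQL_ASCII = 0
ENC_UTF8 = 1
ENC_LATIN5 = 2

# Assumed, deliberately tiny table of codesets this sketch recognises.
_KNOWN_CODESETS = {"UTF-8": ENC_UTF8, "ISO-8859-9": ENC_LATIN5}


def get_encoding_from_locale(locale_name: str) -> int:
    """Return an encoding id for the locale's codeset, or -1 if unknown."""
    _, sep, codeset = locale_name.partition(".")
    if not sep:
        # No explicit codeset in the name: this sketch cannot tell.
        return -1
    return _KNOWN_CODESETS.get(codeset.upper(), -1)


def encoding_or_ascii(locale_name: str) -> int:
    """Apply the quoted rule: a negative lookup result becomes SQL_ASCII."""
    encoding = get_encoding_from_locale(locale_name)
    if encoding < 0:
        encoding = ENC_SQL_ASCII
    return encoding
```

With these assumptions, encoding_or_ascii("tr_TR.ISO-8859-9") yields the LATIN5 id, while a bare "tr_TR" or an unrecognised codeset falls back to the SQL_ASCII id instead of propagating -1.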

Re: TM format can mix encodings in to_char()

2019-04-20 Thread Tom Lane
I wrote: > Hmm. I'd always imagined that the way that libc works is that LC_CTYPE > determines the encoding (codeset) it's using across the board, so that > functions like strftime would deliver data in that encoding. > [ and much more based on that ] After further study of the code, the situatio

Re: TM format can mix encodings in to_char()

2019-04-19 Thread Tom Lane
Juan José Santamaría Flecha writes: > The problem is that the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5), > while the test runs in UTF8. So the following code will raise an error: > SET lc_time TO 'tr_TR'; > SELECT to_char(date '2010-02-01', 'DD TMMON '); > ER
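
The mismatch described above can be reproduced outside the server. The sketch below is a Python illustration, not PostgreSQL code: it assumes the abbreviated Turkish name for February is "Şub", and the helper names are hypothetical. It shows that the ISO-8859-9 bytes for that name are not valid UTF-8, and that recoding them from the locale's encoding to the target encoding repairs the string.

```python
# Assumption: abbreviated Turkish month name for February.
MONTH_ABBREV = "Şub"


def is_valid_utf8(raw: bytes) -> bool:
    """Report whether raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def recode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes from the locale's encoding to the target encoding."""
    return raw.decode(source_encoding).encode(target_encoding)


# What a library would hand back under an ISO-8859-9 (LATIN5) locale.
latin5_bytes = MONTH_ABBREV.encode("iso8859-9")

# Passing those bytes along unchanged mixes encodings in a UTF-8 context.
mixed_ok = is_valid_utf8(latin5_bytes)

# Recoding first yields bytes that are valid in the target encoding.
utf8_bytes = recode(latin5_bytes, "iso8859-9", "utf-8")
```

Here mixed_ok comes out False, because the LATIN5 byte for "Ş" looks like the start of a two-byte UTF-8 sequence but is followed by a plain ASCII letter, whereas utf8_bytes decodes back to the original name.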

Re: TM format can mix encodings in to_char()

2019-04-19 Thread Kyotaro HORIGUCHI
Hello. At Fri, 12 Apr 2019 18:45:51 +0200, Juan José Santamaría Flecha wrote in > Hackers, > > I will use as an example the code in the regression test > 'collate.linux.utf8'. > There you can find: > > SET lc_time TO 'tr_TR'; > SELECT to_char(date '2010-04-01', 'DD TMMON '); >to_char