Re: TM format can mix encodings in to_char()

Kyotaro HORIGUCHI Fri, 19 Apr 2019 01:31:05 -0700

Hello.

At Fri, 12 Apr 2019 18:45:51 +0200, Juan José Santamaría Flecha 
<juanjo.santama...@gmail.com> wrote in 
<cac+axb22so5azm2vze+mchyxec7gwfr-n-sk-io091r0p_1...@mail.gmail.com>
> Hackers,
> 
> I will use as an example the code in the regression test
> 'collate.linux.utf8'.
> There you can find:
> 
> SET lc_time TO 'tr_TR';
> SELECT to_char(date '2010-04-01', 'DD TMMON YYYY');
>    to_char
> -------------
>  01 NIS 2010
> (1 row)
> 
> The problem is that the locale 'tr_TR' uses the encoding ISO-8859-9 (LATIN5),
> while the test runs in UTF8. So the following code will raise an error:
> 
> SET lc_time TO 'tr_TR';
> SELECT to_char(date '2010-02-01', 'DD TMMON YYYY');
> ERROR:  invalid byte sequence for encoding "UTF8": 0xde 0x75


The same case is already handled for lc_numeric, whose locale strings
are converted to the database encoding. lc_time ought to be treated
the same way.

> The problem seems to be in the code touched in the attached patch.

It seems basically correct, but cache_single_time does an extra
strdup when pg_any_to_server has already done a conversion (and so
has already returned a newly allocated string). Maybe it would be
better like this:

> oldcxt = MemoryContextSwitchTo(TopMemoryContext);
> ptr = pg_any_to_server(buf, strlen(buf), encoding);
> 
> if (ptr == buf)
> {
>       /* Conversion didn't pstrdup, so we must */
>       ptr = pstrdup(buf);
> }
> MemoryContextSwitchTo(oldcxt);

-       int                     i;
+       int                     i,
+                               encoding;

It is not strictly followed, but (I believe) we don't declare multiple
variables in a single declaration.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
