On Mon, Oct 15, 2007 at 11:09:54AM +0200, Magnus Hagander wrote:
> On Sat, Oct 06, 2007 at 01:53:31PM -0400, Tom Lane wrote:
> > I am thinking that Dave's discovery explains some previously unsolved
> > bug reports, such as
> > http://archives.postgresql.org/pgsql-bugs/2007-05/msg00260.php
> > If Windows returns LC_CTYPE=C in a situation like this, then
> > the various single-byte-charset optimization paths that are enabled by
> > lc_ctype_is_c() would be mistakenly used, leading to misbehavior in
> > upper()/lower() and other places.  ISTM we had better hack
> > lc_ctype_is_c() so that on Windows (only), if the database encoding
> > is UTF-8 then it returns FALSE regardless of what setlocale says.
> 
> Yes, I think we need a change to that routine.
> 
> But what about the case where we actually *have* locale=C and
> encoding=UTF8? We need to handle that one somehow. Perhaps we should look
> at LC_COLLATE instead (again, on Windows only, possibly even only in the
> windows+locale_returns_c+encoding=utf8 case, to distinguish these two)?
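
A rough sketch of how the two proposals above could fit together, written as a standalone function so it can be tested outside the backend. The function and parameter names are hypothetical: the real lc_ctype_is_c() takes no arguments and calls setlocale() itself, whereas here the LC_CTYPE and LC_COLLATE names, whether the database is UTF-8, and the platform are passed in. It follows Tom's rule (don't trust a "C" LC_CTYPE for a UTF-8 database on Windows) but falls back to LC_COLLATE, so a database that really is in C locale would still get the fast paths.

```c
#include <stdbool.h>
#include <string.h>

/* True if a locale name denotes the C/POSIX locale. */
static bool
locale_name_is_c(const char *name)
{
    return name != NULL &&
        (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/*
 * Hypothetical variant of lc_ctype_is_c(): ctype_name and collate_name
 * stand in for what setlocale(LC_CTYPE, NULL) and
 * setlocale(LC_COLLATE, NULL) would return.
 */
static bool
ctype_is_c_sketch(const char *ctype_name, const char *collate_name,
                  bool db_is_utf8, bool on_win32)
{
    if (!locale_name_is_c(ctype_name))
        return false;

    /*
     * On Windows with a UTF-8 database, LC_CTYPE may report "C" even
     * when the database locale is not C, so trust LC_COLLATE instead.
     */
    if (on_win32 && db_is_utf8)
        return locale_name_is_c(collate_name);

    return true;
}
```

Whether LC_COLLATE is actually reliable in this situation is an assumption that would need checking on an affected Windows box.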

Hmm. Looking more at that, could there be another problem? Looking at
WriteControlFile(), it writes out what setlocale(LC_CTYPE) returns, which
will then be "C" - even if the database isn't in C.

But I don't really know when that code is called, or if I'm just looking at
things wrong. Just starting up and shutting down the database leaves it at
Swedish_Sweden.1252, not C.
(1252 is still the wrong encoding specifier, but it'll work anyway since we
convert to UTF16)

Now, I came across this trying to find a way for lc_ctype_is_c() to
determine if the database is in C locale or not, *without* resorting to
setlocale(). Any pointers on how to do that properly?

Also, any pointers on a way to check for the kind of failure that's to be
expected from this one returning the wrong thing?


> > One bright spot is that this does seem to suggest a way to implement the
> > recommendation I made in the -patches thread: if we can't support the
> > encoding (codepage) used by the locale seen by initdb, we could try
> > stripping the codepage indicator (if any) and plastering on .65001
> > to get a UTF8-compatible locale name.  That'd only work on Windows
> > but that seems the platform where we're most likely to see unsupportable
> > default encodings.
> 
> Um, yes, that should work - assuming encoding is set to UTF8. We can't do
> that for any other encoding, of course.
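
Tom's stripping idea could be sketched roughly as follows. The helper name is made up and nothing like it is in the attached patch; it assumes Windows locale names of the form language_country.codepage (e.g. Swedish_Sweden.1252) and simply replaces whatever follows the last dot with 65001, the Windows UTF-8 codepage, appending it if there is no dot.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: replace the codepage suffix of a Windows locale
 * name (the part after the last '.') with ".65001".
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int
locale_with_utf8_codepage(const char *locale, char *buf, size_t buflen)
{
    const char *dot = strrchr(locale, '.');
    size_t      baselen = dot ? (size_t) (dot - locale) : strlen(locale);
    int         n;

    /* Copy the language_country part, then append the UTF-8 codepage. */
    n = snprintf(buf, buflen, "%.*s.65001", (int) baselen, locale);
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```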

Looking at that, we don't actually need to put that at the end of the
locale name: all locale names will work with UTF8, even ones specifying
1252.

Attached patch seems to work for me for that part. Still doesn't touch
lc_ctype_is_c().
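
For review, the acceptance rule the patch ends up with in createdb() can be restated as a small pure function. The enum and function names are hypothetical stand-ins for the backend's pg_enc values and inline condition; initdb's version of the check is the same except that it allows SQL_ASCII without any superuser test.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the backend's pg_enc values. */
typedef enum
{
    SKETCH_SQL_ASCII,
    SKETCH_WIN1252,
    SKETCH_UTF8
} sketch_enc;

/*
 * Mirrors the createdb() check after the patch: returns true if the
 * requested encoding may be used with the locale's encoding.
 */
static bool
encoding_allowed(sketch_enc ctype_enc, sketch_enc requested,
                 bool is_superuser, bool on_win32)
{
    if (ctype_enc == requested)
        return true;
    /* 1. C/POSIX locale, or the locale's encoding could not be determined */
    if (ctype_enc == SKETCH_SQL_ASCII)
        return true;
    /* 2. a superuser explicitly asks for SQL_ASCII */
    if (requested == SKETCH_SQL_ASCII && is_superuser)
        return true;
    /* 3. new: on win32, UTF8 is converted to UTF16, so any locale works */
    if (on_win32 && requested == SKETCH_UTF8)
        return true;
    return false;
}
```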

//Magnus
Index: backend/commands/dbcommands.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.201
diff -c -r1.201 dbcommands.c
*** backend/commands/dbcommands.c       13 Oct 2007 20:18:41 -0000      1.201
--- backend/commands/dbcommands.c       15 Oct 2007 10:55:20 -0000
***************
*** 258,264 ****
  
        /*
         * Check whether encoding matches server locale settings.  We allow
!        * mismatch in two cases:
         *
         * 1. ctype_encoding = SQL_ASCII, which means either that the locale
         * is C/POSIX which works with any encoding, or that we couldn't determine
--- 258,264 ----

        /*
         * Check whether encoding matches server locale settings.  We allow
!        * mismatch in three cases:
         *
         * 1. ctype_encoding = SQL_ASCII, which means either that the locale
         * is C/POSIX which works with any encoding, or that we couldn't determine
***************
*** 268,279 ****
--- 268,286 ----
         * This is risky but we have historically allowed it --- notably, the
         * regression tests require it.
         *
+        * 3. selected encoding is UTF8 and platform is win32. This is because
+        * UTF8 is a pseudo codepage that is supported in all locales since
+        * it's converted to UTF16 before being used.
+        *
         * Note: if you change this policy, fix initdb to match.
         */
        ctype_encoding = pg_get_encoding_from_locale(NULL);
  
        if (!(ctype_encoding == encoding ||
                  ctype_encoding == PG_SQL_ASCII ||
+ #ifdef WIN32
+                 encoding == PG_UTF8 ||
+ #endif
                  (encoding == PG_SQL_ASCII && superuser())))
                ereport(ERROR,
                                (errmsg("encoding %s does not match server's locale %s",
Index: bin/initdb/initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.145
diff -c -r1.145 initdb.c
*** bin/initdb/initdb.c 13 Oct 2007 20:18:41 -0000      1.145
--- bin/initdb/initdb.c 15 Oct 2007 10:50:27 -0000
***************
*** 2840,2846 ****
                /* We allow selection of SQL_ASCII --- see notes in createdb() */
                if (!(ctype_enc == user_enc ||
                          ctype_enc == PG_SQL_ASCII ||
!                         user_enc == PG_SQL_ASCII))
                {
                        fprintf(stderr, _("%s: encoding mismatch\n"), progname);
                        fprintf(stderr,
--- 2840,2856 ----
                /* We allow selection of SQL_ASCII --- see notes in createdb() */
                if (!(ctype_enc == user_enc ||
                          ctype_enc == PG_SQL_ASCII ||
!                         user_enc == PG_SQL_ASCII
! #ifdef WIN32
!                       /*
!                        * On win32, if the encoding chosen is UTF8, all locales are OK
!                        * (assuming the actual locale name passed the checks above). This
!                        * is because UTF8 is a pseudo-codepage that we convert to UTF16
!                        * before operating on it, and UTF16 supports all locales.
!                        */
!                       || user_enc == PG_UTF8
! #endif
!                         ))
                {
                        fprintf(stderr, _("%s: encoding mismatch\n"), progname);
                        fprintf(stderr,