On Mon, Oct 15, 2007 at 11:09:54AM +0200, Magnus Hagander wrote:
> On Sat, Oct 06, 2007 at 01:53:31PM -0400, Tom Lane wrote:
> > I am thinking that Dave's discovery explains some previously unsolved
> > bug reports, such as
> > http://archives.postgresql.org/pgsql-bugs/2007-05/msg00260.php
> > If Windows returns LC_CTYPE=C in a situation like this, then
> > the various single-byte-charset optimization paths that are enabled by
> > lc_ctype_is_c() would be mistakenly used, leading to misbehavior in
> > upper()/lower() and other places. ISTM we had better hack
> > lc_ctype_is_c() so that on Windows (only), if the database encoding
> > is UTF-8 then it returns FALSE regardless of what setlocale says.
>
> Yes, I think we need a change to that routine.
>
> But. What about the case when we actually *have* locale=C and
> encoding=UTF8? We need to handle that one somehow. Perhaps we should look
> at LC_COLLATE instead (again, on Windows only. Possibly even only in the
> windows+locale_returns_c+encoding=utf8 case, to distinguish these two)?
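
For reference, here is a minimal sketch of the lc_ctype_is_c() hack Tom
proposes above. It is not the real routine: the enum, the function name and
the on_windows parameter are hypothetical stand-ins so the logic can be read
in isolation, and the ctype name is passed in rather than fetched with
setlocale(). Note that it does not solve the locale=C plus encoding=UTF8
case raised above; on Windows it reports "not C" for every UTF8 database.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the server's encoding IDs (assumption). */
typedef enum
{
    SKETCH_ENC_SQL_ASCII,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_WIN1252
} sketch_encoding;

/*
 * Returns true if the single-byte "C locale" fast paths may be used.
 * ctype_name is whatever the platform reported for LC_CTYPE.
 */
static bool
sketch_lc_ctype_is_c(const char *ctype_name,
                     sketch_encoding db_encoding,
                     bool on_windows)
{
    assert(ctype_name != NULL);

    /* Proposed override: never trust a reported "C" for UTF8 on Windows. */
    if (on_windows && db_encoding == SKETCH_ENC_UTF8)
        return false;

    return strcmp(ctype_name, "C") == 0 ||
           strcmp(ctype_name, "POSIX") == 0;
}
```

With this shape, a Windows UTF8 database never takes the single-byte paths,
whether or not the user really asked for the C locale.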

Hmm. Looking more at that, might there be another problem? WriteControlFile()
writes out what setlocale(LC_CTYPE) returns, which will then be "C" even if
the database isn't in C. But I don't really know when that code is called, or
whether I'm just looking at things wrong. Just starting up and shutting down
the database leaves it at Swedish_Sweden.1252, not C. (1252 is still the
wrong encoding specifier, but it'll work anyway since we convert to UTF16.)

Now, I came across this trying to find a way for lc_ctype_is_c() to
determine if the database is in C locale or not, *without* resorting to
setlocale(). Any pointers on how to do that properly? Also, any pointers on
a way to check for the kind of failure that's to be expected from this one
returning the wrong thing?

> > One bright spot is that this does seem to suggest a way to implement the
> > recommendation I made in the -patches thread: if we can't support the
> > encoding (codepage) used by the locale seen by initdb, we could try
> > stripping the codepage indicator (if any) and plastering on .65001
> > to get a UTF8-compatible locale name. That'd only work on Windows
> > but that seems the platform where we're most likely to see unsupportable
> > default encodings.
>
> Um, yes, that should work - assuming encoding is set to UTF8. We can't do
> that for any other encoding, of course.

Looking at that, we don't actually need to put that at the end of the
locale name - all locale names will work with UTF8, even one specifying 1252.

The attached patch seems to work for me for that part. It still doesn't touch
lc_ctype_is_c().

//Magnus

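
To make Tom's quoted suggestion concrete, here is a rough sketch of the
string manipulation it describes: drop any codepage indicator from a Windows
locale name and append .65001. The function name is hypothetical, and it
assumes the codepage indicator is whatever follows the last '.' in the name.
As noted above, this rewrite turned out not to be needed for UTF8, so the
attached patch does not do it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "<locale without codepage>.65001" into out.
 * Returns false if the result does not fit in outlen bytes.
 * Hypothetical helper, not part of initdb.
 */
static bool
sketch_force_utf8_codepage(const char *locale, char *out, size_t outlen)
{
    const char *dot;
    size_t      baselen;
    int         n;

    assert(locale != NULL);

    /* Assumption: the codepage indicator starts at the last '.' */
    dot = strrchr(locale, '.');
    baselen = dot ? (size_t) (dot - locale) : strlen(locale);

    n = snprintf(out, outlen, "%.*s.65001", (int) baselen, locale);

    /* snprintf reports the length it wanted; reject truncated output. */
    return n >= 0 && (size_t) n < outlen;
}
```

For example, "Swedish_Sweden.1252" becomes "Swedish_Sweden.65001", and a name
without a codepage simply gets ".65001" appended.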
Index: backend/commands/dbcommands.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.201
diff -c -r1.201 dbcommands.c
*** backend/commands/dbcommands.c 13 Oct 2007 20:18:41 -0000 1.201
--- backend/commands/dbcommands.c 15 Oct 2007 10:55:20 -0000
***************
*** 258,264 ****
  
      /*
       * Check whether encoding matches server locale settings. We allow
!      * mismatch in two cases:
       *
       * 1. ctype_encoding = SQL_ASCII, which means either that the locale
       * is C/POSIX which works with any encoding, or that we couldn't determine
--- 258,264 ----
  
      /*
       * Check whether encoding matches server locale settings. We allow
!      * mismatch in three cases:
       *
       * 1. ctype_encoding = SQL_ASCII, which means either that the locale
       * is C/POSIX which works with any encoding, or that we couldn't determine
***************
*** 268,279 ****
--- 268,286 ----
       * This is risky but we have historically allowed it --- notably, the
       * regression tests require it.
       *
+      * 3. selected encoding is UTF8 and platform is win32. This is because
+      * UTF8 is a pseudo codepage that is supported in all locales since
+      * it's converted to UTF16 before being used.
+      *
       * Note: if you change this policy, fix initdb to match.
       */
      ctype_encoding = pg_get_encoding_from_locale(NULL);
  
      if (!(ctype_encoding == encoding ||
            ctype_encoding == PG_SQL_ASCII ||
+ #ifdef WIN32
+           encoding == PG_UTF8 ||
+ #endif
            (encoding == PG_SQL_ASCII && superuser())))
          ereport(ERROR,
                  (errmsg("encoding %s does not match server's locale %s",
Index: bin/initdb/initdb.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/bin/initdb/initdb.c,v
retrieving revision 1.145
diff -c -r1.145 initdb.c
*** bin/initdb/initdb.c 13 Oct 2007 20:18:41 -0000 1.145
--- bin/initdb/initdb.c 15 Oct 2007 10:50:27 -0000
***************
*** 2840,2846 ****
      /* We allow selection of SQL_ASCII --- see notes in createdb() */
      if (!(ctype_enc == user_enc ||
            ctype_enc == PG_SQL_ASCII ||
!           user_enc == PG_SQL_ASCII))
      {
          fprintf(stderr, _("%s: encoding mismatch\n"), progname);
          fprintf(stderr,
--- 2840,2856 ----
      /* We allow selection of SQL_ASCII --- see notes in createdb() */
      if (!(ctype_enc == user_enc ||
            ctype_enc == PG_SQL_ASCII ||
!           user_enc == PG_SQL_ASCII
! #ifdef WIN32
!           /*
!            * On win32, if the encoding chosen is UTF8, all locales are OK
!            * (assuming the actual locale name passed the checks above). This
!            * is because UTF8 is a pseudo-codepage, that we convert to UTF16
!            * before doing any operations on, and UTF16 supports all locales.
!            */
!           || user_enc == PG_UTF8
! #endif
!           ))
      {
          fprintf(stderr, _("%s: encoding mismatch\n"), progname);
          fprintf(stderr,
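
The policy the patched createdb() check implements can be read more easily as
a standalone predicate. The sketch below restates it under assumptions: the
enum, the function name, and the is_superuser and on_win32 parameters are
hypothetical stand-ins for PG_SQL_ASCII/PG_UTF8, superuser() and the
#ifdef WIN32 guard, so that all cases can be exercised on any platform.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the server's encoding IDs (assumption). */
typedef enum
{
    SK_SQL_ASCII,
    SK_UTF8,
    SK_LATIN1
} sk_encoding;

/*
 * Returns true if the requested database encoding is acceptable given
 * the encoding implied by the server's LC_CTYPE locale.
 */
static bool
sk_encoding_allowed(sk_encoding ctype_enc,
                    sk_encoding requested,
                    bool is_superuser,
                    bool on_win32)
{
    /* Exact match is always fine. */
    if (ctype_enc == requested)
        return true;

    /* Case 1: locale is C/POSIX, or its encoding could not be determined. */
    if (ctype_enc == SK_SQL_ASCII)
        return true;

    /* Case 3 (new): UTF8 on win32 works with any locale. */
    if (on_win32 && requested == SK_UTF8)
        return true;

    /* Case 2: SQL_ASCII requested, allowed for superusers only. */
    return requested == SK_SQL_ASCII && is_superuser;
}
```

The initdb check follows the same shape, except that it accepts a requested
SQL_ASCII without the superuser condition.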