Kent Tong <[EMAIL PROTECTED]> writes:
> You mean the OS fails to convert unicode strings to Big5 or the
> OS assumes the bytes are already in Big5?

The latter.

> It is the locale used for initdb or the default system locale
> set in Windows that is used by the collation routines that you
> mentioned above?

The former.

The real problem here, IMHO, is that Postgres allows you to select a
"database encoding" setting that is different from the encoding implied
by the initdb locale (ie, the LC_CTYPE setting). If you make this
mistake, PG will carefully store data byte sequences in the specified
"database encoding" ... and then pass them to strcoll() for comparison
... and strcoll() will assume that the data is in the encoding
associated with LC_CTYPE.

This is partially bad design on our part (we should really not have
invented a per-database encoding selection when the locale setting is
not per-database) and partially bad design on the part of the C standard
(which doesn't provide any very sane way to find out what encoding is
implied by an LC_CTYPE setting).

I think the only real fix is to abandon the C library's locale routines
and find or write our own library with a better API. This has been on
the TODO list for a long time but no one's quite wished to face up to
doing it ...

In the meantime, make sure your encoding setting agrees with the
LC_CTYPE value that initdb used.

			regards, tom lane
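
As a rough illustration of the mismatch described above, here is a minimal Python sketch. It is not PostgreSQL code: the helper names same_encoding and misread are hypothetical, and "UTF8" and "BIG5" are example names only. The first helper checks whether two encoding names refer to the same codec. The second shows what a reader sees when bytes stored in one encoding are interpreted as another, which is analogous to strcoll() assuming the encoding associated with LC_CTYPE.

```python
import codecs


def same_encoding(database_encoding, locale_codeset):
    """Return True if both names resolve to the same Python codec.

    Hypothetical helper: it relies on Python's codec aliases, not on
    PostgreSQL's or the C library's own encoding tables.
    """
    try:
        db_codec = codecs.lookup(database_encoding).name
        lc_codec = codecs.lookup(locale_codeset).name
    except LookupError:
        # Unknown name: treat it as a mismatch rather than guess.
        return False
    return db_codec == lc_codec


def misread(text, stored_as, assumed):
    """Encode text as stored_as, then decode the same bytes as assumed.

    This mimics a consumer that assumes the wrong encoding; undecodable
    byte sequences are replaced instead of raising an error.
    """
    return text.encode(stored_as).decode(assumed, errors="replace")


if __name__ == "__main__":
    sample = "\u4e2d\u6587"  # two CJK characters
    print(same_encoding("UTF8", "BIG5"))                 # names disagree
    print(misread(sample, "utf-8", "big5") == sample)    # bytes misread
    print(misread(sample, "big5", "big5") == sample)     # encodings agree
```

When the two names agree, the bytes round-trip unchanged. When they differ, the same bytes decode to different characters, so any comparison built on that interpretation cannot be trusted.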