Hello Gregory, hello all,
I've tested the DB created with hu_HU.UTF8 lc_* settings and it works like a charm! We'll re-create the database and move the contents. It's a nice improvement of 8.3 to detect and disallow such misconfiguration. ------ Summing up the issue & fix (for googling): - Version: postgresql 8.2.x (and possibly older ones as well) - Symptom: UTF-8 values are stored correctly but some ILIKE queries don't return results as expected, even when searching for ASCII substrings. - Reason: database misconfiguration -> UPPER()/LOWER() cannot handle UTF-8 chars -> ILIKE cannot handle UTF-8 chars. The error above occurs if the DB is created with UTF-8 internal encoding, but lc_* settings are not referring UTF-8 locales. (Eg. bad locale reference: "hu_HU", good locale reference: "hu_HU.UTF8".) - Fix: a new database (or DB cluster?) should be created with the corrected lc_* settings. If there's data already, it should be dumped and restored (see steps below). - Additional info: lc_* settings (and a whole lot of others) can be displayed with the psql command "SHOW ALL". - Additional info: in case the DB is configured incorrectly, psql 8.2 just fails to match rows correctly (without any errors), but 8.3+ will reject this bad configuration in time. ------ Thanks a lot, guys! Best regards, Gergely BOR On 10/25/07, Gregory Stark <[EMAIL PROTECTED]> wrote: > > "Gergely Bor" <[EMAIL PROTECTED]> writes: > > > We'll google the initdb stuff and try it ASAP. > > > > What I've tried is LOWER and UPPER, and they seem to return trash for > > Hungarian UTF-8 characters, but they handle ASCII well. (Hmmmm... > > maybe ILIKE requires LOWER and UPPER to work? Would not be > > illogical...) > > It does. I think it works by just downcasing both strings. It's possible to do > better but tricky. I think 8.3 has an optimization for that for single-byte > encodings but it had to be disabled for utf-8 in the end. > > If it's returning trash for those characters then it's not prepared to handle > UTF-8 data. You have to use an encoding compatible with your locale and > vice-versa. > > If you want to store UTF-8 data I suggest you > > . add hu_HU.UTF-8 to /etc/locale.gen, > . rerun /usr/sbin/locale-gen > . pg_dump your database > . re-initdb with the locale set to hu_HU.UTF-8 > . pg_restore your data. > > Unfortunately that'll take quite a while and involve down-time. > > You should probably do this in a second directory aside from your existing > database just in case you've created any invalidly encoded utf-8 strings. > You'll have to fix them before restoring. (Actually I don't recall which > version got strict about that.) > > -- > Gregory Stark > EnterpriseDB http://www.enterprisedb.com > ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match