Hello folks,

my system runs SuSE Linux 10.0 with a plain PostgreSQL 8.1.2 (compiled by myself, NLS enabled). The locale is set to de_DE.UTF-8.
The bug shows up when using the operator '~*' with umlauts. An easy way to produce a faulty result is:

    select 'XXXMÜLLERYyyy' ~* '.*müller.*';

The result should be TRUE, but PostgreSQL returns FALSE (see also the discussion on www.pg-forum.de, subject "Konfiguration", thread "Umlaute bei Regular Expressions"). The problem does not seem to exist in Windows-based installations.

It seems to me that this bug originates in the file src/backend/regex/regc_locale.c. The functions pg_wc_tolower(pg_wchar) and pg_wc_toupper(pg_wchar) rely on the C functions toupper() and tolower() applied to an unsigned char, which are definitely the wrong choice for UTF-8 characters outside the ASCII range.

To check my assumption, I replaced the bodies of pg_wc_tolower and pg_wc_toupper simply with "return towlower(c);" and "return towupper(c);", which led to the correct result for

    select 'XXXMÜLLERYyyy' ~* '.*müller.*';

Since I have no idea what side effects this change might have, please let me know as soon as an "official" patch is available - I definitely do need regular expressions that handle UTF-8 correctly...

Thanks,
Helmar Spangenberg
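
For illustration, the difference between the two approaches can be sketched as a small standalone C fragment. This is not the PostgreSQL source: the helper names fold_bytewise, fold_wide and compare_folding are hypothetical, plain wchar_t stands in for pg_wchar, and the byte-oriented variant is only an assumption about how such a function might be guarded.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-oriented folding (hypothetical stand-in, not the PostgreSQL code):
 * only values that fit into an unsigned char are passed to tolower();
 * every other code point is returned unchanged. */
wchar_t fold_bytewise(wchar_t c)
{
    if ((unsigned long) c <= UCHAR_MAX)
        return (wchar_t) tolower((unsigned char) c);
    return c;
}

/* Wide-character folding, as in the workaround described above:
 * the code point is handed to towlower() directly. */
wchar_t fold_wide(wchar_t c)
{
    return (wchar_t) towlower((wint_t) c);
}

/* Print how both helpers fold one code point under the current locale. */
void compare_folding(wchar_t c)
{
    assert((unsigned long) c <= 0x10FFFFUL); /* valid Unicode range only */
    printf("U+%04lX: tolower -> U+%04lX, towlower -> U+%04lX\n",
           (unsigned long) c,
           (unsigned long) fold_bytewise(c),
           (unsigned long) fold_wide(c));
}
```

To try it, call setlocale(LC_ALL, "") from <locale.h> in a de_DE.UTF-8 environment and then compare_folding(0x00DC) for the letter Ü. If the diagnosis in this report is right, only the towlower() column should show the lowercase ü (U+00FC); the exact behaviour depends on the C library and on the locales installed.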