Regular expressions behave differently on non-ASCII characters depending on the
server encoding (bug.sql contains non-ASCII characters):
% initdb -E KOI8-R --locale ru_RU.KOI8-R
% psql postgres < bug.sql
true
------
t
(1 row)
true | true
------+------
t | t
(1 row)
% initdb -E UTF8 --locale ru_RU.UTF-8
% psql postgres < bug.sql
true
------
f
(1 row)
true | true
------+------
f | t
(1 row)
As far as I can see, this is because the regex code uses isalpha (and the other
is* functions), tolower and toupper instead of the isw* and tow* functions. Is
there any reason to use them? If not, I can modify regc_locale.c in the same
way as the locale part of tsearch2.
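
To illustrate the difference, here is a minimal standalone sketch. It is not the
code from regc_locale.c, and the helper names (all_alpha_bytes, all_alpha_wide,
same_letter_wide) are made up for this example. It assumes the input strings are
encoded in whatever multibyte encoding the current LC_CTYPE expects. The
byte-wise helper classifies each byte with isalpha(), which cannot recognise a
letter that UTF-8 stores as two bytes; the wide helpers first decode with
mbstowcs() and then use iswalpha() and towlower().

```c
#include <ctype.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-wise check: 1 only if s is non-empty and every byte passes isalpha(). */
int all_alpha_bytes(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++)
        if (!isalpha((unsigned char) *s))
            return 0;
    return 1;
}

/*
 * Character-wise check: decode s according to the current LC_CTYPE, then
 * apply iswalpha() to each wide character. Only the first 64 characters
 * are examined. Returns -1 if s is not valid in the current encoding.
 */
int all_alpha_wide(const char *s)
{
    wchar_t buf[64];
    size_t  i;
    size_t  n = mbstowcs(buf, s, 64);

    if (n == (size_t) -1)
        return -1;
    for (i = 0; i < n; i++)
        if (!iswalpha((wint_t) buf[i]))
            return 0;
    return n > 0;
}

/*
 * Case-insensitive comparison of the first character of a and b using
 * towlower(). Returns -1 if either string is empty or cannot be decoded.
 */
int same_letter_wide(const char *a, const char *b)
{
    wchar_t wa, wb;

    if (mbstowcs(&wa, a, 1) != 1 || mbstowcs(&wb, b, 1) != 1)
        return -1;
    return towlower((wint_t) wa) == towlower((wint_t) wb);
}
```

In the default "C" locale all_alpha_bytes("\xD0\xB4") (the UTF-8 bytes of a
Cyrillic lowercase letter) returns 0. After setlocale(LC_CTYPE, "") under a
UTF-8 locale I would expect all_alpha_wide() to accept the same string and
same_letter_wide() to match its upper and lower case forms, but that depends on
the libc and on which locales are installed.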
--
Teodor Sigaev E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
set client_encoding='KOI8';
SELECT 'д' ~* '[[:alpha:]]' as "true";
SELECT
'Дорога' ~* 'дорога' as "true",
'дорога' ~* 'дорога' as "true";