[HACKERS] regular expressions stranges

Teodor Sigaev Tue, 23 Jan 2007 04:54:04 -0800

Regular expressions behave differently with non-ASCII characters depending on the server encoding (bug.sql contains non-ASCII characters):


% initdb -E KOI8-R --locale ru_RU.KOI8-R
% psql postgres < bug.sql
 true
------
 t
(1 row)


 true | true
------+------
 t    | t
(1 row)
% initdb -E UTF8 --locale ru_RU.UTF-8
% psql postgres < bug.sql
 true
------
 f
(1 row)

 true | true
------+------
 f    | t
(1 row)

As far as I can see, this is because the regex code uses isalpha (and the other is* functions), tolower and toupper instead of the isw* and tow* functions. Is there any reason to use them? If not, I can modify regc_locale.c similarly to the locale part of tsearch2.
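
To illustrate the suspected cause, here is a minimal Python sketch. It is only a model, not the code in regc_locale.c. It assumes that a byte-at-a-time classifier such as isalpha/tolower recognizes a byte only if that byte is a complete character in the locale's encoding, which holds for the Cyrillic letter "д" in KOI8-R (one byte, 0xC4) but not in UTF-8 (two bytes, 0xD0 0xB4). The helper names are hypothetical.

```python
# Model of byte-wise (isalpha/tolower-style) versus character-wise
# (iswalpha/towlower-style) handling. Illustration only; this is not
# PostgreSQL code and the helper names are made up for this sketch.

def byte_is_alpha(b: int, encoding: str) -> bool:
    """Classify one byte as if it were a whole character in `encoding`."""
    try:
        ch = bytes([b]).decode(encoding)
    except UnicodeDecodeError:
        # The byte is only a fragment of a multibyte character.
        return False
    return ch.isalpha()

def byte_lower(b: int, encoding: str) -> int:
    """Lower-case one byte, leaving it unchanged if it is not a whole character."""
    try:
        low = bytes([b]).decode(encoding).lower().encode(encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return b
    return low[0] if len(low) == 1 else b

def bytewise_iequal(a: str, b: str, encoding: str) -> bool:
    """Case-insensitive comparison done one byte at a time."""
    def fold(s: str) -> bytes:
        return bytes(byte_lower(x, encoding) for x in s.encode(encoding))
    return fold(a) == fold(b)

def charwise_iequal(a: str, b: str) -> bool:
    """Case-insensitive comparison done on whole characters."""
    return a.lower() == b.lower()

if __name__ == "__main__":
    for enc in ("koi8_r", "utf-8"):
        alpha = any(byte_is_alpha(x, enc) for x in "д".encode(enc))
        same = bytewise_iequal("Дорога", "дорога", enc)
        print(enc, "alpha:", alpha, "case-insensitive match:", same)
    print("character-wise match:", charwise_iequal("Дорога", "дорога"))
```

Under these assumptions the byte-wise path succeeds for KOI8-R and fails for UTF-8, which is the same pattern as the t/f results above, while the character-wise comparison does not depend on the encoding.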




--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/

set client_encoding='KOI8';

SELECT  'д' ~* '[[:alpha:]]' as "true";
SELECT
                'Дорога' ~* 'дорога' as "true",
                'дорога' ~* 'дорога' as "true";
