2007/7/7, Tom Lane <[EMAIL PROTECTED]>:
> "Andriy Rysin" <[EMAIL PROTECTED]> writes:
>> escaping specials in regular expressions \m and \M for beginning of word and
>> end of word work for latin symbols but don't for cyrillic
>
> Sorry, the locale-specific regex features only work on single-byte
> characters at the moment. In any case you'd need to be using a Russian
> locale (maybe you are, but you didn't say). I'd expect this feature to
> work with Cyrillic letters in ru_RU locale + KOI8 encoding, but not
> elsewhere.
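To illustrate why the encoding matters, here is a small Python sketch (not PostgreSQL code) that compares how many bytes one Cyrillic letter takes in KOI8-R versus UTF-8. It assumes, as the reply says, that the locale-specific classification only handles single-byte characters; the function name `encoded_lengths` is hypothetical.

```python
# Illustrative only: a classifier limited to single-byte characters can
# recognize a letter only when that letter fits in one byte.


def encoded_lengths(ch: str) -> dict[str, int]:
    """Return the byte length of `ch` in KOI8-R and in UTF-8."""
    return {
        "koi8_r": len(ch.encode("koi8_r")),  # single-byte Cyrillic encoding
        "utf_8": len(ch.encode("utf-8")),    # variable-length encoding
    }


if __name__ == "__main__":
    for ch in ("m", "ж"):
        print(ch, encoded_lengths(ch))
```

A Latin letter is one byte in both encodings, while a Cyrillic letter is one byte in KOI8-R but two in UTF-8, which is consistent with the feature working in ru_RU + KOI8 but not in a UTF-8 locale.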
Hi Tom,

I was using the en_US.UTF-8 locale, but you're right: even if I create my cluster with uk_UA.UTF-8, \m still does not work for Cyrillic, while it continues to work for Latin characters.

I can't switch to a single-byte encoding, because my project uses some Unicode symbols and everything else is in Unicode, so converting data back and forth would be quite a drag.

So currently my only workaround for \m is (^|[^[:alpha:]]), but [:alpha:], even in uk_UA.UTF-8, matches only Latin letters, so I have to list the letters explicitly, e.g. (^|[^а-яієїґ]). That may be good enough if I don't need to distinguish Russian from Ukrainian; if I do, I have to be even more specific for pure Ukrainian: (^|[^а-щьюяієїґ]) (provided I remember that my regexp is case-sensitive and that I know the UTF-8 codes).

I agree that I missed the fact that \m is locale-specific (it has to know which characters are word and non-word characters in the locale), so it can't work for all locales even with Unicode, and my original test in the en_US locale was not valid. Still, it would be nice to have two things:

1) multibyte support for locale-specific regexp features such as \m and [:alpha:];
2) the ability to tell a regexp which LC_CTYPE to use for a specific invocation, at least at the SQL-statement level. This would be extremely useful for multilingual projects, e.g. dictionaries (which is the type of my project, BTW); hopefully regexps are not too tightly tied to the cluster's LC_CTYPE.

I understand, though, that these two are not simple bug fixes and will require some effort to implement.

Thanks,
Andriy
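The explicit-character-class workaround can be sketched in Python as follows. This uses Python's `re` module, not PostgreSQL's regex engine (Python 3's `\b` is already Unicode-aware, so this only demonstrates the technique). The names `UK_LETTERS`, `word_start_pattern` and `starts_word` are hypothetical, and the letter set is an assumption: lowercase Ukrainian only.

```python
import re

# Assumption: lowercase Ukrainian letters only. Uppercase letters and the
# apostrophe are left out to keep the sketch short.
UK_LETTERS = "а-щьюяєіїґ"


def word_start_pattern(word: str) -> re.Pattern:
    r"""Build a pattern that emulates \m before `word`.

    The group (^|[^...]) matches either the start of the string or one
    character that is not a Ukrainian letter. Unlike a true \m anchor,
    it consumes that preceding character.
    """
    return re.compile(rf"(^|[^{UK_LETTERS}]){re.escape(word)}")


def starts_word(text: str, word: str) -> bool:
    """Return True if `word` occurs in `text` at the start of a word."""
    return word_start_pattern(word).search(text) is not None


if __name__ == "__main__":
    print(starts_word("добрий день", "день"))  # preceded by a space
    print(starts_word("сьогодні", "годні"))    # preceded by the letter "о"
```

Because the group consumes the preceding character, a replacement using this pattern has to put that character back (e.g. via the captured group), and matching stays case-sensitive unless both cases are listed.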