On 18/05/12 03:18, David F. Skoll wrote:
I looked at the regex, and it seems that Perl treats "är" as having a
word boundary (in the \b sense) between the "ä" and the "r".
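The effect Skoll describes can be reproduced in miniature. The sketch below uses Python's `re` module rather than Perl, so it is only an analogy and not SpamAssassin's code path. It assumes the mail body reaches the regex as undecoded UTF-8 bytes. In that case "ä" is two bytes, and ASCII word rules count neither of them as a word character.

```python
import re

# "är" as UTF-8 bytes is b"\xc3\xa4r". Under byte (ASCII) rules neither
# 0xC3 nor 0xA4 is a word character, so \b falls right before "r".
raw = "är".encode("utf-8")
byte_words = re.findall(rb"\b\w+\b", raw)

# The same text decoded to characters: str patterns use Unicode rules by
# default, so "ä" is a word character and "är" stays one word.
text = raw.decode("utf-8")
char_words = re.findall(r"\b\w+\b", text)

# Decoded text matched with ASCII-only rules splits the word again.
ascii_words = re.findall(r"\b\w+\b", text, re.ASCII)

print(byte_words)   # [b'r']
print(char_words)   # ['är']
print(ascii_words)  # ['r']
```

Whether the boundary appears depends on which rules define a "word character" and on whether the engine sees bytes or characters. This is the question raised in the reply below.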
On 18.05.12 07:26, Jason Haar wrote:
A bit OT, but is it because your Perl is running under the "C" locale
instead of "se"? That is, would the definition of a word boundary change
under different locale settings? It doesn't help solve the problem for you,
but it certainly flags a potential issue with a tonne of the rules in SA...
SA would need to switch to the correct locale before processing the
e-mail to avoid this error. The correct locale could differ between
users and even between individual mails.
I'm not sure this is the way to go, although there may be individual cases
where it helps.
I'm more in favor of more advanced processing: taking the language of the
text into account and/or checking matched strings against words in other
languages, e.g. FRT_SOMA misfiring on the word "somar" (donkey), FRT_PENIS1
on "penize" (money), FUZZY_CREDIT on "kredit" (credit), etc.
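A minimal sketch of the second idea might look like the following. It checks each matched word against a per-language list of ordinary words before counting a rule hit. Everything in it is hypothetical. The patterns are simplified stand-ins, not the real FRT_SOMA or FUZZY_CREDIT rules. The `LEXICON` table and the `rule_hits` helper are invented for illustration. The sketch also assumes the language of the mail is already known and the text is already decoded.

```python
import re

# Hypothetical, simplified stand-ins for the rules named above; the real
# SpamAssassin patterns are different and are not reproduced here.
RULES = {
    "FRT_SOMA": re.compile(r"\bsoma\w*", re.IGNORECASE),
    "FUZZY_CREDIT": re.compile(r"\b[ck]redit\w*", re.IGNORECASE),
}

# Hypothetical per-language lexicon of ordinary words that the patterns
# above would otherwise flag.
LEXICON = {
    "sk": {"somar", "kredit"},
}


def rule_hits(text, lang):
    """Return the names of rules that match at least one whole word
    which is not listed as an ordinary word for the given language."""
    known = LEXICON.get(lang, set())
    hits = set()
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            # Each pattern extends to the end of the word, so the match
            # can be compared directly against the lexicon.
            if match.group(0).lower() not in known:
                hits.add(name)
                break
    return hits


print(rule_hits("Ty somar!", "sk"))     # set()
print(rule_hits("Buy Soma now", "sk"))  # {'FRT_SOMA'}
print(rule_hits("Mam kredit", "en"))    # {'FUZZY_CREDIT'}
```

A real implementation would also need reliable language detection and far larger word lists. A plain lookup like this only suppresses exact whole-word matches.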
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Remember half the people you know are below average.