Hi there,
Has anyone constructed SpamAssassin rules that can match Cyrillic characters?
I know this is more of a regex question, but I have had no luck finding out how to match Cyrillic characters in SpamAssassin rules.
Can anyone offer a little advice or point me to the appropriate method? These Russian spams are the only group I've been unable to stop.
This is impossible in an unpatched SpamAssassin. First, there are at least three encodings in use for Russian: cp1251, koi8-r and utf-8. Second, SA treats messages as raw bytes, so Unicode properties like \p{Cyrillic} will not work.
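To illustrate both points, here is a rough Python sketch (not SpamAssassin code). The sample word and the helper names `matches_raw` and `matches_decoded` are made up for the example. Python's `re` module has no \p{Cyrillic}, so the sketch uses the basic Cyrillic block U+0400..U+04FF as a stand-in.

```python
import re

word = "Привет"  # sample Russian text, chosen only for illustration

# The same word as raw bytes in the three encodings mentioned above.
encoded = {name: word.encode(name) for name in ("cp1251", "koi8-r", "utf-8")}

# A byte-level pattern written against one encoding (cp1251 here).
cp1251_pattern = re.compile(re.escape(word.encode("cp1251")))

# A character-level pattern; stand-in for \p{Cyrillic}, which Python's re lacks.
cyrillic = re.compile(r"[\u0400-\u04FF]+")


def matches_raw(data: bytes) -> bool:
    """Match on raw bytes, without knowing the charset."""
    return cp1251_pattern.search(data) is not None


def matches_decoded(data: bytes, charset: str) -> bool:
    """Decode to Unicode first, then match on characters."""
    return cyrillic.search(data.decode(charset)) is not None


for name, data in encoded.items():
    print(name, data.hex(), matches_raw(data), matches_decoded(data, name))
```

The byte pattern only hits the cp1251 copy, while the character pattern hits all three once the bytes are decoded with the right charset. That decoding step is what a stock SA does not do for you.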
I have a patch for SA that converts messages to Unicode and allows using Unicode regexps in rules:
http://bugzilla.spamassassin.org/show_bug.cgi?id=3244
Warning: this patch is untested and even I don't use it. It's just a proof of concept.
For Russian spam you can also use the ok_languages and ok_locales options.
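A sketch of what that might look like in local.cf, assuming English is the only language you expect to receive (adjust the values to your own mail). Note that in recent versions ok_languages is provided by the TextCat plugin, which has to be loaded for the option to have any effect.

```
# local.cf (example values, assuming an English-only mailbox)

# Character sets outside these locales count toward the spam score
ok_locales en

# Languages you expect to receive; others count toward the spam score
ok_languages en
```

These options only add points when a message looks foreign, so legitimate Russian mail would be penalized too.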