On Sun, 20 Sep 2015, AK wrote:
Hi all.
I'm getting hit with lots of JUNK mail that has several lines containing just
a '.' [0]. So far, most of these messages have between 5 and 10 such lines
somewhere in the middle of the message.
I've copied the message source to RegexBuddy [1] and have been able to come
up with a regex that matches what I want using the Perl 5.20 engine:
(^\.\n){5,}
However, adding this rule to /etc/spamassassin/local.cf doesn't hit at all
when I run it against my test message as follows:
===== Start Rule Block =====
rawbody __MANY_PERIODS_1 ALL =~ /(^\.\n){5,}/
meta MANY_PERIODS __MANY_PERIODS_1
score MANY_PERIODS 2.0
describe MANY_PERIODS JUNK mail with several lines that contain single dot
===== End Rule Block =====
===== Begin Test Command =====
spamassassin -L -t test.msg
===== End Test Command =====
Please help me understand what I'm doing wrong, as this is my first attempt at
creating a rule. Previously I've just copied and pasted what I've found here
in the forums, but this time I'm trying to write one myself and failing.
Regards,
ak.
SA does some interesting pre-processing on mail messages before applying
rules, so you need to understand what it does to the text your rule sees.
Try this:
rawbody T__LOCAL_MANY_PERIODS /\n(?:\.\n){5}?/
describe T__LOCAL_MANY_PERIODS Many lines with just a single "dot"
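A quick way to check this outside SA is to run both patterns through Python's re module, which treats '^', '\n', '(?:...)' and '{n}' the same way Perl does. This is only a sketch: it assumes the rule sees the decoded body as one multi-line string (a simplification of SA's real pre-processing) and uses a made-up sample body.

```python
import re

# Hypothetical sample body: a text line, six lines holding only '.', more text.
BODY = "Hello there\n" + ".\n" * 6 + "Buy now\n"

# The pattern from the question: without multiline mode (Perl /m, Python re.M),
# '^' anchors only at the very start of the string.
ORIGINAL = r"(^\.\n){5,}"

# The suggested rawbody pattern: '\n' marks the line start instead of '^'.
SUGGESTED = r"\n(?:\.\n){5}?"


def hits(pattern: str, text: str, flags: int = 0) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text, flags) is not None


print(hits(ORIGINAL, BODY))        # False: the body does not start with '.'
print(hits(ORIGINAL, BODY, re.M))  # True: '^' now also matches after each '\n'
print(hits(SUGGESTED, BODY))       # True, with no flags needed
```

The first result shows one reason a pattern can pass in RegexBuddy (presumably run with '^' matching at line breaks) yet fail when the text arrives as a single multi-line string.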
Notes:
1) Because SA pre-processing hands the rule the body as one long string
rather than line by line, '^' cannot match repeatedly (without /m it anchors
only at the start of the string). Look for '\n' as the line-break indicator
instead: find the start of a line, then the following repeats of ".\n".
2) Use '(?:' (a non-capturing group) as a grouping optimization unless you
need the capture.
3) For the terminal match clause use '{5}', not '{5,}': we're done as soon
as we've seen 5 matches and don't care if there are more.
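4) Use the "non-greedy" match quantifier '}?' to stop at the first hit on
that pattern rather than trying to go for more. (Strictly, with a fixed
count like '{5}' the '?' changes nothing; it only matters for ranges such
as '{5,}'.)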
Un-optimised pattern: /\n(\.\n){5}/
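To compare the quantifier choices in notes 3 and 4, this hedged sketch runs four variants of the pattern against a hypothetical body with eight single-dot lines and counts how many dot lines each match consumes. Python's re stands in for Perl here; the sample body and pattern labels are made up for illustration.

```python
import re

# Hypothetical body: eight single-dot lines between two ordinary lines.
BODY = "intro\n" + ".\n" * 8 + "outro\n"

PATTERNS = {
    "greedy {5,}": r"\n(?:\.\n){5,}",   # keeps consuming every '.\n' it can
    "exact {5}": r"\n(?:\.\n){5}",      # stops after exactly five
    "lazy {5}?": r"\n(?:\.\n){5}?",     # fixed count, so the '?' has no effect
    "lazy {5,}?": r"\n(?:\.\n){5,}?",   # stops at the minimum of five
}

for name, pattern in PATTERNS.items():
    m = re.search(pattern, BODY)
    # Each consumed single-dot line contributes exactly one '.' to the match.
    print(f"{name:12} dots matched: {m.group().count('.')}")
```

Running it reports 8 dot lines for the greedy '{5,}' and 5 for the other three. For a rule that only needs to know the threshold was reached, the shorter match is all that matters.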
Note the use of the "testing" rule-name prefix "T_". Remove the leading 'T'
to make it a silent ("__") rule for combining in metas.
Personal convention: I insert '_LOCAL_' (or '_L_') into the names of locally
created rules to distinguish them when debugging. Then, when things don't
work as expected (e.g. FPs), it helps to determine whether the problem is
self-inflicted.
Final note: now that we've discussed this spam sign, it will probably
become useless, as spammers follow this list and mutate their crap
accordingly to dodge our rules. ;(
--
Dave Funk University of Iowa
<dbfunk (at) engineering.uiowa.edu> College of Engineering
319/335-5751 FAX: 319/384-0549 1256 Seamans Center
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{