Hi,
I am trying to diagnose why certain rules do not fire as expected at the
beginning of lines. Here is a minimal working example (MWE) e-mail:
"""
From: f...@addr.com
To: t...@addr.com
Subject: email's subject
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
To: Aa
To: Bb
To: Cc
To: Dd
To: Ee
"""
Here are the rules to test:
# tests for SA list
body T_BODY_TO_NOMULTI /(^|\n|\r)To: \S/i # Should hit "To: " at start of line
tflags T_BODY_TO_NOMULTI multiple maxhits=3
body T_BODY_TO_MULTI /(^|\n|\r)To: \S/im # I am unsure how multiline interacts here
tflags T_BODY_TO_MULTI multiple maxhits=3
rawbody T_RAWBODY_TO_NOMULTI /(^|\n|\r)To: \S/i # I am unsure how rawbody changes things
tflags T_RAWBODY_TO_NOMULTI multiple maxhits=3
rawbody T_RAWBODY_TO_MULTI /^To: \S/im
tflags T_RAWBODY_TO_MULTI multiple maxhits=3
body T_BODY_TO_NOCARET /To: \S/i
tflags T_BODY_TO_NOCARET multiple maxhits=3 # should (and does) hit
The purpose of these rules is to detect "To:" at the start of a line in
the body, as a way to check whether the e-mail might be a reply. When run
on the e-mail above, I expected each rule to produce the following hits:
To: A
To: B
To: C
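To make that expectation concrete, here is a minimal sketch of the regex semantics I am assuming, written with Python's re module rather than run through SpamAssassin. The BODY string and the first_hits helper are hypothetical, and the sketch assumes the rule sees the text with its line breaks intact, which may not be what SpamAssassin actually does.

```python
import re

# Hypothetical stand-in for the message body, assuming line breaks
# are preserved as "\n" in the text the rule is matched against.
BODY = "To: Aa\nTo: Bb\nTo: Cc\nTo: Dd\nTo: Ee\n"


def first_hits(pattern, text, flags=0, maxhits=3):
    """Collect up to maxhits matches, loosely mimicking 'tflags multiple maxhits=3'."""
    hits = []
    for match in re.finditer(pattern, text, flags):
        hits.append(match.group(0))
        if len(hits) == maxhits:
            break
    return hits


# Explicit alternation: the matched text includes the leading line break, if any.
alt = first_hits(r"(^|\n|\r)To: \S", BODY, re.I)

# ^ with the multiline flag is zero-width, so the match is just "To: X".
multi = first_hits(r"^To: \S", BODY, re.I | re.M)

# ^ without the multiline flag only matches at the very start of the string.
nomulti = first_hits(r"^To: \S", BODY, re.I)

print(alt)      # ['To: A', '\nTo: B', '\nTo: C']
print(multi)    # ['To: A', 'To: B', 'To: C']
print(nomulti)  # ['To: A']
```

If this reading is right, only the multiline ^ form yields exactly the three hits listed above; the alternation form also consumes the preceding line break as part of each match.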
Here are the actual results for all rules:
spamassassin -D 2>&1 < t2.eml | grep BODY_TO
jan 29 10:56:14.310 [11274] dbg: rules: ran body rule T_BODY_TO_NOMULTI
======> got hit: "To: A"
jan 29 10:56:14.310 [11274] dbg: rules: ran body rule T_BODY_TO_NOMULTI
======> got hit: "To: D" #should be B
jan 29 10:56:14.310 [11274] dbg: rules: ran body rule T_BODY_TO_NOMULTI
======> got hit: "To: E"
jan 29 10:56:14.335 [11274] dbg: rules: ran body rule T_BODY_TO_MULTI
======> got hit: "To: A"
jan 29 10:56:14.335 [11274] dbg: rules: ran body rule T_BODY_TO_MULTI
======> got hit: "To: D" #should be B
jan 29 10:56:14.335 [11274] dbg: rules: ran body rule T_BODY_TO_MULTI
======> got hit: "To: E"
jan 29 10:56:14.345 [11274] dbg: rules: ran body rule T_BODY_TO_NOCARET
======> got hit: "To: A"
jan 29 10:56:14.345 [11274] dbg: rules: ran body rule T_BODY_TO_NOCARET
======> got hit: "To: B" #as expected, but does not test beginning of line
jan 29 10:56:14.345 [11274] dbg: rules: ran body rule T_BODY_TO_NOCARET
======> got hit: "To: C"
jan 29 10:56:14.545 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_MULTI ======> got hit: "To: A"
jan 29 10:56:14.546 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_MULTI ======> got hit: "To: B" #as expected
jan 29 10:56:14.546 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_MULTI ======> got hit: "To: C"
jan 29 10:56:14.550 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_NOMULTI ======> got hit: " #The other (closing) quote is
just gone! where did it go? What is matched?
jan 29 10:56:14.550 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_NOMULTI ======> got hit: "
jan 29 10:56:14.550 [11274] dbg: rules: ran rawbody rule
T_RAWBODY_TO_NOMULTI ======> got hit: "
[...]
My main question is why neither T_BODY_TO_NOMULTI nor T_BODY_TO_MULTI
hits as expected. There appears to be some interaction with the previous
line that I do not understand. Am I interpreting (^|\n|\r) incorrectly?
Is there any reason to search for \n or \r instead of ^? Is there a way
to match at a newline with "body" instead of "rawbody"?
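The only case I can think of where the alternation would differ from ^ is text that uses bare carriage returns as line separators. Here is a small sketch with Python's re module (not SpamAssassin itself; CR_TEXT is a made-up string, and I am assuming Perl's /m flag behaves the same way for this pattern):

```python
import re

# Hypothetical text using bare carriage returns ("\r") as line separators.
CR_TEXT = "To: Aa\rTo: Bb\rTo: Cc\r"

# With the multiline flag, ^ matches at the start of the string and after "\n" only,
# so a bare "\r" does not start a new "line" for the anchor.
caret_hits = [m.group(0) for m in re.finditer(r"^To: \S", CR_TEXT, re.I | re.M)]

# The explicit alternation also accepts "\r" as a line start.
alt_hits = [m.group(0) for m in re.finditer(r"(^|\n|\r)To: \S", CR_TEXT, re.I)]

print(caret_hits)  # ['To: A']
print(alt_hits)    # ['To: A', '\rTo: B', '\rTo: C']
```

So on "\n"-terminated text I would expect the two forms to find the same line starts, which is why I am unsure whether the alternation buys anything here.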
Using:
SpamAssassin version 3.4.1
running on Perl version 5.14.2
on Ubuntu 12.04
Thanks in advance
-Olivier