Hi all,

One of my users has lately been getting dozens of spams per day that hit 
BAYES_999 but trigger no other scoring rules.  All of these spams carry a 
"(may be forged)" warning in the Received header, so it seems like it might be 
worth adding a low-scoring "may be forged" rule... what do people think?  
(Apparently HTML_FONT_LOW_CONTRAST is only a placeholder when network tests 
are enabled?  Not sure why; it seems like it should be useful at all times.)

Two spamples:

http://pastebin.com/1AhK1DiU
http://pastebin.com/LVRy5Bu6

My user is apparently getting the first runs, before these servers have gotten 
onto the DNSBLs.  Subsequent duplicate spams were properly caught by SA after 
the DNSBLs caught up, but the first waves get through.

Normally, I would hope that my URI templates would catch these.  The first 
spample is now caught by one of my AC_SPAMMY_URI_PATTERNS subrules 
(specifically __AC_STOPRANDDOM_URI, though I'm not sure whether it ever made 
it into distribution), since I've added the new TLDs to that rule.  The 
second spample, however, bypasses the rule because of its extra subdirectory.  
I could start adding subdirectory checks to that rule, but it's already 
ungainly from matching the domains literally, and I'm worried about FPs if 
it matches "any" subdirectory.  (Yes, I could just blacklist those TLDs, but 
I'd prefer to avoid the nuclear option unless it's the only option.)

(1) The only pattern I've noticed that's common to all of these spams is that 
the Received line includes a "(may be forged)" warning.  SA doesn't seem to 
have a rule for this.  Would it be worthwhile adding something like 0.1 or 0.2 
points for such warnings?  I know ham can also include this warning, but such 
a low score normally wouldn't cause FPs on its own, while combined with 
BAYES_99 + BAYES_999 it should be enough to push these spams over the 
threshold.  Does anyone see any potential downside to this?  Would people be 
interested in making such a rule standard?
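
To make the check concrete, here is a rough Python sketch of what such a rule 
would look for.  It is only an illustration outside SA: the function name and 
the regex are my own assumptions, not an existing SA rule, and it collapses 
folded header whitespace first in case the warning is split across lines.

```python
import re
from email import message_from_string

# Hypothetical pattern: the literal "(may be forged)" note that some MTAs
# append to the Received header when reverse DNS does not check out.
MAY_BE_FORGED = re.compile(r"\(may be forged\)", re.IGNORECASE)


def forged_received_headers(raw_message):
    """Return the Received headers that carry a "(may be forged)" warning."""
    msg = message_from_string(raw_message)
    hits = []
    for header in msg.get_all("Received", []):
        # Folded headers may break the phrase across lines, so collapse
        # all runs of whitespace to single spaces before matching.
        flat = " ".join(header.split())
        if MAY_BE_FORGED.search(flat):
            hits.append(flat)
    return hits
```

Running this over a mixed ham/spam corpus and counting hits per class would 
give a first idea of how often ham carries the warning before any score is 
assigned.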

(2) Does anyone have any better rules that might hit on these spammy 
templates?  So far, besides the URI scheme, I haven't found any good patterns 
except the forge warning in the Received header.  The random "Bayes-poison" 
text at the bottom often contains a "repeated numerical identifier" (yes, I 
know it doesn't really work well as Bayes poison; I just can't think of 
another name for it), but SA doesn't have a good way of checking for that 
kind of pattern.  Some of them use runs of # or = or - or whatever to create 
line breaks, but that's also not uniform and could be fodder for FPs.  Has 
anyone identified any other decent template identifiers against which to 
write rules?
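
For the "repeated numerical identifier" idea, this is roughly the kind of 
check I have in mind, sketched in Python rather than as an SA rule.  The 
minimum digit length (6) and repeat count (3) are guesses for illustration, 
not values measured from the spamples, and the function name is made up.

```python
import re
from collections import Counter

# Assumption: an "identifier" is a run of at least 6 digits.
DIGIT_RUN = re.compile(r"\d{6,}")


def repeated_numeric_ids(body, min_repeats=3):
    """Return {identifier: count} for digit runs seen min_repeats+ times."""
    counts = Counter(DIGIT_RUN.findall(body))
    return {tok: n for tok, n in counts.items() if n >= min_repeats}
```

Order numbers and tracking IDs repeated in legitimate mail would also match, 
so this would need the same FP testing as any other candidate rule.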

(3) Should HTML_FONT_LOW_CONTRAST become a higher-scoring rule when network 
tests are enabled?  It only scores 0.001 now; maybe that should be upped to 
0.1?  I'm wondering why it's currently a placeholder with network tests 
enabled and carries a real score only without them.

Thanks.

--- Amir
