Hi all,

One of my users has lately been getting dozens of spams per day that hit BAYES_999 but trigger no other scoring rules. All of these spams carry a forge warning in the Received header, so it might be worth adding a low-scoring "may be forged" rule. What do people think? (Apparently HTML_FONT_LOW_CONTRAST is only a placeholder when network tests are enabled? I'm not sure why; it seems like it should be useful at all times.)
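On the HTML_FONT_LOW_CONTRAST point: a SpamAssassin `score` line can list four values, one per score set (Bayes off/on crossed with network tests off/on), which is how a rule can score normally in one configuration and only 0.001 in another. A single value applies to all four sets. A local override along these lines is a minimal sketch; the 0.1 is simply the figure floated in question (3) below, not a tested number.

```
# Local override, e.g. in local.cf (read after the stock rules, so it
# replaces the distributed score). A single value is used in all four
# score sets, with or without Bayes and network tests.
score HTML_FONT_LOW_CONTRAST 0.1

# Equivalent four-value form, in score-set order:
#   set 0: no Bayes, no net    set 1: no Bayes, net
#   set 2: Bayes, no net       set 3: Bayes, net
# score HTML_FONT_LOW_CONTRAST 0.1 0.1 0.1 0.1
```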
Two spamples:
http://pastebin.com/1AhK1DiU
http://pastebin.com/LVRy5Bu6

My user apparently gets the first runs, before these servers are listed on the DNSBLs. Later copies of the same spams were correctly caught by SA once the DNSBLs caught up, but the first waves get through.

Normally, I would hope my URI templates would catch these. Now that I've added the new TLDs, the first one would be caught by one of my AC_SPAMMY_URI_PATTERNS subrules (specifically __AC_STOPRANDDOM_URI, though I'm not sure whether that ever made it into the distribution). The second spample, however, bypasses that rule because of the extra subdirectory. I could start adding subdirectory checks to the rule, but it's already unwieldy because it matches the domains literally, and I'm worried about FPs if it accepts "any" subdirectory. (Yes, I could just blacklist those TLDs, but I'd rather avoid the nuclear option unless it's the only option.)

(1) The only pattern I've noticed that's common to all of these spams is a "(may be forged)" warning in the Received line. SA doesn't seem to have a rule for this. Would it be worthwhile adding something like 0.1 or 0.2 points for such warnings? I know ham can carry this warning too, but such a low score on its own shouldn't cause FPs, while combined with BAYES_99 + BAYES_999 it should be enough to push these spams over the edge. Does anyone see a potential downside? Would people be interested in making such a rule standard?

(2) Does anyone have better rules that might hit these spammy templates? So far, apart from the URI scheme and the forge warning in the Received header, I haven't found any good patterns.
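For point (1), a local rule along these lines might be a starting point. It's a minimal sketch: the rule name AC_RCVD_MAY_BE_FORGED is hypothetical, the score is the low end of the 0.1 to 0.2 range proposed above, and it assumes the MTA writes the warning literally as "(may be forged)" (as Sendmail does when the client's reverse DNS doesn't confirm). When a header rule names Received, SA tests all Received headers concatenated, so a warning on any hop will match.

```
# Hypothetical local rule: any Received header carrying the
# "(may be forged)" warning. Parentheses are escaped in the regex.
header   AC_RCVD_MAY_BE_FORGED  Received =~ /\(may be forged\)/
describe AC_RCVD_MAY_BE_FORGED  Received header says "may be forged"
score    AC_RCVD_MAY_BE_FORGED  0.1
```

Running `spamassassin --lint` after adding it catches syntax errors. A more conservative design would make the header test a `__` subrule and score it only inside a `meta` together with BAYES_99, so the warning alone never adds points to ham.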
The random "Bayes-poison" text at the bottom contains a lot of "repeated numerical identifiers" (yes, I know it doesn't work well as Bayes poison; I just can't think of a better name for it), but SA doesn't have a good way of checking for that kind of pattern. Some of them use long runs of #, =, - or similar characters as separators, but that's not uniform either and could be fodder for FPs. Has anyone identified other decent template identifiers to write rules against?

(3) Should HTML_FONT_LOW_CONTRAST score higher when network tests are enabled? It scores only 0.001 there now; maybe that should be raised to 0.1? I'm wondering why it's only a placeholder in that case and gets a real score only without network tests.

Thanks.

---
Amir