At 10:41 AM -0700 08/09/2013, John Hardin wrote:
> Can you provide a spample or two?

It looks like a similar spam template has appeared in recent weeks (since Jul 30, it seems) that uses slightly different footers. An example is here:

http://pastebin.com/QCmSPzwG

Although running SA on this spam _NOW_ yields a score well above the spam threshold, that is almost entirely because additional network tests (extra RBLs + Razor) are now hitting. That was not the case when the spam was first processed, so it looks like I was one of the earlier recipients.

For this type, a good match looks to be the combination of "/land/" + "/unsub/" + "/report/" links. I have modified my rule from yesterday as follows:

# Spammy URI patterns
# Earlier template: "/outl" and "/outi" links
uri __OUTL_URI      /\/outl\b/
uri __OUTI_URI      /\/outi\b/
# Newer template: "/land/", "/unsub/" and "/report/" links
uri __LAND_URI      /\/land\//
uri __UNSUB_URI     /\/unsub\//
uri __REPORT_URI    /\/report\//
# Hit only when every link of one template is present
meta SPAMMY_URI_PATTERNS        ((__OUTL_URI && __OUTI_URI) || (__LAND_URI && __UNSUB_URI && __REPORT_URI))
describe SPAMMY_URI_PATTERNS    link combos match highly spammy template
score SPAMMY_URI_PATTERNS       3

This modification hits both types of templates. I will very likely be adding further "spammy patterns" to this rule over time. I'll keep the list posted if I find some other good ones.
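
For anyone who wants to sanity-check the combo logic outside SA before deploying it, here is a rough Python sketch that mirrors the meta rule over a list of URIs. It is only an approximation of how SA evaluates uri rules, and the example URIs are made up for illustration (they are not taken from the pastebin sample).

```python
import re

# Python versions of the uri sub-rule patterns from the rule above.
SUBRULES = {
    "__OUTL_URI": re.compile(r"/outl\b"),
    "__OUTI_URI": re.compile(r"/outi\b"),
    "__LAND_URI": re.compile(r"/land/"),
    "__UNSUB_URI": re.compile(r"/unsub/"),
    "__REPORT_URI": re.compile(r"/report/"),
}


def spammy_uri_patterns(uris):
    """Return True when the URI list satisfies the SPAMMY_URI_PATTERNS combo."""
    # A sub-rule counts as hit if any URI in the message matches its pattern.
    hit = {name: any(rx.search(u) for u in uris) for name, rx in SUBRULES.items()}
    return (hit["__OUTL_URI"] and hit["__OUTI_URI"]) or (
        hit["__LAND_URI"] and hit["__UNSUB_URI"] and hit["__REPORT_URI"]
    )


# Hypothetical URIs, for illustration only.
new_template = [
    "http://example.com/land/abc",
    "http://example.com/unsub/abc",
    "http://example.com/report/abc",
]
partial = ["http://example.com/land/abc", "http://example.com/unsub/abc"]

print(spammy_uri_patterns(new_template))
print(spammy_uri_patterns(partial))
```

Requiring the full combination is the point: a lone "/unsub/" link is common in legitimate bulk mail, so a message with only part of the set does not hit.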


It looks like both this and the previous type of spam are bypassing Bayes by embedding images and using almost no rendered text: mostly just a "successful delivery" message and the unsub/report links. That is, Bayes sees no "spammy" tokens, just an image whose content it cannot analyze.

Are there any rules that can hit on "only embedded images with very little text"? I'm not entirely sure how to capture this, since it's difficult to define "not much" text, and there is certainly potential for FPs (for example, anyone in the design field sending images to clients without much text).
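
To make the "images with very little text" idea concrete, here is a minimal Python sketch of one possible heuristic: count the img tags and the visible text characters in the HTML part, and flag the message when there is at least one image and the text falls under a cutoff. This is not an SA rule, and the 200-character cutoff is an arbitrary, untuned assumption; it would have exactly the FP problem described above unless combined with other signals.

```python
from html.parser import HTMLParser


class TextImageCounter(HTMLParser):
    """Count <img> tags and visible (non-whitespace) text characters."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0
        self._skip = 0  # depth inside <script>/<style>, whose text is not rendered

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # HTML comments never reach handle_data, so comment padding is not counted.
        if not self._skip:
            self.text_chars += len("".join(data.split()))


def image_heavy(html, max_text_chars=200):
    """True if the HTML has at least one image and little visible text.

    max_text_chars is a hypothetical threshold, not a tested value.
    """
    parser = TextImageCounter()
    parser.feed(html)
    parser.close()
    return parser.images > 0 and parser.text_chars <= max_text_chars


sample = '<html><body><img src="cid:a"><p>Delivered.</p></body></html>'
print(image_heavy(sample))
```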

But, these types of spams are bypassing SA consistently, to the tune of tens per day per user. I would really love a way to stop them besides hardcoding a rule based on their link syntax, which can be easily changed during the next iteration of their spam template.

(The HTML comment gibberish rule would be a big step here, since that's one of the few things that would distinguish this from ham... unlikely that a real person would embed tens of KB of comment gibberish.)
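
As a rough illustration of what such a check might measure, the Python sketch below totals the characters inside HTML comments and flags a message once they exceed a cutoff. It only measures size and makes no attempt to judge whether the comment text is gibberish. The 10 KB cutoff is an assumption loosely based on the "tens of KB" seen in these spams; it would need testing against ham, since Word/Outlook-generated HTML can also carry sizeable conditional comments.

```python
from html.parser import HTMLParser


class CommentSizer(HTMLParser):
    """Total the number of characters found inside HTML comments."""

    def __init__(self):
        super().__init__()
        self.comment_chars = 0

    def handle_comment(self, data):
        self.comment_chars += len(data)


def comment_heavy(html, min_comment_chars=10 * 1024):
    """True if the HTML comments together reach min_comment_chars characters.

    min_comment_chars is a hypothetical threshold, not a tested value.
    """
    parser = CommentSizer()
    parser.feed(html)
    parser.close()
    return parser.comment_chars >= min_comment_chars


padded = '<img src="cid:a"><!-- ' + "qzx vbn " * 3000 + "-->"
print(comment_heavy(padded))
print(comment_heavy("<p>hello<!-- note --></p>"))
```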

Thanks.

                                                --- Amir
