At 10:41 AM -0700 08/09/2013, John Hardin wrote:
Can you provide a spample or two?
Looks like a similar spam method has come out in recent weeks (since
Jul 30, it seems) that uses slightly different footers... example is
here:
http://pastebin.com/QCmSPzwG
Although running SA on this spam _NOW_ yields a high score beyond the
spam threshold, this is almost entirely because additional network
tests are now hitting (extra RBLs + Razor). This was not the case
when the spam was first processed... looks like I was one of the
earlier recipients.
For this type, looks like a good match would be on the combo of
"/land/" + "/unsub/" + "/report/" ... I have modified my rule from
yesterday as follows:
# Spammy URI patterns
uri __OUTL_URI /\/outl\b/
uri __OUTI_URI /\/outi\b/
uri __LAND_URI /\/land\//
uri __UNSUB_URI /\/unsub\//
uri __REPORT_URI /\/report\//
meta SPAMMY_URI_PATTERNS ((__OUTL_URI && __OUTI_URI) || (__LAND_URI && __UNSUB_URI && __REPORT_URI))
describe SPAMMY_URI_PATTERNS link combos match highly spammy template
score SPAMMY_URI_PATTERNS 3
This modification hits both types of templates. I will very likely
be adding further "spammy patterns" to this rule over time. I'll
keep the list posted if I find some other good ones.
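For anyone who wants to sanity-check the combo logic outside SA, here is
a rough Python sketch. It only mirrors the five patterns and the meta
expression above. Python's re module stands in for SA's Perl regex
engine, and the sample URIs are invented for illustration (example.com
is a placeholder, not taken from the actual spam).

```python
import re

# Same patterns as the uri sub-rules above (Python's re, not SA's Perl engine).
PATTERNS = {
    "__OUTL_URI": re.compile(r"/outl\b"),
    "__OUTI_URI": re.compile(r"/outi\b"),
    "__LAND_URI": re.compile(r"/land/"),
    "__UNSUB_URI": re.compile(r"/unsub/"),
    "__REPORT_URI": re.compile(r"/report/"),
}


def spammy_uri_patterns(uris):
    """Return True if the URIs of one message satisfy the meta expression."""
    # A sub-rule "hits" if any URI in the message matches its pattern.
    hit = {name: any(rx.search(u) for u in uris) for name, rx in PATTERNS.items()}
    return (hit["__OUTL_URI"] and hit["__OUTI_URI"]) or (
        hit["__LAND_URI"] and hit["__UNSUB_URI"] and hit["__REPORT_URI"]
    )


# Hypothetical URIs, invented for illustration only.
new_template = [
    "http://example.com/land/abc123",
    "http://example.com/unsub/abc123",
    "http://example.com/report/abc123",
]
old_template = ["http://example.com/outl", "http://example.com/outi?x=1"]
ham = ["http://example.com/unsub/newsletter"]
```

Calling spammy_uri_patterns() on each list shows why the rule wants the
full combination: a lone "/unsub/" link, as in the ham list, is not
enough to trigger it.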
It looks like both this and the previous type of spam are bypassing
Bayes by embedding images and using almost no rendered text: mostly a
"successful delivery" message and the unsub/report links. So Bayes
sees no "spammy" text at all, only the image, whose contents it cannot
analyze.
Are there any rules that can hit on "only embedded images with very
little text"? I'm not sure how to capture this, since it's difficult
to define what counts as "not much" text, and there is certainly the
potential for FPs that way (for example, anyone in the design field
sending images to clients without much text, etc.)...
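To make the "images with very little text" idea concrete, here is a
small Python sketch of the heuristic. This is not an SA rule, just an
illustration: the class and function names are hypothetical, and the
200-character cutoff is an arbitrary assumption that would need tuning
against real ham, for exactly the FP reason mentioned above.

```python
from html.parser import HTMLParser


class ImageTextCounter(HTMLParser):
    """Count <img> tags and visible (non-whitespace) text characters."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.text_chars = 0
        self._skip = 0  # depth inside <script>/<style>, whose data is not rendered

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so comment padding is ignored.
        if not self._skip:
            self.text_chars += len("".join(data.split()))


def image_heavy(html, max_text_chars=200):
    """True if the HTML has at least one image and very little visible text.

    max_text_chars is an assumed, untuned threshold.
    """
    counter = ImageTextCounter()
    counter.feed(html)
    counter.close()
    return counter.images > 0 and counter.text_chars <= max_text_chars
```

On its own this would also flag a designer mailing a mockup with a
one-line note, so it only makes sense in combination with other signals.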
But, these types of spams are bypassing SA consistently, to the tune
of tens per day per user. I would really love a way to stop them
besides hardcoding a rule based on their link syntax, which can be
easily changed during the next iteration of their spam template.
(The HTML comment gibberish rule would be a big step here, since
that's one of the few things that would distinguish this from ham...
unlikely that a real person would embed tens of KB of comment
gibberish.)
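As an illustration of what such a check would measure, here is a
minimal Python sketch that adds up the size of all HTML comments in a
message body. It is not an existing SA rule, the function names are
hypothetical, and the 10 KB cutoff is an assumption based on the "tens
of KB" seen in these samples.

```python
import re

# Non-greedy match of each HTML comment, including ones spanning several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def comment_bytes(html):
    """Total size, in bytes, of all HTML comments in the raw HTML source."""
    return sum(len(m.group(0).encode("utf-8")) for m in COMMENT_RE.finditer(html))


def has_comment_padding(html, min_bytes=10 * 1024):
    """True if comments make up at least min_bytes (an assumed threshold)."""
    return comment_bytes(html) >= min_bytes
```

Ordinary HTML mail usually carries a few short comments at most, so a
large total here is the kind of signal that separates this template
from ham.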
Thanks.
--- Amir