This snowshoe stuff has been a PITA for a while.

For most of my users (particularly the Geeks), it's not even on their
radar.

For others (including my most complex domain), 80% of their FNs are
from snowshoers.

As well as the usual battery of anti-spam tests,
I'm using a layered/meta approach of tests:
    1. "teaser" header word checks (see below)
    2. sender IP checking against large hosts that have been known
       to host snowshoers (hand-maintained)
    3. unsubscribe phrase(s) in the body
    4. Barracuda

If you look at several snowshoe samples, you'll note that the "From"
and/or "Subject" pretty much ALWAYS contain some sort of "teaser"
word(s).

Those are the two headers that are (always?) displayed to the potential
victim, so the spammer has a strong incentive to continue using those
to try to lure in the victim.  They're a VERY good target for new
rules.

I've broken these "teasers" down into three general groups (and score
accordingly):
    A. specific product names (e.g. "pedi paws") which are
       high-quality/low-risk spam signs
    B. generic product names (e.g. "green tea") which are
       medium-quality/medium-risk spam signs
    C. general terms (e.g. many variations on "insurance") which are
       medium-quality/higher-risk spam signs
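
As a rough illustration of the three-tier idea, here is a minimal sketch in
Python rather than in any particular filter's rule syntax.  The example
words are the ones mentioned above; the tier names, the scores, and the
function name teaser_hits are all made up for the example and are not the
values actually in use.

```python
import re

# Hypothetical tier table.  The sample words come from the post;
# the scores are invented purely for illustration.
TEASER_TIERS = {
    "specific": (3.0, [r"pedi\s*paws"]),               # group A
    "generic":  (1.5, [r"green\s+tea"]),               # group B
    "general":  (0.5, [r"insur(?:ance|ed|er|ers)"]),   # group C
}

# One case-insensitive, word-bounded regex per tier.
_COMPILED = {
    tier: (score, re.compile(r"\b(?:" + "|".join(pats) + r")\b", re.IGNORECASE))
    for tier, (score, pats) in TEASER_TIERS.items()
}


def teaser_hits(from_header, subject):
    """Return {tier: score} for every tier that matches From or Subject."""
    text = f"{from_header}\n{subject}"
    return {
        tier: score
        for tier, (score, rx) in _COMPILED.items()
        if rx.search(text)
    }
```

Only the "From" and "Subject" headers are checked, since those are the ones
the recipient sees.  A hit in the "general" tier is deliberately scored low
here, on the assumption that it only matters in combination with other
signs.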

I've never had an FP on the first group, and they're really easy to spot
and add to my rules.  I've even begun pre-emptively listing anything
I notice while watching TV.  The Weather Channel is particularly useful
for that. :)

The last group is the tricky one, and pretty much has to be used in
metas with the other rule groups listed above.

I regularly update my list of "active" snowshoe IP ranges, which catches
most of these.  That's my single most time-intensive non-coding task in
all of my anti-spam work.  I've gotten to the point where, if I notice
more than a few /24s in any one webhost's IP space, I re-classify _ALL_
of their blocks with a generally non-scoring code, then use that as a
meta at run-time.  The main problem is that I need more data to expand
these.

Anything sent from any of those IP blocks then gets a HUGE
bonus if there is a weak "teaser" and/or an unsubscribe term in
the body.
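
The meta logic described above could be sketched roughly like this, again
in Python rather than real rule syntax.  The CIDR blocks are
documentation-range placeholders, not actual snowshoe ranges, and the
names (SNOWSHOE_NETS, META_BONUS, snowshoe_score) and score values are
assumptions for the sake of the example.

```python
import ipaddress

# Hypothetical hand-maintained list.  These are reserved documentation
# ranges, NOT real snowshoe hosts.
SNOWSHOE_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]

IN_RANGE_SCORE = 0.0   # being in a listed block is non-scoring on its own
META_BONUS = 4.0       # invented value for the combined "HUGE bonus"


def in_snowshoe_range(sender_ip):
    """True if the sender IP falls inside any listed block."""
    addr = ipaddress.ip_address(sender_ip)
    return any(addr in net for net in SNOWSHOE_NETS)


def snowshoe_score(sender_ip, weak_teaser, unsub_phrase):
    """Score the IP range only in combination with another weak sign."""
    if not in_snowshoe_range(sender_ip):
        return 0.0
    score = IN_RANGE_SCORE
    if weak_teaser or unsub_phrase:
        score += META_BONUS
    return score
```

The point of keeping the range membership non-scoring is that a whole
webhost's space can be listed without penalizing its legitimate senders;
the bonus only applies when a second sign shows up in the same message.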

I'm planning to add another meta bonus rule for anything that's on
Barracuda.

I've found that HostKarma's blocklist is about as efficacious as
Barracuda; however, I've experienced some timeouts and some hinky
whitelist results, so I'm only using it in my FP pipeline, where it has
been extremely useful (Mark, if you're reading this, I'd be very happy
to send you more details and any specific data that would be helpful to
you - feel free to contact me off-list).

Some snowshoers have started putting the unsub link in a GIF, so I'll be
adding some rules for that, soonish.



*** Rob McEwen: ***
Would you be willing to provide your /24 list, for even a short period,
in some sort of plain text format (maybe one CIDR per line?), so those
of us with good hand-classified corpora could try out your data?

Most of my users are in a shared hosting environment, so they can't use
your list suite as-is.  Based on what reliable people have posted, some
of my users should probably benefit from your /24 list.  I'd be very
glad to provide you with a list of any FPs I find. :)

Contact me off-list, if you'd prefer.


