As you all know, I'm in the spam blocking business, and I'm looking to
share my data with others so they can help block spam for everyone. I'm
currently feeding my spam to several people.
So - I'm looking to expand this now that I feel like I'm no longer losing
the spam battle (thanks to FuzzyOCR and other new tricks).
So - let me describe my setup. I actually do most of my spam filtering
with Exim rules. Using Exim I can identify a huge amount of both spam
and ham without having to use SpamAssassin (SA), which is expensive in
resources. However, SA is still very important to my setup, as it
catches whatever I can't catch using Exim rules.
I do front-end filtering for about 3000 domains. Mail comes in, I clean
it, and I forward it on to the destination server. In the process I
reject millions of spams a day. I also capture some of that spam and
feed it to others who provide blacklist services to everyone else. This
seems to be working well and I want to expand it.
I have several feeds, depending on what kind of spam you are looking
for. One feed is mostly from virus-infected zombies and is suitable for
blacklisting the sending server. Another feed is spam, identified using
SA, that often comes from servers like Gmail, Yahoo, Comcast and
Hotmail. This feed isn't suitable for IP-based blacklists but is good
for mining URI blacklists and message fingerprinting.
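For anyone wondering what "mining URI blacklists" from the second feed
might look like on the receiving end, here is a minimal sketch in
Python. It is not how I or any particular list operator does it; the
regex and the function names (extract_hosts, count_hosts) are made up
for illustration, and a real miner would also need whitelisting and
HTML/MIME decoding.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Loose pattern for http/https URLs in plain-text bodies (illustrative only).
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the lowercase hostnames of all http(s) URLs found in body."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL (e.g. a broken IPv6 literal); skip it.
            continue
        if host:
            hosts.append(host.lower())
    return hosts


def count_hosts(bodies):
    """Count in how many message bodies each hostname appears."""
    counts = Counter()
    for body in bodies:
        # set() so one message repeating a URL only counts once.
        counts.update(set(extract_hosts(body)))
    return counts
```

Hostnames that show up across many high-scoring messages would be the
candidates to review for a URI list.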
One thing I'm doing is just bouncing the easy stuff. If the server is
already listed at Spamhaus I don't see any reason to forward it. Much of
this spam is from servers not already listed on the other high quality
lists. So this is "new" spam, perhaps from recently infected or
exploited hosts that are not easily trapped. The volume of spam is about
200,000 messages per day.
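To make the "already listed" check concrete: I do this in Exim, but the
underlying DNS lookup can be sketched in Python as below. A DNSBL is
queried by reversing the IPv4 octets and appending the list's zone. The
function names here are made up for illustration, and treating "no
answer" as "not listed" is a simplifying assumption.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL query name for an IPv4 address: reversed octets + zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_lookup(ip, zone="zen.spamhaus.org"):
    """Return the A record the list answers with, or None if there is none."""
    try:
        return socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No such name (or the lookup failed): nothing to report.
        return None


def is_new_to_list(ip, zone="zen.spamhaus.org"):
    """True when the list gives no answer for this host (assumed: not listed)."""
    return dnsbl_lookup(ip, zone) is None
```

For example, dnsbl_query_name("192.0.2.1") gives
"1.2.0.192.zen.spamhaus.org". Check the list operator's documentation
for what the returned addresses mean, since some answers may signal a
query problem rather than a listing.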
I also enhance the headers storing the sending host's IP address in a
separate header for blacklist mining. There are also headers giving
detailed information as to why the message was classified as spam.
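As a rough idea of how a feed consumer could use the extra header, here
is a sketch in Python. The header name X-Sending-Host-IP is a
placeholder I'm using for illustration only, not the actual name in the
feed, and the min_hits threshold is likewise just an assumption about
how someone might want to filter.

```python
import ipaddress
from collections import Counter
from email import message_from_string

# Placeholder header name; substitute whatever the feed really uses.
IP_HEADER = "X-Sending-Host-IP"


def sending_ip(raw_message, header=IP_HEADER):
    """Return the validated IP from the header, or None if absent or invalid."""
    msg = message_from_string(raw_message)
    value = msg.get(header)
    if value is None:
        return None
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        return None


def _sort_key(ip):
    # Sort IPv4 before IPv6, then numerically within each family.
    addr = ipaddress.ip_address(ip)
    return (addr.version, int(addr))


def ips_seen_at_least(raw_messages, min_hits=1):
    """Return sorted IPs that appear in at least min_hits messages."""
    counts = Counter(ip for ip in map(sending_ip, raw_messages) if ip)
    return sorted((ip for ip, n in counts.items() if n >= min_hits), key=_sort_key)
```

Raising min_hits is one cheap way to cut down on false positives before
listing a host.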
So - here's the deal. If you are running a service where you provide a
world-accessible blacklist to the general public, then I want to give
you this feed for free. Many of you are better at processing this than I
am. If you are running a commercial spam filtering service for your
customers only, then I want to sell you the feed for a reasonable cost.
No feed is 100% perfect, but the IP-based zombie feed is very close. The
other spam feed is also very good but will have more false positives
(FPs) than the first. I don't send all my spam, just the stuff that has
a very high score. You are welcome to do your own checking to verify the
feed.
I am also able to extract specific parts like just lists of IP addresses
that should be blocked. And I'm open to suggestions about how to better
provide data.
Feedback welcome.
- Who wants my spam - seriously! Marc Perkel