Hi, all,

I wonder if the iXhash Plugin I did last summer would catch these.
FYI, the plugin computes fuzzy MD5 checksums of the complete mail body (not of separate MIME parts) and compares the results with those I provide via DNS.
It's available at http://wiki.apache.org/spamassassin/iXhash.
If not, enhancing it to also compute checksums of attachments would be nice to have. If only I had the time...

Dirk


William Stearns wrote:
Good evening, Jack, all,

On Tue, 7 Mar 2006, Jack Gostl wrote:

I've seen some references to this in threads, but I didn't see an answer.

Starting in late November, we started getting hit with spam that was almost entirely a jpeg. They seem to be mostly "stock recommendations". There is minimal message text, usually HTML, and the real spam content is in the image. Despite all the training that I do, this seems to slip through the Bayes algorithms with a probability of no more than 50%, and the rest of the tests don't drive the score up high enough to help.

I am currently running SpamAssassin 3.0.3. I tried running these messages through SpamAssassin 3.1 and it doesn't seem to help.

Any suggestions?

We talked about identifying images last summer. There are a few answers, some of which have been discussed in this thread already. Razor, pyzor, and DCC are designed to score up messages with already-seen MIME parts (read: if 3 other people think that image is spam, your spam filter can score it up). As with text parts, where the spammer inserts random words to throw those services off, images can be subtly modified so the visible area is essentially identical but the actual image file is different with every spam run.

I offered to put together a catalog of checksums of images used in spam, and have done so. The md5 and sha1 sums of 44,522 spam images can be found at http://www.stearns.org/spamattach/ , broken out by category and in combined files. If anyone wants to take on an interesting project of computing the md5 checksums of attachments, I'd be willing to set those lists up as a DNS-queryable RBL (along the lines of 01f5ff6ab05499c94a967409204e6a29.md5.some_rbl.net which would return 127.0.0.2 if known, nothing if not). I already understand the downsides to this approach (it duplicates the work of razor, pyzor, and DCC, and images can be altered), but figure the checksum work has already been done and will continue to be done anyway.
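For anyone considering it, here is a rough sketch in Python of what the client side of such a lookup might look like. It is only an illustration under assumptions: the zone name is the "some_rbl.net" placeholder from above (no such list exists yet), the function names are made up, and hashing the decoded bytes of image/* parts is a guess at how the published checksums would be matched, not a statement of how they were computed.

```python
import hashlib
import socket
from email import message_from_bytes
from email.message import EmailMessage
from typing import List

# Placeholder zone taken from the example hostname above; not a real service.
RBL_ZONE = "md5.some_rbl.net"


def attachment_md5s(raw_message: bytes) -> List[str]:
    """Return the MD5 hex digest of every decoded image part in a message."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        # Assumption: only image/* parts are of interest.
        if part.get_content_maintype() != "image":
            continue
        payload = part.get_payload(decode=True)  # undoes base64/quoted-printable
        if payload:
            digests.append(hashlib.md5(payload).hexdigest())
    return digests


def query_name(md5_hex: str, zone: str = RBL_ZONE) -> str:
    """Build the DNS name to look up, e.g. <md5>.md5.some_rbl.net."""
    return f"{md5_hex}.{zone}"


def is_listed(md5_hex: str, zone: str = RBL_ZONE) -> bool:
    """True if the name resolves to 127.0.0.2; no answer means 'unknown'."""
    try:
        return socket.gethostbyname(query_name(md5_hex, zone)) == "127.0.0.2"
    except OSError:
        return False


def sample_message(image_bytes: bytes) -> bytes:
    """Build a small test message with one image attachment (demo only)."""
    msg = EmailMessage()
    msg["Subject"] = "stock recommendation"
    msg.set_content("see attached")
    msg.add_attachment(image_bytes, maintype="image", subtype="jpeg",
                       filename="a.jpg")
    return msg.as_bytes()
```

A filter would call attachment_md5s() on the raw message and then is_listed() on each digest. As noted, an exact checksum only matches images that are byte-for-byte identical to ones already catalogued.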
    Anyone up for it?
    Cheers,
    - Bill

---------------------------------------------------------------------------
        "That man is a success who lived well, laughed often and loved
much: who has gained the respect of intelligent men and the love of
children: who has filled his niche and accomplished his task: who leaves
the world a better place than he found it, whether by an improved poppy,
a perfect poem or a rescued soul; who never lacked appreciation of
earth's beauty or failed to express it; who looked for the best in
others and gave the best he had."
-- Robert Louis Stevenson.
--------------------------------------------------------------------------
William Stearns ([EMAIL PROTECTED]).  Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at: http://www.stearns.org
--------------------------------------------------------------------------
