Hi, all,
I wonder if the iXhash Plugin I did last summer would catch these.
FYI, the plugin computes some form(s) of fuzzy MD5 checksum over the
complete mail body (not separate MIME parts) and compares the results
with those I provide via DNS.
It's available at http://wiki.apache.org/spamassassin/iXhash.
If not, enhancing it to also compute checksums of attachments would be
nice to have. If only I had the time...
Dirk
William Stearns wrote:
Good evening, Jack, all,
On Tue, 7 Mar 2006, Jack Gostl wrote:
I've seen some references to this in threads, but I didn't see an
answer.
Since late November, we have been getting hit with spam that is
almost entirely a JPEG. They seem to be mostly "stock
recommendations". There is minimal message text, usually HTML, and the
real spam content is in the image. Despite all the training that I
do, this seems to slip through the Bayes algorithms with no more than
50%, and the rest of the tests don't drive the score up high enough
to help.
I am currently running SpamAssassin 3.0.3. I tried running these
messages through SpamAssassin 3.1 and it doesn't seem to help.
Any suggestions?
We talked about identifying images last summer. There are a few
answers, some of which have been discussed in this thread already.
Razor, Pyzor, and DCC are designed to score up messages with
already-seen MIME parts (read: if 3 other people think that image is
spam, your spam filter can score it up). Just as spammers insert random
words into text parts to throw those services off, images can be
subtly modified so the visible area is essentially identical but the
actual image file is different with every spam run.
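To illustrate why an exact checksum stops matching after such a modification, here is a small Python sketch. The byte string is a stand-in rather than a real JPEG, and md5_hex is a hypothetical helper name used only for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-in for image bytes (assumption: not a real JPEG file).
original = bytes(range(256)) * 4

# Flip a single bit in the last byte, as a spammer's tool might do
# by tweaking one pixel or a metadata field.
altered = original[:-1] + bytes([original[-1] ^ 0x01])

print(md5_hex(original))
print(md5_hex(altered))
print(md5_hex(original) == md5_hex(altered))  # False
```

A one-bit change produces an unrelated digest, so a catalog of exact checksums only catches images that are re-sent byte for byte.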
I offered to put together a catalog of checksums of images used in
spam, and have done so. The MD5 and SHA-1 sums of 44,522 spam images
can be found at http://www.stearns.org/spamattach/ , broken out by
category and in combined files. If anyone wants to take on an
interesting project of computing the MD5 checksums of attachments, I'd
be willing to set those lists up as a DNS-queryable RBL (along the
lines of 01f5ff6ab05499c94a967409204e6a29.md5.some_rbl.net, which
would return 127.0.0.2 if known, nothing if not).
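A minimal sketch of what the client side of such a lookup could look like, written in Python rather than as a SpamAssassin plugin. The zone md5.some_rbl.net is the placeholder name from the paragraph above and is not a live service. The function names (attachment_md5s, query_name, is_listed) and the choice to hash the decoded payload of every non-text MIME part are assumptions made for illustration.

```python
import hashlib
import socket
from email import message_from_bytes

# Placeholder zone taken from the message above; not a real service.
RBL_ZONE = "md5.some_rbl.net"


def attachment_md5s(raw_message: bytes) -> list:
    """Return MD5 hex digests of the decoded non-text MIME parts."""
    msg = message_from_bytes(raw_message)
    digests = []
    for part in msg.walk():
        # Skip multipart containers and text bodies; hash everything else.
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload:
            digests.append(hashlib.md5(payload).hexdigest())
    return digests


def query_name(digest: str, zone: str = RBL_ZONE) -> str:
    """Build the DNS name to query, e.g. <md5>.md5.some_rbl.net."""
    return f"{digest}.{zone}"


def is_listed(digest: str, zone: str = RBL_ZONE,
              resolve=socket.gethostbyname) -> bool:
    """True if the name resolves to 127.0.0.2; False on any other answer or lookup failure."""
    try:
        return resolve(query_name(digest, zone)) == "127.0.0.2"
    except OSError:  # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError
        return False
```

The resolver is passed in as a parameter so the logic can be tried without touching the network, for example is_listed(d, resolve=lambda name: "127.0.0.2").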
I already understand the downsides to this approach (it duplicates
the work of Razor, Pyzor, and DCC, and images can be altered), but
figure the checksum work has already been done and will continue to
be done anyway.
Anyone up for it?
Cheers,
- Bill
---------------------------------------------------------------------------
"That man is a success who lived well, laughed often and loved
much: who has gained the respect of intelligent men and the love of
children: who has filled his niche and accomplished his task: who leaves
the world a better place than he found it, whether by an improved poppy,
a perfect poem or a rescued soul; who never lacked appreciation of
earth's beauty or failed to express it; who looked for the best in
others and gave the best he had."
-- Robert Louis Stevenson.
--------------------------------------------------------------------------
William Stearns ([EMAIL PROTECTED]). Mason, Buildkernel, freedups, p0f,
rsync-backup, ssh-keyinstall, dns-check, more at:
http://www.stearns.org
--------------------------------------------------------------------------