Re: Spam by IP-address? Spamassassin with geoiplookup?

Dave Funk Thu, 22 Sep 2016 06:40:32 -0700

On Thu, 22 Sep 2016, Thomas Barth wrote:

And what about filter poisoning? In the last 10 hours my company address got 43 mails classified as spam (even a virus mail was detected today). And there was one mail classified as spam due to my rules (bad country, message-id):
X-Spam-Status: Yes, score=7.474 tag=2 tag2=6.31 kill=6.31
       tests=[MESSAGEID_LOCAL=3, RDNS_NONE=1.274, RELAYCOUNTRY_BAD=3.2]
       autolearn=no autolearn_force=no
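The score in the X-Spam-Status header above is simply the sum of the individual test scores. The quick check below adds them up. The thresholds come from the same header, but reading tag/tag2/kill as amavisd-new style levels is an assumption. The check shows that the message crossed the spam level only because of the two local rules (message-id and bad country):

```python
# Per-test scores copied from the X-Spam-Status header above.
hits = {
    "MESSAGEID_LOCAL": 3.0,    # local rule (message-id)
    "RDNS_NONE": 1.274,        # stock SpamAssassin rule
    "RELAYCOUNTRY_BAD": 3.2,   # local rule (bad country)
}

# Thresholds from the same header; reading them as amavisd-new style
# tag / tag2 (spam) / kill levels is an assumption.
TAG, TAG2, KILL = 2.0, 6.31, 6.31

# Total score, rounded to the three decimals shown in the header.
score = round(sum(hits.values()), 3)
# What would remain if the two local rules had not fired.
without_local = round(score - hits["MESSAGEID_LOCAL"] - hits["RELAYCOUNTRY_BAD"], 3)

print(f"score={score}")                    # score=7.474
print("headers added (>= tag):", score >= TAG)
print("spam (>= tag2):", score >= TAG2)
print("without local rules:", without_local, without_local >= TAG2)
```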

The content of the mail is:

------------------------------------------------
From: "Lupe Monroe" <monroe.4...@static.vnpt.vn>
To: "my boss address"
Subject: Payment approved
MIME-Version: 1.0
Content-Type: multipart/related;
       boundary="boundary_af9c8db46e1111b73fca8b315aafef01"
Message-Id: <20160922063255.e11d3e5...@static.vnpt.vn.local>
Date: Thu, 22 Sep 2016 06:32:55 +0700

--boundary_af9c8db46e1111b73fca8b315aafef01
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Dear so,

Your payment has been approved. Your account will be debited within two days.

You can email us for any query regarding your account.

Thank you.

Lupe Monroe
Support

--boundary_af9c8db46e1111b73fca8b315aafef01
Content-Type: application/x-zip-compressed;name="e6dfa16bdb.zip.virus-scan-me.virus-scan-me"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;filename="e6dfa16bdb.zip.virus-scan-me.virus-scan-me"
------------------------------------------------
There is no spam content, am I right? These are normal words and content that a normal person could use. I don't need spam learning for all the mails already classified as spam with a high score. Spam with a low score, like this one, is what is interesting for spam learning. But when I use these mails for spam learning, is there a risk of false positives some day, because it has learned that normal mails are also spam?

You are missing the point that Bayes uses more than just body words from a message. It also looks at headers and meta-data. So those particular body words could become "neutral" (neither spam nor ham indicators) but the other components of that message (such as that '.vn.local' message ID) would be learned as spam signs.
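To make "more than just body words" concrete, here is a toy tokenizer that emits header-derived tokens next to the body words. It is a hypothetical sketch, not SpamAssassin's actual Bayes tokenizer: the token prefixes (`msgid-domain:`, `from-domain:`, `subject:`) are made up for illustration, and the sample message reuses the (elided) headers from the mail quoted above.

```python
import re
from email import message_from_string

# Words of 3+ letters; a deliberately crude body tokenizer.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]{2,}")
# Domain part of an address inside <...>.
DOMAIN_RE = re.compile(r"@([^>]+)>")


def tokenize(raw: str) -> set[str]:
    """Hypothetical tokenizer: body words plus a few header/meta-data tokens."""
    msg = message_from_string(raw)
    tokens: set[str] = set()

    # Header tokens: domains of Message-Id and From, plus subject words.
    for header, prefix in (("Message-Id", "msgid-domain:"), ("From", "from-domain:")):
        m = DOMAIN_RE.search(msg.get(header, ""))
        if m:
            tokens.add(prefix + m.group(1).lower())
    for w in WORD_RE.findall(msg.get("Subject", "")):
        tokens.add("subject:" + w.lower())

    # Body tokens from text/plain parts only.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            text = payload.decode(part.get_content_charset() or "utf-8", "replace")
            tokens.update(w.lower() for w in WORD_RE.findall(text))
    return tokens


SAMPLE = """\
From: "Lupe Monroe" <monroe.4...@static.vnpt.vn>
Subject: Payment approved
Message-Id: <20160922063255.e11d3e5...@static.vnpt.vn.local>
Content-Type: text/plain; charset="utf-8"

Your payment has been approved. Your account will be debited within two days.
"""

toks = tokenize(SAMPLE)
print(sorted(t for t in toks if ":" in t))  # header-derived tokens only
```

Even if every body word ends up neutral, a token like `msgid-domain:static.vnpt.vn.local` is still only ever seen in spam.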

This is why you MUST also train your Bayes with HAM messages (and train them with the --ham flag) so Bayes knows how to recognise 'hammy' or 'neutral' tokens to prevent false-positives.
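The effect of ham training on a shared word can be shown with a toy token database. This is a minimal sketch assuming a Graham/Robinson-style per-token probability with smoothing strength `s=1` and neutral prior `x=0.5`. The class and parameter values are hypothetical and are not SpamAssassin's internals; in practice you would feed real mail to `sa-learn --spam` and `sa-learn --ham`.

```python
from collections import Counter


class TokenDB:
    """Toy Bayes store: counts how many spam/ham messages contain each token."""

    def __init__(self) -> None:
        self.spam: Counter[str] = Counter()
        self.ham: Counter[str] = Counter()
        self.nspam = 0
        self.nham = 0

    def learn(self, tokens: set[str], is_spam: bool) -> None:
        if is_spam:
            self.spam.update(tokens)
            self.nspam += 1
        else:
            self.ham.update(tokens)
            self.nham += 1

    def prob(self, tok: str, s: float = 1.0, x: float = 0.5) -> float:
        """Robinson-style smoothed spam probability for a single token."""
        spam_freq = self.spam[tok] / max(self.nspam, 1)
        ham_freq = self.ham[tok] / max(self.nham, 1)
        if spam_freq + ham_freq == 0:
            return x  # never seen: neutral
        p = spam_freq / (spam_freq + ham_freq)
        n = self.spam[tok] + self.ham[tok]
        # Pull p toward x when the token has been seen only a few times.
        return (s * x + n * p) / (s + n)


db = TokenDB()
db.learn({"payment", "approved", "account", "msgid-domain:static.vnpt.vn.local"}, is_spam=True)
before = db.prob("payment")  # only seen in spam so far

# Hypothetical legitimate mail that uses the same words.
db.learn({"payment", "approved", "account", "msgid-domain:mail.example.com"}, is_spam=False)
after = db.prob("payment")  # now seen equally in spam and ham
msgid = db.prob("msgid-domain:static.vnpt.vn.local")  # still spam-only

print(before, after, msgid)  # 0.75 0.5 0.75
```

Without the ham message, "payment" would keep leaning toward spam. After one ham with the same words it is neutral, while the '.vn.local' message-id token remains a spam sign.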



--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
