Yves Goergen wrote:
> The Bayes filter has never worked for me, but I can't train it either.

Per-user Bayes is VERY good - stock SA3.3.2 with a
mostly-autolearn-based Bayes lets through maybe three or four spams a
week on my personal server/account (out of several hundred per day).
Systemwide Bayes can be nearly as good.  What do you mean by "I can't
train it either"?

> This is a multi-user server and I can't put every single message I get
> manually into some script to teach it. It's not practical.

"Practical" is a matter of scripting regular operations and getting into
a new routine.

If you can move messages from folder A to folder B on an IMAP server,
you can feed those into Bayes as spam or nonspam.

You can either run sa-learn on the server directly against the folder,
or use an IMAP-based script like
http://www.deepnet.cx/~kdeugau/spamtools/imap-learn.  I have had this
script running from cron to feed the Bayes DB based on two folders in a
junk mail reporting account for a number of years;  it works quite well.
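
For anyone wanting to roll their own, here is a minimal Python sketch of the IMAP approach.  It is not the imap-learn script linked above, just an illustration of the idea: pull every message from a folder and pipe each one to sa-learn on stdin.  The folder names "spam" and "notspam" are hypothetical, and the connection object is assumed to be an already logged-in imaplib.IMAP4_SSL instance.

```python
import subprocess

# Hypothetical folder names mapped to sa-learn's --spam / --ham options.
FOLDER_FLAGS = {"spam": "--spam", "notspam": "--ham"}


def sa_learn_command(folder):
    """Return the sa-learn argv for one folder; the message is read from stdin."""
    return ["sa-learn", FOLDER_FLAGS[folder]]


def fetch_folder(conn, folder):
    """Yield the raw bytes of every message in an IMAP folder.

    conn is assumed to be a logged-in imaplib.IMAP4_SSL (or IMAP4) object.
    The folder is opened read-only so nothing is flagged or deleted.
    """
    conn.select(folder, readonly=True)
    _, data = conn.search(None, "ALL")
    for num in data[0].split():
        _, parts = conn.fetch(num, "(RFC822)")
        yield parts[0][1]


def learn_messages(folder, raw_messages, run=subprocess.run):
    """Pipe each raw message to sa-learn; return how many were sent."""
    cmd = sa_learn_command(folder)
    count = 0
    for raw in raw_messages:
        run(cmd, input=raw, check=True)
        count += 1
    return count
```

From cron this would be called once per folder, e.g. learn_messages("spam", fetch_folder(conn, "spam")).  Starting one sa-learn process per message is slow for large folders;  running sa-learn directly against a maildir on the server avoids that.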

Customers can use the "Report as spam" function in our Horde/IMP
instance, or the addon "Report as junk" function in our Roundcube
instance;  on delivery, the attached message is extracted from each
report and filed in a "to-sort" folder.  Anyone can also just forward a message
as an attachment from any regular mail client.  I sort those reported
messages, and the script learns them.
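
As a rough illustration of the extraction step (a sketch using Python's standard email module, not the delivery-time code we actually run), pulling the original message out of a forwarded-as-attachment report looks something like this.  The function name is made up for the example.

```python
from email import message_from_bytes


def extract_attached_messages(raw_report):
    """Return the raw bytes of each message/rfc822 attachment in a report.

    walk() also descends into the attached message itself, so a forward
    nested inside a forward is returned as a separate entry as well.
    """
    report = message_from_bytes(raw_report)
    attached = []
    for part in report.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # For message/rfc822 parts the payload is a one-element list
            # holding the embedded message.
            if isinstance(payload, list) and payload:
                attached.append(payload[0].as_bytes())
    return attached
```

A report forwarded inline rather than as an attachment has no message/rfc822 part, so this returns an empty list and the report needs a human look.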

I don't autodelete the freshly learned mail;  I've never seen the point
since Bayes tracks which messages it's learned from.

I'll admit it's taken a while to reach the point where the process runs
fairly smoothly.

There's some additional followup I also do to extract relay IP and URI
information from the spam to feed to a local DNSBL;  the scripts I use
are substantially as on https://secure.deepnet.cx/trac/dnsbl.  (There
are probably a couple of minor enhancements I've added to production I
haven't committed to SVN yet.)
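
To give a flavour of what that extraction involves, here is a deliberately simplified Python sketch.  It is not the code from the repository above, and the regexes and function names are assumptions for illustration only: it collects bracketed IPv4 addresses from Received: headers and hostnames from http/https URIs found in the raw message.

```python
import re
from email import message_from_bytes
from urllib.parse import urlsplit

# Relays usually log the peer address as [a.b.c.d] in Received: headers.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")
# Naive URI matcher applied to the raw bytes of the whole message.
URI = re.compile(rb"""https?://[^\s<>"']+""")


def relay_ips(raw):
    """Return bracketed IPv4 addresses from Received: headers, top header first."""
    msg = message_from_bytes(raw)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IPV4.findall(str(header)))
    return ips


def uri_hosts(raw):
    """Return the set of hostnames in http(s) URIs found in the raw message."""
    hosts = set()
    for match in URI.findall(raw):
        try:
            host = urlsplit(match.decode("ascii", "replace")).hostname
        except ValueError:
            continue  # malformed URI, skip it
        if host:
            hosts.add(host)
    return hosts
```

Received: headers below your own trusted relays can be forged, so only the address recorded by your own MX is safe to list.  The URI scan also misses anything hidden inside base64 or quoted-printable body parts, which would need decoding first.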

Aside from some of the details of the workflow, I've used much of the
same process across several generations of mail systems, starting with a
small system that peaked at about 450 users, using a Berkeley DB shared
Bayes.  A good initial training message set and early feedback are
important in getting it going.

> I have the impression that the often-recommended sanesecurity data which
> is included in clamav-unofficial-sigs doesn't help at all. I can't see
> any difference between before and after its installation.

We've been catching between a third and half or so of the
".js-downloader-in-a-.zip" virusmail in SpamAssassin, mainly based on
Bayes hits.

-kgd
