Yves Goergen wrote:
> The Bayes filter has never worked for me, but I can't train it either.

Per-user Bayes is VERY good - stock SA 3.3.2 with a mostly-autolearn-based Bayes lets through maybe three or four spams a week on my personal server/account (out of several hundred per day). Systemwide Bayes can be nearly as good.

What do you mean by "I can't train it either"?

> This is a multi-user server and I can't put every single message I get
> manually into some script to teach it. It's not practical.

"Practical" is a matter of scripting regular operations and getting into a new routine. If you can move messages from folder A to folder B on an IMAP server, you can feed those into Bayes as spam or nonspam. You can either run sa-learn on the server directly against the folder, or use an IMAP-based script like http://www.deepnet.cx/~kdeugau/spamtools/imap-learn.

I have had this script running from cron to feed the Bayes DB based on two folders in a junk mail reporting account for a number of years; it works quite well. Customers can use the "Report as spam" function in our Horde/IMP instance, or the addon "Report as junk" function in our Roundcube instance; on delivery, the attached message is stripped out of each report and filed in a "to-sort" folder. Anyone can also just forward a message as an attachment from any regular mail client. I sort those reported messages, and the script learns them. I don't autodelete the freshly learned mail; I've never seen the point, since Bayes tracks which messages it has learned from.

I'll admit it's taken a while to reach the point where the process runs fairly smoothly.

There's some additional followup I also do to extract relay IP and URI information from the spam to feed to a local DNSBL; the scripts I use are substantially as on https://secure.deepnet.cx/trac/dnsbl. (There are probably a couple of minor enhancements I've added to production that I haven't committed to SVN yet.)
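
For the "run sa-learn on the server directly against the folder" option mentioned earlier, a cron-driven wrapper could look roughly like the sketch below. This is not the imap-learn script linked above, only an illustration. The function names and the example paths are hypothetical, and it assumes the sorted folders are reachable on disk (for example as Maildir "cur" directories) and that the job runs as the user who owns the Bayes DB. The only sa-learn options used are --spam and --ham.

```python
import subprocess
from typing import List


def build_learn_commands(spam_dir: str, ham_dir: str,
                         sa_learn: str = "sa-learn") -> List[List[str]]:
    """Build one sa-learn invocation per sorted folder.

    spam_dir and ham_dir are hypothetical on-disk folders holding the
    messages that have already been sorted by hand.
    """
    return [
        [sa_learn, "--spam", spam_dir],  # learn everything here as spam
        [sa_learn, "--ham", ham_dir],    # learn everything here as nonspam
    ]


def run_learning(spam_dir: str, ham_dir: str) -> None:
    """Run the learning commands in order, stopping on the first failure."""
    for argv in build_learn_commands(spam_dir, ham_dir):
        # check=True raises CalledProcessError if sa-learn exits non-zero,
        # so a failed run shows up in cron's mail instead of passing silently.
        subprocess.run(argv, check=True)
```

Calling run_learning("/srv/mail/junk/spam/cur", "/srv/mail/junk/ham/cur") from a small script in cron would then relearn both folders on each run; since Bayes tracks which messages it has already learned from, leaving the old mail in place should be harmless.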

Aside from some of the details of the workflow, I've used much of the same process across several generations of mail systems, starting with a small system that peaked at about 450 users, using a Berkeley DB shared Bayes. A good initial training message set and early feedback are important in getting it going.

> I have the impression that the often-recommended sanesecurity data which
> is included in clamav-unofficial-sigs doesn't help at all. I can't see
> any difference between before and after its installation.

We've been catching between a third and half or so of the ".js-downloader-in-a-.zip" virusmail in SpamAssassin, mainly based on Bayes hits.

-kgd