Hi Robert,
Thank you very much for your detailed reply. It was very helpful. I just have one question: why can you not run sa-learn on spam already flagged as spam? I thought SpamAssassin would rip out any headers it had already added. If that is the case, then what is the harm in relearning the spam as spam? (I am just asking, not trying to argue ... just curious.)
Thank you again for your help,
Peter
Robert Menschel wrote:
Hello Peter,
Thursday, April 7, 2005, 5:29:38 AM, you wrote:
PM> I have been building a new mailserver to replace my old one.
PM> The new one has postfix, Cyrus-imap, anomy, spamassassin. I am trying
PM> to set up the Bayes auto-learn stuff. Each user has a home directory on
PM> the server (they can not log onto the server). I am using the Maildir
PM> format.
PM> Is it better to have a cron job run by a single user (say root) to do
PM> the ham / spam learning for everyone, or should I run a cron for each
PM> individual user? All users belong to the same company.
Best, if you have the disk space for the multitude of Bayes databases, is to run ham/spam learning as each user. I'd recommend the "running constantly if I staggered it for every user" approach, something like:
- run as cron
- get cycle start time
- identify list of active users
- for each active user
  - determine if anything to learn; skip to next user if not
  - su to that user's id
  - sa-learn
- if not yet 30 min since start of this cycle, sleep 15 min
- loop to next cycle.
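The outline above can be sketched roughly as follows. This is only an illustration under assumptions, not a tested production script: the home-directory layout (`<home_root>/<user>/Maildir/<folder>`), the folder name `.missed-spam`, and the function names are all hypothetical, and the actual "su to that user's id, then sa-learn" step is left to a `learn` callable that you would supply yourself.

```python
import time
from pathlib import Path

CYCLE_SECONDS = 30 * 60   # minimum cycle length taken from the outline
SLEEP_SECONDS = 15 * 60   # pause used when a cycle finishes early


def pending_messages(folder: Path) -> bool:
    """Return True if the Maildir folder has any message in cur/ or new/."""
    for sub in ("cur", "new"):
        d = folder / sub
        if d.is_dir() and any(d.iterdir()):
            return True
    return False


def run_cycle(users, home_root, learn, folder_name=".missed-spam"):
    """Run one pass over the users, one at a time (never in parallel).

    `learn(user, folder)` is a caller-supplied hook (hypothetical) that
    switches to the user's id and runs sa-learn on the folder.
    Returns the list of users that actually had something to learn.
    """
    processed = []
    for user in users:
        folder = Path(home_root) / user / "Maildir" / folder_name
        if not pending_messages(folder):
            continue  # nothing to learn; skip to next user
        learn(user, folder)
        processed.append(user)
    return processed


def main_loop(list_active_users, home_root, learn, cycles=None):
    """Repeat cycles; sleep 15 min if a cycle took less than 30 min."""
    done = 0
    while cycles is None or done < cycles:
        start = time.monotonic()
        run_cycle(list_active_users(), home_root, learn)
        if time.monotonic() - start < CYCLE_SECONDS:
            time.sleep(SLEEP_SECONDS)
        done += 1
```

Because users are handled strictly in sequence, at most one sa-learn process runs at any moment, which is the point of staggering on a small machine.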
PM> Problem I have thought of with the latter.
PM> 1. There would be approximately 130 cron jobs running sa-learn at the
PM> same time .... or it would run constantly if I staggered it for every
PM> user. What kind of load will that have on my 850 with 756 MB of RAM ?
Running constantly, staggered, will work better on that system (IMO) than allowing multiple executions at the same time.
PM> Problems I have with both:
PM> 1. What is the best method of obtaining the spam / ham? I have the
PM> server create a spam folder for each user when the user is created.
PM> spamassassin will automatically put all mail marked as spam in this
PM> folder. Obviously I will use this folder to run sa-learn on for spam.
NO. NO. NO. NO.
Do not run sa-learn on automatically flagged emails. SA does this itself somewhat conservatively (though not conservatively enough -- I suggest lowering the ham auto-learn threshold).
Provide instead a "missed-spam" folder and a "not-spam" folder. Have your people copy/move miscategorized emails into those, and learn from those folders.
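As a rough sketch of what learning from those two folders could look like, the helper below only builds the sa-learn command lines. The Maildir path and the folder names `.missed-spam` and `.not-spam` are assumptions for illustration; `--spam` and `--ham` are sa-learn's own options, and a Maildir `cur` directory is passed as the target.

```python
from pathlib import PurePosixPath


def sa_learn_commands(maildir, spam_folder=".missed-spam", ham_folder=".not-spam"):
    """Build one sa-learn command per correction folder.

    Only user-corrected mail is learned: missed spam as spam, and
    wrongly flagged mail as ham. The automatically filled spam folder
    is deliberately not included.
    """
    maildir = PurePosixPath(maildir)
    commands = []
    for flag, name in (("--spam", spam_folder), ("--ham", ham_folder)):
        commands.append(["sa-learn", flag, str(maildir / name / "cur")])
    return commands
```

Each returned list could then be handed to something like `subprocess.run(cmd, check=True)` while running as the mailbox owner.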
PM> 2. How often should I run sa-learn ? Users here for the most part get
PM> mail in their inbox and then after reading it move it to some other sub
PM> folder ... (of which everyone's is different, and some have over 100).
On single-domain systems I normally run it hourly.
PM> Are there any downsides to running a site wide one ? What is the best
PM> method of doing this if this is a better method? Currently I plan to
PM> use this to learn the spam. Does anyone see any problems?
PM> (Note: this assumes it is being run as a particular user.)
Some people prefer system-wide, others domain-wide, others user-specific. YMMV. Feasibility might be the more important criterion, since all three can work.
Bob Menschel
-- Peter Marshall, BCS System Administrator, CARIS