Hello Peter,

Thursday, April 7, 2005, 5:29:38 AM, you wrote:

PM> I have been building a new mailserver to replace my old one.
PM> The new one has postfix, Cyrus-imap, anomy, spamassassin.  I am trying
PM> to set up the Bayes auto-learn stuff.  Each user has a home directory on
PM> the server (they can not log onto the server).  I am using the Maildir
PM> format.

PM> Is it better to have a cron job run by a single user (say root) to do
PM> the ham / spam learning for everyone, or should I run a cron job for each
PM> individual user?  All users belong to the same company.

If you have the disk space for the many per-user Bayes databases, the
best option is to run ham/spam learning as each user. I'd recommend the
approach you describe below as "running constantly if I staggered it
for every user," something like:
- start from cron
- get cycle start time
- identify list of active users
- for each active user
  - determine if anything to learn; skip to next user if not
  - su to that user's id
  - sa-learn
- if fewer than 30 minutes have passed since the start of this cycle,
  sleep 15 minutes
- loop to next cycle.
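For illustration, a minimal Python sketch of that loop. The hooks
active_users, needs_learning and learn_as are hypothetical placeholders
for whatever lists your users, checks their correction folders, and runs
sa-learn under su; the clock and sleep parameters exist only so the
timing logic can be tested without waiting.

```python
import time
from typing import Callable, Iterable

CYCLE_SECONDS = 30 * 60  # a cycle that finishes sooner than this is "short"
NAP_SECONDS = 15 * 60    # pause taken after a short cycle


def run_cycles(
    active_users: Callable[[], Iterable[str]],  # hypothetical: list active users
    needs_learning: Callable[[str], bool],      # hypothetical: anything to learn?
    learn_as: Callable[[str], None],            # hypothetical: su + sa-learn
    max_cycles: int = -1,                       # -1 means loop forever
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Learn for one user at a time, cycle after cycle.

    Returns the number of cycles completed (only reached when max_cycles >= 0).
    """
    done = 0
    while max_cycles < 0 or done < max_cycles:
        start = clock()                      # cycle start time
        for user in active_users():          # list of active users
            if not needs_learning(user):     # nothing new: skip to next user
                continue
            learn_as(user)                   # one sa-learn at a time
        done += 1
        if clock() - start < CYCLE_SECONDS:  # finished early: take a nap
            sleep(NAP_SECONDS)
    return done
```

Because users are handled one after another, at most one sa-learn runs
at a time. If cron starts the script, you would also want a lock file or
similar so cron doesn't launch a second copy while the first is still
looping.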

PM> A problem I have thought of with the latter:
PM> 1.  There would be approximately 130 cron jobs running sa-learn at the
PM> same time .... or it would run constantly if I staggered it for every
PM> user.  What kind of load will that put on my 850 with 756 MB of RAM?

Running constantly, staggered, will (IMO) work better on that system
than allowing multiple sa-learn runs at the same time.

PM> Problems I have with both:
PM> 1.  What is the best method of obtaining the spam / ham?  I have the
PM> server create a spam folder for each user when the user is created.
PM> spamassassin will automatically put all mail marked as spam in this
PM> folder.  Obviously I will use this folder to run sa-learn on for spam.

NO. NO. NO. NO.

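Do not run sa-learn on automatically flagged emails. Learning from them
only reinforces SA's existing verdicts, mistakes included, and SA
already auto-learns from them itself, somewhat conservatively (though
not conservatively enough; I suggest lowering the ham auto-learn
threshold, bayes_auto_learn_threshold_nonspam).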

Instead, provide a "missed-spam" folder and a "not-spam" folder. Have
your users copy or move misclassified emails into those, and learn from
those folders.
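A minimal Python sketch of the per-user learning step. It assumes the
folders are named .missed-spam and .not-spam under each user's ~/Maildir
(Maildir++ style; both the names and the layout are assumptions), and
that su accepts -s to override a user's non-login shell, as util-linux
su on GNU/Linux does. It only builds the commands; a caller running as
root would hand each one to subprocess.run.

```python
import os
import shlex
from typing import List

# Hypothetical folder names, one per sa-learn mode.
CORRECTION_FOLDERS = {
    "--spam": ".missed-spam",  # spam that slipped through to the inbox
    "--ham": ".not-spam",      # good mail that was wrongly flagged
}


def dirs_with_mail(folder: str) -> List[str]:
    """Return the cur/ and new/ subdirectories of a Maildir folder that hold messages."""
    found = []
    for sub in ("cur", "new"):
        path = os.path.join(folder, sub)
        if os.path.isdir(path) and os.listdir(path):
            found.append(path)
    return found


def learn_commands(user: str, home: str) -> List[List[str]]:
    """Build one sa-learn command per non-empty correction folder, run as `user` via su."""
    commands = []
    for flag, name in CORRECTION_FOLDERS.items():
        dirs = dirs_with_mail(os.path.join(home, "Maildir", name))
        if not dirs:
            continue  # nothing to learn from this folder
        inner = " ".join(["sa-learn", flag] + [shlex.quote(d) for d in dirs])
        # -s /bin/sh because mail-only accounts may have no login shell (assumption)
        commands.append(["su", "-s", "/bin/sh", user, "-c", inner])
    return commands
```

sa-learn skips messages it has already learned, so re-running over a
folder that hasn't been emptied mostly costs scan time rather than
skewing the database.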

PM> 2. How often should I run sa-learn?  Users here for the most part get
PM> mail in their inbox and then, after reading it, move it to some other sub
PM> folder (everyone's are different, and some have over 100).

On single-domain systems I normally run it hourly.

PM> Are there any downsides to running a site-wide one?  What is the best
PM> method of doing this, if it is the better method?  Currently I plan to
PM> use this to learn the spam.  Does anyone see any problems?
PM> (Note:  this assumes it is being run as a particular user.)

Some people prefer system-wide, others domain-wide, others
user-specific. YMMV. Feasibility might be the more important
criterion, since all three can work.
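To make the trade-off concrete, a Python sketch of the two extremes for
a single-company setup like yours (where domain-wide reduces to
site-wide): one sa-learn run per user under su, or one run over
everyone's folders. The folder paths are hypothetical, and for the
site-wide run to matter, SpamAssassin has to be configured (e.g. via
bayes_path) so that the user running sa-learn and the scanner share one
Bayes database; that setup is not shown.

```python
import shlex
from typing import Dict, List


def plan_learning(folders: Dict[str, List[str]], mode: str,
                  flag: str = "--spam") -> List[List[str]]:
    """Return the sa-learn commands for one learning pass.

    folders maps user name -> that user's correction folders (hypothetical paths).
    mode "user": one run per user via su, each feeding that user's own Bayes DB.
    mode "site": one run over every user's folders, feeding a single shared DB.
    Call once with "--spam" and once with "--ham" to cover both folder types.
    """
    if mode == "user":
        return [
            # -s /bin/sh: mail-only accounts may lack a login shell (assumption)
            ["su", "-s", "/bin/sh", user, "-c",
             " ".join(["sa-learn", flag] + [shlex.quote(d) for d in dirs])]
            for user, dirs in sorted(folders.items())
            if dirs  # skip users with nothing to learn
        ]
    if mode == "site":
        everything = [d for _, dirs in sorted(folders.items()) for d in dirs]
        return [["sa-learn", flag] + everything] if everything else []
    raise ValueError("mode must be 'user' or 'site'")
```

Per-user mode means up to one process (and one database) per user;
site mode means a single process and a single database, at the cost of
everyone's corrections training the same filter.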

Bob Menschel

