Hi Robert,

Thank you very much for your detailed reply. It was very helpful. I just have one question: why can you not run sa-learn on spam already flagged as spam? I thought spamassassin would rip out any headers it had already added. If that is the case, then what is the harm in re-learning the spam as spam? (I am just asking, not trying to argue; just curious.)

Thank you again for your help,

Peter

Robert Menschel wrote:
Hello Peter,

Thursday, April 7, 2005, 5:29:38 AM, you wrote:

PM> I have been building a new mailserver to replace my old one.
PM> The new one has postfix, Cyrus-imap, anomy, spamassassin.  I am trying
PM> to set up the Bayes auto-learn stuff.  Each user has a home directory on
PM> the server (they can not log onto the server).  I am using the Maildir
PM> format.

PM> Is it better to have a cron job run by a single user (say root) to do
PM> the ham / spam learning for everyone, or should I run a cron for each
PM> individual user.  All users belong to the same company.

Best, if you have the disk space for the multitude of Bayes databases,
is to run ham/spam learning as each user. I'd recommend the "running
constantly, staggered for every user" approach, something like:
- run as cron
- get cycle start time
- identify list of active users
- for each active user
  - determine if anything to learn; skip to next user if not
  - su to that user's id
  - sa-learn
- if not yet 30 min since start of this cycle, sleep 15 min
- loop to next cycle.
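
The outline above could be sketched roughly as follows. This is only a minimal illustration, not a tested production script: the `jobs` mapping, the `get_jobs` callback, and the `su -s /bin/sh` invocation (assumed here to be a Linux `su` that accepts `-s` for accounts without a login shell) are all assumptions of mine, and the folders to learn from are left to the caller.

```python
import shlex
import subprocess
import time
from pathlib import Path

CYCLE_SECONDS = 30 * 60  # "30 min since start of this cycle"
SLEEP_SECONDS = 15 * 60  # "sleep 15 min"


def has_mail(folder: Path) -> bool:
    """True if the folder exists and holds at least one entry."""
    return folder.is_dir() and any(folder.iterdir())


def learn_command(user: str, kind: str, folder: Path) -> list[str]:
    """Command that runs sa-learn as `user`; kind is "spam" or "ham"."""
    inner = f"sa-learn --{kind} {shlex.quote(str(folder))}"
    # Assumption: su accepts -s so that non-login accounts still get a shell.
    return ["su", "-s", "/bin/sh", user, "-c", inner]


def run_cycle(jobs, run=subprocess.run):
    """jobs maps user -> list of (kind, folder). Returns the users processed."""
    learned = []
    for user, folders in jobs.items():
        pending = [(k, f) for k, f in folders if has_mail(f)]
        if not pending:
            continue  # nothing to learn; skip to next user
        for kind, folder in pending:
            # One sa-learn at a time, so executions never overlap.
            run(learn_command(user, kind, folder), check=False)
        learned.append(user)
    return learned


def pause_after(cycle_start: float, now: float) -> int:
    """Seconds to sleep before the next cycle starts."""
    return SLEEP_SECONDS if now - cycle_start < CYCLE_SECONDS else 0


def main_loop(get_jobs):
    """Run cycles forever; get_jobs() returns the jobs for active users."""
    while True:
        start = time.time()
        run_cycle(get_jobs())
        time.sleep(pause_after(start, time.time()))
```

Because each user is handled in turn, only one sa-learn process runs at any moment, which is the point of staggering. `main_loop` is deliberately not called here; it would be started once, as root, with whatever `get_jobs` function fits the site.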

PM> Problem I have thought of with the latter.
PM> 1.  There would be approximately 130 cron jobs running sa-learn at the
PM> same time .... or it would run constantly if I staggered it for every
PM> user.  What kind of load will that have on  my 850 with 756 MB of ram ?

Running constantly, staggered, will work better on that system (IMO)
than allowing multiple executions at the same time.

PM> Problems I have with both:
PM> 1.  What is the best method of obtaining the spam / ham.  I have the
PM> server create a spam folder for each user when the user is created.
PM> spamassassin will automatically put all mail marked as spam in this
PM> folder.  Obviously I will use this folder to run sa-learn on for spam.

NO. NO. NO. NO.

Do not run sa-learn on automatically flagged emails. SA does this
itself somewhat conservatively (though not conservatively enough --
I suggest lowering the ham auto-learn threshold).

Provide instead a "missed-spam" folder and a "not-spam" folder. Have
your people copy/move miscategorized emails into those, and learn from
those folders.
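
As a rough sketch of learning from those two folders only: the code below maps "missed-spam" to `sa-learn --spam` and "not-spam" to `sa-learn --ham`. The Maildir++ layout (`.missed-spam/cur` under the user's Maildir) and the function name `correction_commands` are assumptions for illustration; adjust the paths to wherever your server actually stores the folders.

```python
from pathlib import Path

# Folder names from the advice above, mapped to the matching sa-learn flag.
CORRECTIONS = {"missed-spam": "--spam", "not-spam": "--ham"}


def correction_commands(maildir: Path) -> list[list[str]]:
    """sa-learn argument lists for each correction folder that has mail."""
    commands = []
    for name, flag in CORRECTIONS.items():
        # Assumption: Maildir++ naming, with moved messages sitting in cur/.
        cur = maildir / f".{name}" / "cur"
        if cur.is_dir() and any(cur.iterdir()):
            commands.append(["sa-learn", flag, str(cur)])
    return commands
```

Each returned list can be handed to `subprocess.run` while running as the mailbox owner. The automatically filled spam folder is deliberately absent from `CORRECTIONS`.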

PM> 2. How often should I run sa-learn ?  Users here for the most part get
PM> mail in their inbox and then after reading it move it to some other sub
PM> folder ... (of which everyone's is different, and some have over 100).

On single-domain systems I normally run it hourly.

PM> Are there any downsides to running a site wide one ?  What is the best
PM> method of doing this if this is a better method.  Currently I plan to
PM> use this to learn the spam.  Does anyone see any problems.
PM> (Note:  this assumes it is being run as a particular user.)

Some people prefer system-wide, others domain-wide, others
user-specific.  YMMV. Feasibility might be the more important
criterion, since all three can work.

Bob Menschel




-- Peter Marshall, BCS System Administrator, CARIS

CARIS 2005 - Mapping a Seamless Society
10th International User Group Conference and Educational Sessions
Halifax, NS, Canada
E-mail [EMAIL PROTECTED] for more.
