From: Bob Proulx <b...@proulx.com>
   Date: Mon, 27 Oct 2014 18:37:35 -0600
   
   In the first email:
   
     # The lock file ensures that only 1 spamassassin invocation happens
     # at 1 time, to keep the load down.
     #
     :0fw: spamassassin.lock
     * < 400000
     | spamc -x
   
   Kevin A. McGrail wrote:
   > geoff.spamassassin140903 wrote:
   > > Kevin A. McGrail wrote:
   > > > Using procmail without MTA glue is OK for many uses.  I am wondering how
   > > > many spamd connections you allow and if you have checked your logs?
   > > >
   > > > I also cannot remember but the uses of a lock file seem odd for
   > > > something that can thread.  Any one know if that is a good idea to
   > > > remove?
   > >
   > > I wonder if you could explain in simple terms what the lockfile achieves
   > > in this situation? Is it even possible that it could cause messages to
   > > bypass SA?
   >
   > I don't think a lockfile achieves anything because it's a call to a program.
   > Procmail has some weird syntax so hopefully someone with some procmail-fu
   > can tell us if a lock on a procmail system call does anything.
   
   Well...  The comment in the example explains what the lock is
   attempting to do.  I think that comment got missed in the follow-ups.
   The lock will restrict spamassassin invocations to one at a time to
   prevent a high system load average running too many spamassassin
   processes all at once.  It will serialize spamassassin invocations to
   one at a time instead of many in parallel.
   
   Normally the MTA will receive incoming messages and will fork a
   process for each incoming connection.  If the outside world connects
   and sends 100 messages all at once then there will be 100 MTA
   processes running in parallel.  If 10,000 all at once then probably
   some MTA process limit will prevent forking that many depending upon
   your configuration.  Each of those will try to send the message
   through procmail and spamassassin in parallel too.  Running 10,000
   procmail processes in parallel probably won't be a problem since it is
   lightweight.  However running perl spamassassin 100 or 1,000 times in
   parallel all at once can be quite a resource hit to a moderate system!
   
   By putting the lock in the procmail rule it prevents more than one
   perl spamassassin process from running at a time.  This keeps the
   system from being overloaded due to a spike from the outside world.  I
   want to emphasize that the outside world impacts the system and can
   have an effect of a DDoS just by overwhelming the system with external
   connections.  The MTA has limits to prevent this but while those are
   tuned for normal delivery the MTA maintainers won't know if you are
   running each message through spamassassin and causing a higher load
   because of it.  The default MTA limits are probably too high when
   considering running the message through spamassassin too.
   
   The procmail example comes from the wiki page example:
   
     http://wiki.apache.org/spamassassin/UsedViaProcmail
   
   The wiki page example is launching "spamassassin" not "spamc".  That
   is an important difference to this case.  Someone has changed that to
   spamc in the above and preserved all else including the serialization
   lock.  The spamc talks to a spamd and so the number of parallel
   processes spamd can handle depends upon the spamd configuration.  In
   the spamc use I would be inclined to remove the serialization lock.
   Let it be throttled at the spamd side of things instead.  That would
   make the most sense to me.  Then tune spamd's limits as needed.
   
   In summary I suggest removing the serialization lock from the spamc
   recipe.  Give it a try and monitor system resource utilization.  Start
   tuning at spamd.  Tune other things as needed afterward.
   
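     # Filter through spamc; no lock file, so deliveries can run in parallel.
     :0fw
     | spamc -x
   
     # If spamc failed, have procmail exit with spamc's exit code.
     :0e
     {
       EXITCODE=$?
     }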
   
   Bob


I agree with everything you wrote, but only when bayes autolearning is
turned off.  Bayes learning holds an exclusive lock on the bayes
database, particularly during expiration.

If spamc does bayes autolearning and starts an expiration then other
spamc runs for that user will be locked out of bayes.  At some point
you start getting timeouts at different points in the email delivery
chain.

I have a separate sa-learn (or spamc -L) procmail recipe that has a
serialization lock.
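
A rough sketch of what such a recipe could look like.  The lock file
name and the X-Learn-As marker header are hypothetical, chosen only for
illustration.  The spamc -L spam variant should work in place of
sa-learn, but only if spamd was started with --allow-tell.

  # Hypothetical condition: learn only from mail carrying a made-up
  # marker header.  The second ":" with a file name is a local lock
  # file, so only one learner runs at a time.  "c" works on a copy so
  # normal delivery continues; "w" waits for the learner to finish.
  :0cw: salearn.lock
  * ^X-Learn-As: spam
  | sa-learn --spam

With the lock on the learning recipe only, scanning through spamc stays
parallel while writes to the bayes database are serialized.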

-jeff
