Ian Zimmerman wrote:
> Here is my cronjob for that purpose, in its entirety.  Note that each of
> ~/spam-corpora/{ham,spam} is a Maildir.  There is a small race condition
> between the sa-learn run and the move to cur, which wasn't worth fixing
> in my case; if you use this and fix it let me know :)

I looked over your script.  I think relying on ssh for remote
processing will probably make it harder for most people to use.  You
might consider setting up spamd and spamc for this purpose instead.
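To make the spamc suggestion a bit more concrete, here is a rough sketch of
your learning loop done with spamc instead of ssh and sa-learn.  It assumes
spamc is on PATH and that spamd on $server was started with --allow-tell,
since spamd refuses learning requests without it.  learn_corpus is a
hypothetical name.  No formail step is needed because spamc sends one
message at a time.  The exit status is only reported, not interpreted,
because which codes spamc returns when learning is worth checking in
spamc(1) for your version.

```shell
#!/bin/sh
# Sketch: send each message in a Maildir's new/ to spamd for learning via
# spamc, instead of piping an mbox through ssh to sa-learn.
# Assumptions: spamc is on PATH, $server names a host running spamd with
# --allow-tell, and learn_corpus is a hypothetical helper name.
learn_corpus() {
    food=$1                                  # "ham" or "spam"
    dir=${2:-$HOME/spam-corpora/$food}       # Maildir holding that corpus
    for m in "$dir"/new/*; do
        [ -f "$m" ] || continue              # skip the literal glob if new/ is empty
        spamc -x -d "$server" --learntype="$food" < "$m"
        rc=$?
        # Report, rather than interpret, any nonzero status.
        [ "$rc" -eq 0 ] || echo "spamc --learntype=$food exited $rc on $m" >&2
    done
}
```

Usage would be something like "learn_corpus ham; learn_corpus spam" with
$server set.  Moving the learned messages to cur is left out here.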

Also, it is nice not to process mail immediately but only after it has
sat for a known interval, say five minutes after it was saved, so that
people have time to react to mistakes.  I use find with
! -newerct "5 minutes ago" to select only messages older than five
minutes.  That way, if I save something by mistake, I have a few
minutes to notice and remove the message before it is learned.
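A minimal sketch of that selection, assuming GNU find (which provides
-newerct and -ignore_readdir_race).  list_settled and its optional
grace-period argument are hypothetical names for illustration:

```shell
#!/bin/sh
# List regular files in a Maildir's new/ and cur/ whose inode change time
# (ctime) is NOT newer than the grace point, i.e. files that have sat at
# least that long.  Assumes GNU find; list_settled is a hypothetical name.
list_settled() {
    maildir=$1
    grace=${2:-5 minutes ago}    # any date string GNU find accepts
    find "$maildir/new" "$maildir/cur" -ignore_readdir_race -type f \
        ! -newerct "$grace" -print
}
```

For example, "list_settled ~/Maildir/spam-new" prints nothing for a message
saved a minute ago and prints its path once five minutes have passed.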

Instead of mv, I use safecat to move messages around.  And I
generally don't worry about whitespace in file names here, since I am
guaranteed the file names are well formed and contain no whitespace.

Instead of:

        for m in `ls ~/spam-corpora/${food}/new` ; do
            cat ~/spam-corpora/${food}/new/${m} | formail
        done | ssh $server sa-learn --${food} --mbox -

I would suggest something more along the lines of this script.  It is
different and not equivalent, but similar.

  cd "$MAILBOXDIR" || exit 1
  for f in $(find spam-new/new spam-new/cur -ignore_readdir_race -type f \
               ! -newerct "6 minutes ago" -print); do

    spamc -x -d "$server" --learntype=spam < "$f"
    rc=$?
    if [ $rc -eq 0 ] || [ $rc -eq 98 ]; then
      # rc=98: This appears to be the (undocumented) return code when
      # spamc can't learn the message because it is already learned.
      # The docs say that EX_TOOBIG 98 is not otherwise used.
      if safecat spam/tmp spam/cur < "$f" >/dev/null; then
        rm -f "$f"
      fi
    else
      echo "spamc learn failed $rc on $f"
    fi

  done

Perhaps the comment about spamc return code 98 will prompt someone
here to look at that part of the code.  It has been years since I
wrote that comment, and the behavior may well be different now.  I
don't know.

I have thought about refactoring this into two scripts so that find
could -exec the second.  That would eliminate the "for f in $(find ...)"
construct, which expands the whole file list at once, and so would save
memory.  But the memory use is small in my case, I do not need to worry
about file names with whitespace, and I like having one script instead
of two so that I can see everything in one place.
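For what it's worth, -exec does not strictly need a second script file:
find can hand batches of names to an inline sh -c loop.  This is a sketch
of that idea, not something I have run against a real spamd.  It assumes
GNU find, spamc and safecat on PATH, $server set, and the current
directory being $MAILBOXDIR; learn_settled_spam and its optional
grace-period argument are hypothetical names.

```shell
#!/bin/sh
# Sketch of the find -exec variant: find passes batches of file names to
# an inline sh -c loop, so there is no word splitting of $(find ...)
# output and no second script file.  Assumes GNU find (-newerct,
# -ignore_readdir_race), spamc and safecat on PATH, $server naming the
# spamd host, and the current directory being the mailbox directory.
# learn_settled_spam is a hypothetical name.
learn_settled_spam() {
    find spam-new/new spam-new/cur -ignore_readdir_race -type f \
        ! -newerct "${1:-5 minutes ago}" \
        -exec sh -c '
            server=$1; shift
            for f do
                spamc -x -d "$server" --learntype=spam < "$f"
                rc=$?
                # 98 is treated as "already learned", as in the loop above.
                if [ "$rc" -eq 0 ] || [ "$rc" -eq 98 ]; then
                    safecat spam/tmp spam/cur < "$f" >/dev/null && rm -f "$f"
                else
                    echo "spamc learn failed ($rc) on $f" >&2
                fi
            done
        ' sh "$server" {} +
}
```

With "{} +" find packs many names into each sh invocation, so this costs
far fewer processes than "-exec ... {} \;" would, and names containing
spaces reach the loop intact.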

Something to think about.  Unlike yours, the above is not the script in
its entirety: I cut it down from a larger one that does other things,
so it would need a little work.  But it might give you some ideas.

Bob
