Ian Zimmerman wrote:
> Here is my cronjob for that purpose, in its entirety.  Note that each of
> ~/spam-corpora{ham,spam} is a Maildir.  There is a small race condition
> between the sa-learn run and the move to cur, which wasn't worth fixing
> in my case; if you use this and fix it let me know :)

I looked over your script.  I think the use of ssh for remote
processing will probably make it less usable for most people.  You
might consider setting up spamd and spamc for this purpose instead.

Also, to give people a known window to react to mistakes, it is nice
not to process email immediately but to wait some fixed time, such as
five minutes, after it is saved.  I use find with
! -newerct "5 minutes ago" to process only messages older than five
minutes.  That way if I save something by mistake I have a few minutes
to react and remove the message before it is learned.

Instead of mv I have used safecat for moving messages around.  And
generally I avoid worrying about whitespace in filenames for this,
since I am guaranteed the file names are well formed without any
whitespace.

Instead of:

  for m in `ls ~/spam-corpora/${food}/new` ; do
      cat ~/spam-corpora/${food}/new/${m} | formail
  done | ssh $server sa-learn --${food} --mbox -

I would suggest something more along the lines of this different,
not equivalent, but similar script.

  cd $MAILBOXDIR || exit 1
  for f in $(find spam-new/new spam-new/cur -ignore_readdir_race -type f ! -newerct "6 minutes ago" -print); do
      spamc -x -d $server --learntype=spam < "$f"
      rc=$?
      if [ $rc -eq 0 ] || [ $rc -eq 98 ]; then
          # rc=98: This appears to be the return (undocumented) when spamc
          # can't learn the message because it is already learned.  The
          # docs say that EX_TOOBIG 98 is not otherwise used.
          if safecat spam/tmp spam/cur < "$f" >/dev/null; then
              rm -f "$f"
          fi
      else
          echo "sa-learn failed $rc on $f"
      fi
  done

Perhaps the comments about spamc return code 98 would cause someone
here to look at that part of the code.  It has been years since I put
in that comment.  Perhaps it is even different now.  Don't know.

I have thought about refactoring this into two scripts so that the
find could -exec the second.  That would eliminate the
"for f in arguments" syntax, which would save memory.
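
For illustration only, here is a rough sketch of the find -exec idea
that still keeps everything in one file, by handing the file names to
an inline "sh -c" handler instead of a second script.  It assumes GNU
find.  The names learn_old, learn_cmd and the age argument are made up
for this sketch, and the learning command is passed in as a parameter
(for example a small wrapper around spamc) rather than hard-coded.  The
return code 98 handling and the safecat move are left out; they would
go where the "learned" line is printed.

```shell
#!/bin/sh
# Sketch only, not a drop-in replacement for the script above.
#
# usage: learn_old DIR AGE LEARN_CMD
#   DIR        Maildir-style directory containing new/ and cur/
#   AGE        cutoff understood by GNU find -newerct, e.g. "5 minutes ago"
#   LEARN_CMD  hypothetical command that reads one message on stdin and
#              exits 0 on success
learn_old() {
    dir=$1
    age=$2
    learn_cmd=$3
    # find passes the names straight to the inline handler as arguments,
    # so nothing is word-split and no long "for f in ..." list is built.
    find "$dir/new" "$dir/cur" -ignore_readdir_race -type f \
        ! -newerct "$age" -exec sh -c '
            learn_cmd=$1
            shift
            for f in "$@"; do
                if "$learn_cmd" < "$f"; then
                    printf "learned %s\n" "$f"
                else
                    printf "failed %s\n" "$f" >&2
                fi
            done
        ' sh "$learn_cmd" {} +
}
```

It would be called as something like: learn_old spam-new "5 minutes ago"
./learn-one-spam, where learn-one-spam is whatever wrapper runs spamc.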
But the memory use is small for my case, I do not need to worry about
filenames with whitespace, and I like having one script instead of two
so that I can see everything.  Something to think about.

The above is not in its entirety because I cut it down from a larger
case that is doing other things.  It would need a little work.  But it
might give some ideas.

Bob