It is not fully tested yet, but here it is. NB that I changed the USER
environment variable to USERNAME. I do not know if this is common to all
flavors of Linux, but under su, USER is not updated to the child (target)
user; it keeps the parent's name. USERNAME, on the other hand, does change
to reflect the child username. Also, this script is still somewhat
localized: it assumes all Junk folders are prefixed with "Junk", and I did
not adjust the Courier IMAP code with my changes since I had no system to
test against. It should provide some interesting ideas nonetheless.

New features include cross-version compatibility (SpamAssassin 2.x and
3.x), higher speed (using Bayes journals), debugging and error controls,
wider Bayes training and, most importantly, support for UWash-based IMAP
and mbox-format mailboxes.

Tom

Rubin Bennett wrote:

Hello all...

I figure I've asked enough questions of this list that it's about time I gave something back... You may not want it, but here it is anyway :)

I've written a bash script that will run sa-learn against the administrator-specified False-Positive and False-Negative folders. Run this script from cron, and have your users drag n' drop emails that get misclassified by SA to the appropriate folders. The script will act in 2 ways:

1.) Run it as root, and it will parse the administrator-specified USERLIST and run the internally defined autoLearn() function as each user.
2.) Run it as an ordinary user, and it will only learn from that user's email.

I wrote it this way so that I could have a wrapper around sa-learn that would make sure that the directories exist, create them if they don't (using maildirmake++), and not try to learn from directories with no messages in them. This is written to work with Courier IMAP and Maildir; I have not tried it with anything else. Someday I may get around to rewriting it in PHP and using php-imap to do the moving around etc., but as a dirty hack this works OK. It also doesn't need passwords etc. in config files...

I hope this benefits someone out there... If there's enough interest, I'll put it on my website and do a proper CVS for it. If anyone has ideas for making it better (or suck less), let me know. Patches are always welcome...
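The "do not try to learn from directories with no messages in them" check can be sketched as a pair of small shell functions. This is a hypothetical helper, not part of either version of the script: `count_messages`, `learn_if_nonempty` and `LEARN_CMD` are names introduced here for illustration, and it assumes the Courier Maildir layout in which each message is a regular file under the folder's `cur/` directory.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original script.

# Print how many messages sit in a Maildir folder's cur/ directory.
# Only regular files are counted, so the directory itself is not included
# (plain `find dir | wc -l` also prints the directory).
count_messages() {
    local cur="$1/cur"
    if [ ! -d "$cur" ]; then
        echo 0
        return 0
    fi
    # $(( )) strips any padding that some wc implementations add.
    echo $(( $(find "$cur" -maxdepth 1 -type f | wc -l) ))
}

# Learn from a folder only when it holds at least one message.
# $1 is "ham" or "spam", $2 is the Maildir folder path.
# LEARN_CMD (hypothetical) defaults to sa-learn; set LEARN_CMD=echo for a dry run.
learn_if_nonempty() {
    local kind="$1" folder="$2" n
    n=$(count_messages "$folder")
    if [ "$n" -eq 0 ]; then
        echo "skip: $folder is empty"
        return 0
    fi
    ${LEARN_CMD:-sa-learn} "--$kind" "$folder/cur/"*
}

# Dry run against a throwaway Maildir (removed when the shell exits).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/.Not Junk/cur" "$tmp/.Junk-Probable/cur"
touch "$tmp/.Not Junk/cur/1.msg" "$tmp/.Not Junk/cur/2.msg"
LEARN_CMD=echo learn_if_nonempty ham "$tmp/.Not Junk"
LEARN_CMD=echo learn_if_nonempty spam "$tmp/.Junk-Probable"
```

Counting files directly, rather than subtracting a fixed offset from `find | wc -l`, means a folder holding a single misclassified message is still learned.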
#!/bin/bash
# Copyright (c) 2004 by Rubin Bennett <[EMAIL PROTECTED]>
# All Rights reserved.
#
#This program is free software; you can redistribute it and/or
#modify it under the terms of the GNU General Public License
#as published by the Free Software Foundation; either version 2
#of the License, or (at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

# Usage: IMAP users can move misclassified emails into the "False Negative"
# or "False Positive" folders, and this script will learn from them and put
# them where they belong.
# Spam will be moved to the designated Spam folder, and Ham will be moved to
# the user's Inbox.
# This script should be called by cron or a similar scheduler.

# Settings - tweak as necessary.
MAILDIR="/home/$USER/Mail"
MAILINBOX="/var/spool/mail/$USER"
FALSE_NEG_FOLDER="Junk-Probable"
FALSE_POS_FOLDER="Not Junk"
SPAMFOLDER="Junk"
MAILDIRMAKE="/usr/bin/maildirmake++ -f"
# List of users to run the AutoLearn function as (space separated)...
USERLIST=""

# Program control variables
# Create the FN/FP folders if not found
MKMAILDIR=0
COURIER=0
DEBUG=0

############### function declaration ###############
Debug() {
    if [ "$DEBUG" -eq 1 ]
    then
        echo "$@"
    fi
}

Error() {
    cat >&2 <<EOF
Error: $*
EOF
}

# Defined before the option parsing below, because bash functions must
# exist before they are called.
usage() {
    cat <<EOF
Usage: $0 [-c] [-d] [-h] [-m]
  -c  Denotes a Courier Mail server instead of UWash.
      NB: Requires Courier Mail server and maildirmake++
  -d  Print debugging output.
  -h  Show this help and exit.
  -m  Tells the script to make the mailbox if it does not already exist.
      Requires Courier and maildirmake++
EOF
    exit 0
}

AutoLearn() {
    if [ "$COURIER" -eq 1 ]
    then
        if [ "$MKMAILDIR" -eq 1 ]
        then
            # Checks to see if the specified FALSE_NEG_FOLDER and FALSE_POS_FOLDER exist,
            # and creates them if necessary. ${MAILDIRMAKE} is deliberately unquoted
            # so that its "-f" option is passed as a separate argument.
            [ -d "${MAILDIR}/.${FALSE_NEG_FOLDER}" ] || ${MAILDIRMAKE} "${FALSE_NEG_FOLDER}" "${MAILDIR}"
            [ -d "${MAILDIR}/.${FALSE_POS_FOLDER}" ] || ${MAILDIRMAKE} "${FALSE_POS_FOLDER}" "${MAILDIR}"
        fi
        # Parses the designated Ham folder and then moves its contents to the Inbox.
        # Only regular files are counted; `find dir` alone also lists the directory.
        hamCount=$(find "${MAILDIR}/.${FALSE_POS_FOLDER}/cur" -type f 2>/dev/null | wc -l)
        if [ "$hamCount" -gt 0 ]
        then
            echo "Learning from $hamCount ham message(s)"
            # The * must stay outside the quotes or the shell will not expand it.
            sa-learn --ham ${NOSYNCCMD} "${MAILDIR}/.${FALSE_POS_FOLDER}/cur/"*
            mv "${MAILDIR}/.${FALSE_POS_FOLDER}/cur/"* "${MAILDIR}/cur/"
        fi
    else
        # UWash/mbox: learn the spool Inbox and every folder not prefixed
        # with "Junk" as ham.
        if [ -f "${MAILINBOX}" ]
        then
            Debug "Running: sa-learn --ham ${NOSYNCCMD} ${MBOX} ${MAILINBOX}"
            sa-learn --ham ${NOSYNCCMD} ${MBOX} "${MAILINBOX}"
        fi
        ls "${MAILDIR}" | \
        while read -r ii
        do
            Debug "Seen box ${ii}"
            case "$ii" in
                Junk*)
                    # Skip Junk folders (assumes they all start with "Junk").
                    ;;
                *)
                    #`echo $ii | sed '/\ /s//\\ /'`
                    Debug "Processing box ${ii}"
                    Debug "Running: sa-learn --ham ${NOSYNCCMD} ${MBOX} ${MAILDIR}/$ii"
                    sa-learn --ham ${NOSYNCCMD} ${MBOX} "${MAILDIR}/$ii"
                    ;;
            esac
        done
    fi
    if [ "$COURIER" -eq 1 ]
    then
        # Parses the "Undetected Spam" folder and then moves its contents to Spam.
        spamCount=$(find "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur" -type f 2>/dev/null | wc -l)
        if [ "$spamCount" -gt 0 ]
        then
            echo "Learning from $spamCount spam message(s)"
            sa-learn --spam ${NOSYNCCMD} ${MBOX} "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur/"*
            mv "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur/"* "${MAILDIR}/.${SPAMFOLDER}/cur/"
        fi
    else
        if [ -f "${MAILDIR}/${FALSE_NEG_FOLDER}" ]
        then
            Debug "Running: sa-learn --spam ${NOSYNCCMD} ${MBOX} ${MAILDIR}/${FALSE_NEG_FOLDER}"
            sa-learn --spam ${NOSYNCCMD} ${MBOX} "${MAILDIR}/${FALSE_NEG_FOLDER}"
        fi
        if [ -f "${MAILDIR}/${SPAMFOLDER}" ]
        then
            Debug "Running: sa-learn --spam ${NOSYNCCMD} ${MBOX} ${MAILDIR}/${SPAMFOLDER}"
            sa-learn --spam ${NOSYNCCMD} ${MBOX} "${MAILDIR}/${SPAMFOLDER}"
        fi
    fi
    # Fold the Bayes journal into the database once, after all learning.
    Debug "Running: sa-learn ${SYNCCMD}"
    sa-learn ${SYNCCMD}
}
############### End of function declaration ###############

while getopts "mcdh" opt; do
    case $opt in
        c ) COURIER=1;;
        # v ) SAVER=$OPTARG;; #v:
        m ) MKMAILDIR=1;;
        d ) DEBUG=1;;
        h ) usage ;;
        \?) usage ;;
    esac
done
# Keep the original options so they can be passed on when re-running the
# script as each user, then remove all of the processed arguments.
ARGLIST="$*"
shift $((OPTIND - 1))

if [ "${COURIER}" -eq 0 ]
then
    MBOX="--mbox"
fi

# sa-learn -V --> SpamAssassin version 2.63
SAVERSION=$(sa-learn -V)
if echo "$SAVERSION" | grep -qE "version 2\.[0-9]+"
then
    SYNCCMD="--rebuild"
    NOSYNCCMD="--no-rebuild"
    Debug 'SA reports Version 2'
elif echo "$SAVERSION" | grep -qE "version 3\.[0-9]+"
then
    SYNCCMD="--sync"
    NOSYNCCMD="--no-sync"
    Debug 'SA reports Version 3'
else
    Error "Script does not handle this version of SpamAssassin"
    exit 1
fi

if [ "$DEBUG" -eq 1 ]
then
    Debug "Maildir: " $MAILDIR
    Debug "Inbox: " $MAILINBOX
    Debug "FN Dir: " $FALSE_NEG_FOLDER
    Debug "FP Dir: " $FALSE_POS_FOLDER
    Debug "Spamdir: " $SPAMFOLDER
    Debug "MAILDIRMAKE: " $MAILDIRMAKE
    Debug "Users: " $USERLIST
    Debug "MKMAILDIR: " $MKMAILDIR
    Debug "CourierFlag: " $COURIER
fi

if [ "${USERNAME}" = "root" ]
then
    # Loop over LEARNUSER rather than USER so the system-defined USER
    # variable is not overridden. The whole command line is passed to
    # su -c as a single string.
    for LEARNUSER in $USERLIST; do
        echo "learning for $LEARNUSER"
        su - "$LEARNUSER" -c "$0 $ARGLIST"
    done
else
    AutoLearn
fi
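The cross-version support in the script comes down to mapping the `sa-learn -V` banner to a pair of options: `--rebuild`/`--no-rebuild` for SpamAssassin 2.x and `--sync`/`--no-sync` for 3.x. Below is a minimal sketch of that mapping as a standalone function; `sa_sync_flags` is a hypothetical name, and the sketch assumes (as the script's own comment shows for 2.63) that the banner contains `version <major>.<minor>`. It takes the banner text as an argument instead of calling `sa-learn`, so it can be tried without SpamAssassin installed.

```shell
#!/bin/bash
# Hypothetical helper: map the text printed by `sa-learn -V` to the
# "sync" and "no-sync" options used by the script.
#   SpamAssassin 2.x -> --rebuild / --no-rebuild
#   SpamAssassin 3.x -> --sync    / --no-sync
# Prints both options separated by a space; returns 1 for any other version.
sa_sync_flags() {
    local version_text="$1"
    if echo "$version_text" | grep -qE 'version 2\.[0-9]+'; then
        echo "--rebuild --no-rebuild"
    elif echo "$version_text" | grep -qE 'version 3\.[0-9]+'; then
        echo "--sync --no-sync"
    else
        return 1
    fi
}

# Example with a literal banner (the 2.63 string quoted in the script).
if read -r SYNCCMD NOSYNCCMD < <(sa_sync_flags "SpamAssassin version 2.63"); then
    echo "sync: $SYNCCMD, no-sync: $NOSYNCCMD"
fi
```

On a live system the argument would be `"$(sa-learn -V)"`. Learning every folder with the no-sync option and syncing once at the end is what lets the script append to the Bayes journal cheaply instead of rebuilding the database after each folder.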