It is not fully tested yet, but here it is. Note that I changed the USER environment variable to USERNAME. I do not know whether this holds on all flavors of Linux, but under su, USER is not updated to the child (target) user and keeps the parent's value, whereas USERNAME does change to reflect the child username. Also, this script is still somewhat localized: it assumes all Junk folders are prefixed with "Junk", and I did not adjust the Courier IMAP code for my changes since I had no system to test against. It should provide some interesting ideas nonetheless.
New features include cross-version compatibility (SpamAssassin 2.x and 3.x), higher speed (using Bayes journals), debugging and error controls, wider Bayes training and, most importantly, support for UWash-based IMAP and mbox-format mailboxes.
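The USER/USERNAME question above is about which environment variable reflects the target user after su. A minimal sketch that sidesteps it, assuming the standard `id` utility is available: it asks the process for its effective user instead of reading either variable. The helper names `current_user` and `is_root` are hypothetical and are not part of either script.

```bash
#!/bin/bash
# Hypothetical helpers: determine the running user without trusting the
# USER or USERNAME environment variables, which su may or may not reset.

# Print the login name of the effective user of this process.
current_user() {
    id -un
}

# Succeed if the effective UID is 0 (root), whatever the name is.
is_root() {
    [ "$(id -u)" -eq 0 ]
}

if is_root; then
    echo "running as root; would iterate over USERLIST"
else
    echo "running as $(current_user)"
fi
```

Checking the UID rather than the name also avoids depending on the root account literally being called "root".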
Tom

Rubin Bennett wrote:
Hello all...
I figure I've asked enough questions of this list that it's about time I
gave something back... You may not want it, but here it is anyway :)

I've written a bash script that will run sa-learn against the
administrator-specified False-Positive and False-Negative folders.

Run this script from cron, and have your users drag n' drop emails that
get misclassified by SA to the appropriate folders.  The script will act
in 2 ways:

1.) Run it as root, and it will parse the administrator specified
USERLIST and run the internally defined autoLearn() function as each
user.
2.) Run it as an ordinary user and it will only learn from that user's
email.
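The two modes described above boil down to a single branch on who is running the script. A dry-run sketch of that dispatch, assuming the script is installed as `/usr/local/bin/sa-autolearn` (the `sa-autolearn` name comes from the script below; the path and the users `alice` and `bob` are hypothetical). `plan_runs` is a hypothetical helper that prints the commands instead of executing them.

```bash
#!/bin/bash
# Hypothetical dry-run of the two modes:
#  - as root (uid 0): run the learner once per user in USERLIST via su
#  - as anyone else: learn only from the caller's own mail
plan_runs() {
    local uid="$1" userlist="$2" script="$3"
    if [ "$uid" -eq 0 ]; then
        local u
        # USERLIST is space separated, so word splitting is intended here.
        for u in $userlist; do
            echo "su - $u -c $script"
        done
    else
        echo "$script (as current user)"
    fi
}

plan_runs 0 "alice bob" /usr/local/bin/sa-autolearn
plan_runs 1000 "alice bob" /usr/local/bin/sa-autolearn
```

In the real scripts the uid argument would come from the running process rather than being passed in; it is a parameter here only so both branches can be shown.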

I wrote it this way so that I could have a wrapper around sa-learn that
would make sure that the directories exist, create them if they don't
using maildirmake++, and not try to learn from directories with no
messages in them.
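The "don't learn from empty folders" check can be sketched as a small function that counts the messages in a Maildir folder's `cur/` directory. This is an illustration, not the script's code: `count_messages` is a hypothetical name, and the example only prints the sa-learn command it would run.

```bash
#!/bin/bash
# Hypothetical helper: print how many regular files are in a Maildir
# folder's cur/ directory, or 0 if that directory does not exist.
count_messages() {
    local dir="$1/cur"
    if [ ! -d "$dir" ]; then
        echo 0
        return
    fi
    # -type f skips the directory itself, which a bare `find dir` would count.
    # tr strips the padding some wc implementations add.
    find "$dir" -type f | wc -l | tr -d ' '
}

# Example against a throwaway Maildir folder with two messages.
tmp=$(mktemp -d)
mkdir -p "$tmp/.Not Spam/cur" "$tmp/.Not Spam/new" "$tmp/.Not Spam/tmp"
touch "$tmp/.Not Spam/cur/1.msg" "$tmp/.Not Spam/cur/2.msg"
n=$(count_messages "$tmp/.Not Spam")
if [ "$n" -gt 0 ]; then
    echo "would run: sa-learn --ham \"$tmp/.Not Spam/cur\" ($n messages)"
fi
```

Counting only regular files lets the threshold be a plain `-gt 0`, so a folder holding a single message is still learned from.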

This is written to work with Courier IMAP and Maildir; I have not tried
it with anything else.
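For readers unfamiliar with the Maildir++ layout this relies on: a folder named "Not Spam" lives in `$MAILDIR/.Not Spam` and holds `cur/`, `new/` and `tmp/`, which is why the script tests for `${MAILDIR}/.${FOLDER}`. A sketch, assuming that layout; `folder_path` and `ensure_folder` are hypothetical helpers, and `ensure_folder` only creates the three subdirectories, so it may not replicate everything maildirmake++ does.

```bash
#!/bin/bash
# Hypothetical helpers for the Maildir++ folder layout.

# Print the on-disk path of folder $2 inside the Maildir $1.
folder_path() {
    printf '%s/.%s\n' "$1" "$2"
}

# Fallback for systems without maildirmake++: create cur/, new/ and tmp/
# for the folder if it does not exist yet (not a full maildirmake++ clone).
ensure_folder() {
    local path
    path=$(folder_path "$1" "$2")
    if [ ! -d "$path" ]; then
        mkdir -p "$path/cur" "$path/new" "$path/tmp"
        chmod 700 "$path" "$path/cur" "$path/new" "$path/tmp"
    fi
}

MAILDIR=$(mktemp -d)
ensure_folder "$MAILDIR" "Not Spam"
ls -a "$(folder_path "$MAILDIR" "Not Spam")"
```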

Someday I may get around to rewriting it in php and using php-imap to do
the moving around etc, but as a dirty hack this works ok.  It also
doesn't need passwords etc. in config files...

I hope this benefits someone out there... if there's enough interest,
I'll put it on my website and do a proper CVS for it.

If anyone has ideas for making it better (or suck less), let me know. 
Patches are always welcome...
  

#!/bin/bash

# Copyright (c) 2004 by Rubin Bennett <[EMAIL PROTECTED]>
# All Rights reserved.

#This program is free software; you can redistribute it and/or
#modify it under the terms of the GNU General Public License
#as published by the Free Software Foundation; either version 2
#of the License, or (at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

# Usage: IMAP users can move misclassified emails into the "False Negative"
# or "False Positive" folders, and this script will learn from them and put
# them where they belong.
# Spam will be moved to the designated Spam folder, and Ham will be moved to
# the user's Inbox.

# This script should be called by CRON or a similar scheduler.

# Requires:
# Maildir style email storage (e.g. Courier IMAP) and IMAP server

# Settings - tweak as necessary.
MAILDIR="/home/$USER/Maildir"
FALSE_NEG_FOLDER="Undetected Spam"
FALSE_POS_FOLDER="Not Spam"
SPAMFOLDER="Spam"

# List of users to run the autoLearn function as (space separated)...
USERLIST=""

autoLearn() {
        # Checks to see if the specified FALSE_NEG_FOLDER and FALSE_POS_FOLDER exist,
        # and creates them if necessary.
        [ -d "${MAILDIR}/.${FALSE_NEG_FOLDER}" ] || /usr/bin/maildirmake++ -f "${FALSE_NEG_FOLDER}" "${MAILDIR}"
        [ -d "${MAILDIR}/.${FALSE_POS_FOLDER}" ] || /usr/bin/maildirmake++ -f "${FALSE_POS_FOLDER}" "${MAILDIR}"

        # Parses the designated Ham folder and then moves its contents to the Inbox
        # (-type f counts messages only, not the cur/ directory itself)
        hamCount=`find "${MAILDIR}/.${FALSE_POS_FOLDER}/cur" -type f | wc -l`
        if [ $hamCount -gt 0 ]
        then
                echo "Learning from $hamCount ham message(s)"
                sa-learn --ham "${MAILDIR}/.${FALSE_POS_FOLDER}/cur"
                mv "${MAILDIR}/.${FALSE_POS_FOLDER}/cur/"* "${MAILDIR}/cur/"
        fi

        # Parses the "Undetected Spam" folder and then moves its contents to Spam
        spamCount=`find "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur" -type f | wc -l`
        if [ $spamCount -gt 0 ]
        then
                echo "Learning from $spamCount spam message(s)"
                sa-learn --spam "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur"
                mv "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur/"* "${MAILDIR}/.${SPAMFOLDER}/cur/"
        fi
}

############### End of function declaration ###############

if [ "${USER}" == "root" ]
then
        for USER in $USERLIST; do
                echo "learning for $USER"
                su - "$USER" -c sa-autolearn
        done
else
        autoLearn
fi

#!/bin/bash

# Copyright (c) 2004 by Rubin Bennett <[EMAIL PROTECTED]>
# All Rights reserved.

#This program is free software; you can redistribute it and/or
#modify it under the terms of the GNU General Public License
#as published by the Free Software Foundation; either version 2
#of the License, or (at your option) any later version.
#
#This program is distributed in the hope that it will be useful,
#but WITHOUT ANY WARRANTY; without even the implied warranty of
#MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#GNU General Public License for more details.
#
#You should have received a copy of the GNU General Public License
#along with this program; if not, write to the Free Software
#Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


# Usage: IMAP users can move misclassified emails into the "False Negative"
# or "False Positive" folders, and this script will learn from them and put
# them where they belong.
# Spam will be moved to the designated Spam folder, and Ham will be moved to
# the user's Inbox.

# This script should be called by CRON or a similar scheduler.


# Settings - tweak as necessary.
MAILDIR="/home/$USER/Mail"
MAILINBOX="/var/spool/mail/$USER"
FALSE_NEG_FOLDER="Junk-Probable"
FALSE_POS_FOLDER="Not Junk"
SPAMFOLDER="Junk"
MAILDIRMAKE="/usr/bin/maildirmake++ -f"

# List of users to run the AutoLearn function as (space separated)...
USERLIST=""

# Program Control Variables
# Create the FN/FP folders if not found
MKMAILDIR=0
COURIER=0
DEBUG=0


############### function declaration ###############
Debug() {
if [ $DEBUG -eq 1 ]
        then
        echo "$@"
fi
}

Error() {
        # Send error messages to stderr so cron mail and pipes keep them apart.
        cat >&2 <<EOF
Error: $@
EOF
}

AutoLearn() {

        if [ $COURIER -eq 1 ]
                then
                if [ $MKMAILDIR -eq 1 ]
                        then
                        # Checks to see if the specified FALSE_NEG_FOLDER and FALSE_POS_FOLDER exist,
                        # and creates them if necessary.
                        [ -d "${MAILDIR}/.${FALSE_NEG_FOLDER}" ] || ${MAILDIRMAKE} "${FALSE_NEG_FOLDER}" "${MAILDIR}"
                        [ -d "${MAILDIR}/.${FALSE_POS_FOLDER}" ] || ${MAILDIRMAKE} "${FALSE_POS_FOLDER}" "${MAILDIR}"
                fi

                # Parses the designated Ham folder and then moves its contents to the Inbox
                # (-type f counts messages only, not the cur/ directory itself)
                hamCount=`find "${MAILDIR}/.${FALSE_POS_FOLDER}/cur" -type f 2>/dev/null | wc -l`
                if [ $hamCount -gt 0 ]
                  then
                  echo "Learning from $hamCount ham message(s)"
                  sa-learn --ham ${NOSYNCCMD} "${MAILDIR}/.${FALSE_POS_FOLDER}/cur"
                  mv "${MAILDIR}/.${FALSE_POS_FOLDER}/cur/"* "${MAILDIR}/cur/"
                fi
        else
                if [ -f "${MAILINBOX}" ]
                        then
                        Debug "Running: sa-learn --ham ${NOSYNCCMD} ${MBOX} ${MAILINBOX}"
                        sa-learn --ham ${NOSYNCCMD} ${MBOX} "${MAILINBOX}"
                fi
                ls "${MAILDIR}" | \
                while IFS= read -r ii
                do
                        Debug "Seen box ${ii}"
                        # Every folder not prefixed with "Junk" is learned as ham.
                        case "${ii}" in
                        Junk*) ;;
                        *)
                                Debug "Processing box ${ii}"
                                Debug "Running: sa-learn --ham ${NOSYNCCMD} ${MBOX} ${MAILDIR}/$ii"
                                sa-learn --ham ${NOSYNCCMD} ${MBOX} "${MAILDIR}/$ii"
                                ;;
                        esac
                done
        fi

        if [ $COURIER -eq 1 ]
                then
                # Parses the FALSE_NEG_FOLDER and then moves its contents to SPAMFOLDER
                spamCount=`find "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur" -type f 2>/dev/null | wc -l`
                if [ $spamCount -gt 0 ]
                        then
                        echo "Learning from $spamCount spam message(s)"
                        sa-learn --spam ${NOSYNCCMD} "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur"
                        mv "${MAILDIR}/.${FALSE_NEG_FOLDER}/cur/"* "${MAILDIR}/.${SPAMFOLDER}/cur/"
                fi
        else
                if [ -f "${MAILDIR}/${FALSE_NEG_FOLDER}" ]
                        then
                        Debug "Running: sa-learn --spam ${NOSYNCCMD} ${MBOX} ${MAILDIR}/${FALSE_NEG_FOLDER}"
                        sa-learn --spam ${NOSYNCCMD} ${MBOX} "${MAILDIR}/${FALSE_NEG_FOLDER}"
                fi
                # Wider training: also learn from the main Junk (SPAMFOLDER) mbox.
                if [ -f "${MAILDIR}/${SPAMFOLDER}" ]
                        then
                        Debug "Running: sa-learn --spam ${NOSYNCCMD} ${MBOX} ${MAILDIR}/${SPAMFOLDER}"
                        sa-learn --spam ${NOSYNCCMD} ${MBOX} "${MAILDIR}/${SPAMFOLDER}"
                fi
        fi
        # Everything above was written to the Bayes journal; sync it once here.
        Debug "Running: sa-learn ${SYNCCMD}"
        sa-learn ${SYNCCMD}
}

############### End of function declaration ###############

# usage must be defined before getopts can call it.
usage () {
   cat <<EOF
Usage: $0 [-c] [-m] [-d] [-h]
        -c    Denotes a Courier Mail server instead of UWash.
              NB: Requires Courier Mail server and maildirmake++
        -m    Tells the script to make the mailbox if it does not
              already exist. Requires Courier and maildirmake++
        -d    Prints debugging output.
        -h    Shows this help text.
EOF
   exit ${1:-0}
}

# Save the original options so they can be passed on to the per-user runs.
ARGLIST="$*"

while getopts "mcdh" opt; do
   case $opt in

   c )  COURIER=1;;
#   v )  SAVER=$OPTARG;; #v:
   m )  MKMAILDIR=1;;
   d )  DEBUG=1;;
   h )  usage ;;
   \?)  usage 1 ;;
   esac
done
#remove all of the processed arguments
shift $(($OPTIND - 1))

if [ ${COURIER} -eq 0 ]
        then
        MBOX=" --mbox "
fi

# sa-learn -V --> SpamAssassin version 2.63
# Match "version N." on the first line only, so e.g. 3.2.x is not read as 2.x.
SAVERSIONLINE=`sa-learn -V | head -n 1`
if echo "$SAVERSIONLINE" | grep -qE "version 2\.[0-9]+"
        then
        SYNCCMD="--rebuild "
        NOSYNCCMD="--no-rebuild "
        Debug 'SA reports Version 2'
elif echo "$SAVERSIONLINE" | grep -qE "version 3\.[0-9]+"
        then
        SYNCCMD="--sync "
        NOSYNCCMD="--no-sync "
        Debug 'SA reports Version 3'
else
        Error "Script does not handle this version of spamassassin"
        exit 1
fi

if [ $DEBUG -eq 1 ]
then
        Debug "Maildir: " $MAILDIR
        Debug "Inbox: " $MAILINBOX
        Debug "FN Dir: " $FALSE_NEG_FOLDER
        Debug "FP Dir: " $FALSE_POS_FOLDER
        Debug "Spamdir: " $SPAMFOLDER
        Debug "MAILDIRMAKE: " $MAILDIRMAKE
        Debug "Users: " $USERLIST
        Debug "MKMAILDIR: " $MKMAILDIR
        Debug "CourierFlag: " $COURIER
fi

if [ "${USERNAME}" == "root" ]
then
# Use a separate loop variable so the system-defined USER env var
# is not overwritten.
for LEARNUSER in $USERLIST;
  do
        echo "learning for $LEARNUSER"
        # su -c takes a single command string, so the options go inside it.
        su - "$LEARNUSER" -c "$0 $ARGLIST"
  done
else
  AutoLearn
fi
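The version check in the script picks the journal flags once and reuses them for every folder: each folder is learned with the "no sync" flag, and the database is synced a single time at the end. A sketch of that mapping as a standalone function, assuming the first line of `sa-learn -V` has the form `SpamAssassin version X.Y...` (as in the script's own comment). `sa_sync_flags` is a hypothetical name; in real use it would be fed `"$(sa-learn -V | head -n 1)"`.

```bash
#!/bin/bash
# Hypothetical helper: map a `sa-learn -V` version line to two words,
# "<defer-sync flag> <sync flag>". SpamAssassin 2.x uses
# --no-rebuild/--rebuild and 3.x uses --no-sync/--sync, as in the script.
sa_sync_flags() {
    local version_line="$1"
    case "$version_line" in
        *"version 2."*) echo "--no-rebuild --rebuild" ;;
        *"version 3."*) echo "--no-sync --sync" ;;
        *) return 1 ;;   # unknown version: let the caller bail out
    esac
}

# Usage: learn every folder without syncing, then sync once at the end.
read -r NOSYNC SYNC <<< "$(sa_sync_flags "SpamAssassin version 3.0.2")"
echo "per-folder: sa-learn --ham $NOSYNC <folder>"
echo "once at end: sa-learn $SYNC"
```

Using `case` on the "version N." prefix avoids the unanchored regex problem where a 3.2.x release could be mistaken for 2.x.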

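For the UWash/mbox branch, the script treats every folder in the mail directory as ham unless its name starts with "Junk". A sketch of that selection using a shell `case` pattern, assuming one mbox file per folder directly under the mail directory. `ham_folders` is a hypothetical helper, and the loop only prints the sa-learn commands rather than running them.

```bash
#!/bin/bash
# Hypothetical helper: print the path of every regular file (mbox folder)
# in directory $1 whose name does not start with "Junk".
ham_folders() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
        case "${f##*/}" in
            Junk*) ;;                     # Junk, Junk-Probable, ...: not ham
            *) printf '%s\n' "$f" ;;
        esac
    done
}

MAIL=$(mktemp -d)
touch "$MAIL/Not Junk" "$MAIL/Junk" "$MAIL/Junk-Probable" "$MAIL/sent-mail"
ham_folders "$MAIL" | while IFS= read -r box; do
    echo "would run: sa-learn --ham --no-sync --mbox \"$box\""
done
```

Note that the "Not Junk" false-positive folder is picked up as ham here because its name does not start with "Junk".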

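When run as root, the script re-invokes itself for each user through `su - user -c`. Because `-c` takes a single command string, any options meant for the per-user run have to be folded into that string. A sketch using bash's `printf %q` to quote each word so paths with spaces survive. `build_su_command`, the path `/usr/local/bin/sa-autolearn` and the user `alice` are hypothetical, and nothing here actually calls su.

```bash
#!/bin/bash
# Hypothetical helper: join a command and its arguments into one
# shell-quoted string suitable for `su - user -c "<string>"`.
build_su_command() {
    local cmd="" word
    for word in "$@"; do
        cmd+="$(printf '%q' "$word") "
    done
    printf '%s\n' "${cmd% }"     # drop the trailing space
}

ARGLIST=(-c -d)   # e.g. the options the root run was started with
cmd=$(build_su_command "/usr/local/bin/sa-autolearn" "${ARGLIST[@]}")
echo "would run: su - alice -c \"$cmd\""
```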