On 09.10.2014 at 21:43, John Traweek CCNA, Sec+ wrote:
I’ve built a gateway server using sa-exim to filter email for our corporate Microsoft Exchange environment. It’s working pretty well, but I have Bayes turned off because I am unsure how to train it in this type of environment. Has someone written a how-to article on how to efficiently and continually train Bayes in an environment like this? I was thinking that if specific users could forward spam to some mailbox on Exchange and have sa-exim POP it or something to “learn”, that would be ideal, but maybe there is a better way. Any ideas are appreciated, the easier the better.
I just decided to stay on spamass-milter, which implies a single user and therefore one central Bayes database. It is trained with a simple script from two folders (ham and spam), and autolearning is disabled. Users are advised to forward samples as attachments, which get added after review; so far that has been no more than 5 per day. The rest is caught by the fact that I currently receive mail for 10 email addresses, including some alias lists, and so face all sorts of crap myself.
The ham folder just contains a lot of my own legitimate mail, as long as it doesn't contain sensitive data.
The machine itself is inbound-only, with Postfix transport tables after the filters, so it should match your use case.
So far the results are impressive.

The first script is a wrapper that runs as root, takes care of permissions, and removes duplicates to optimize the training in case of a complete rebuild. The sample eml files are renamed with Konqueror to "YYYY-mm-dd-#" and so get an automatic number, which makes it easy to remove outdated spam samples and rebuild in a year or two.
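Because the date is part of the file name, outdated samples can be picked out by name alone, without relying on file modification times. A minimal sketch of that idea follows; the function name `list_outdated_samples` and its two arguments (a sample directory and a cutoff date in YYYY-mm-dd form) are assumptions for illustration and are not part of the scripts in this mail.

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): print every *.eml file
# in a directory whose name starts with a "YYYY-mm-dd" date older than a cutoff.
# Usage: list_outdated_samples <directory> <cutoff as YYYY-mm-dd>
list_outdated_samples() (
    dir=$1
    # Turn "2014-01-01" into 20140101 so dates can be compared as integers
    cutoff=$(printf '%s' "$2" | tr -d '-')
    for file in "$dir"/*.eml; do
        [ -e "$file" ] || continue              # no .eml files in the directory
        name=${file##*/}                        # strip the directory part
        stamp=$(printf '%s' "$name" | cut -c1-10)
        case "$stamp" in
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
            *) continue ;;                      # name does not follow the scheme
        esac
        if [ "$(printf '%s' "$stamp" | tr -d '-')" -lt "$cutoff" ]; then
            printf '%s\n' "$file"
        fi
    done
    return 0
)
```

Calling `list_outdated_samples /var/lib/spamass-milter/training/spam 2014-01-01` would only print candidates. Reviewing that list before deleting anything, and running a rebuild afterwards, keeps the cleanup a deliberate manual step.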
The second script does the training itself. It runs as the milter user and is called with "su" from the wrapper; the milter user has /bin/dash as its shell instead of /sbin/nologin.
[root@mail-gw:~]$ cat /scripts/sa-learn.sh
#!/usr/bin/bash

# Home directory and name of the milter user
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"

# Ensure the permissions of the training files
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/ham/
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/spam/
chmod 750 $SA_MILTER_HOME/training/ham/
chmod 750 $SA_MILTER_HOME/training/spam/
chmod 640 $SA_MILTER_HOME/training/ham/*.eml
chmod 640 $SA_MILTER_HOME/training/spam/*.eml

# Remove duplicates in both folders
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/ham/ | grep -v '^$' | xargs rm -v 2> /dev/null
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/spam/ | grep -v '^$' | xargs rm -v 2> /dev/null

# Run the worker script as the milter user
/usr/bin/su -c "$SA_MILTER_HOME/training/learn.sh $1" $SA_MILTER_USER

[root@mail-gw:~]$ cat /var/lib/spamass-milter/training/learn.sh
#!/usr/bin/bash

SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"

# Refuse to run as anyone but the milter user
if test `whoami` = "$SA_MILTER_USER"
then
 /bin/echo "" > /dev/null
else
 /bin/echo "The script 'learn.sh' must be run as user '$SA_MILTER_USER'"
 exit
fi

cd $SA_MILTER_HOME
SHOW_HELP="0"

# Accepted arguments: "rebuild", nothing, or an integer number of days
if [ "$1" == "rebuild" ] || [ "$1" == "" ] || [ `echo $((($1*2)/2))` == "$1" ]; then
 # Full rebuild requested
 if [ "$1" == "rebuild" ]; then
  # Bayes reset
  /usr/bin/sa-learn --clear
  # SPAM training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing SPAM samples"
  nice -n 19 /usr/bin/sa-learn --progress --spam $SA_MILTER_HOME/training/spam/*.eml
  echo ""
  # HAM training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing HAM samples"
  nice -n 19 /usr/bin/sa-learn --progress --ham $SA_MILTER_HOME/training/ham/*.eml
  echo ""
 else
  # Default to the current day, or use the parameter
  if [ "$1" == "" ]; then
   TRAIN_DAYS="1"
  else
   TRAIN_DAYS="$1"
  fi
  # SPAM training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing SPAM samples"
  nice -n 19 /usr/bin/find $SA_MILTER_HOME/training/spam/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r /usr/bin/sa-learn --spam
  echo ""
  # HAM training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing HAM samples"
  nice -n 19 /usr/bin/find $SA_MILTER_HOME/training/ham/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r /usr/bin/sa-learn --ham
  echo ""
 fi
else
 SHOW_HELP="1"
fi

if [ "$1" == "--help" ] || [ "$1" == "-h" ] || [ "$SHOW_HELP" == "1" ]; then
 echo "Bayes maintenance script"
 echo "Usage:"
 echo " rebuild: reset Bayes completely and rebuild it from the samples"
 echo " <days>: age in days of the samples to train (default: 1)"
 exit
fi

MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
echo "$MY_TIME: Done"
echo ""
nice -n 19 /usr/bin/sa-learn --dump magic
echo ""
/usr/bin/ls -l -h --time-style=long-is $SA_MILTER_HOME/.spamassassin/
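The worker script accepts three kinds of argument: "rebuild", nothing (train the last day), or a number of days. Anything else leads to the help text. It detects a number with an arithmetic round trip, which works but passes the raw argument through shell arithmetic. As a possible alternative, here is a sketch of the same decision made with a `case` pattern match only. The function name `parse_mode` and its output strings are hypothetical and do not appear in the scripts above.

```shell
#!/bin/sh
# Hypothetical helper (not from the original scripts): classify the first
# argument the way learn.sh does, without evaluating it arithmetically.
# Prints "rebuild", "days <n>", or "help".
parse_mode() {
    case "${1-}" in
        rebuild)   echo "rebuild" ;;      # full reset and retrain
        '')        echo "days 1" ;;       # no argument: default to one day
        *[!0-9]*)  echo "help" ;;         # contains a non-digit: show usage
        *)         echo "days $1" ;;      # digits only: number of days
    esac
}
```

With this, `parse_mode 7` prints `days 7`, while `parse_mode --help` and `parse_mode 1+1` both print `help`, so odd input never reaches `find -mtime` or an arithmetic expansion.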