On 09.10.2014 at 21:43, John Traweek CCNA, Sec+ wrote:
I’ve built a gateway server using sa-exim to filter email for our
corporate Microsoft Exchange environment.  It’s working pretty well, but
I have Bayes turned off because I am unsure how to train
it in this type of environment.  Has someone written a how-to article on
how to efficiently and continually train Bayes in an environment like
this?  I was thinking that if specific users could forward spam to some mailbox
on Exchange and have sa-exim POP it or something to “learn” it, that would
be ideal, but maybe there is a better way.  Any ideas are appreciated,
the easier the better.

I decided to stay on spamass-milter, which implies a single user and therefore one central Bayes database. It is trained by a simple script from two folders (ham and spam), and auto-learning is disabled. Users are advised to forward samples as attachments, which are added after review; so far that is no more than 5 per day. The rest is caught because I currently receive mail for 10 addresses, including some alias lists, and so see all sorts of crap myself.
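
Since the training relies on auto-learning being off, it can be worth checking that the SpamAssassin configuration really says so. `bayes_auto_learn` is the SpamAssassin option that controls this; the function name and the config path in the usage line are assumptions for illustration, not part of the original setup. A minimal sketch:

```shell
# Return 0 if the given SpamAssassin config file explicitly disables
# Bayes auto-learning. The function name is hypothetical; pass the path
# of your own config file.
autolearn_disabled() {
    local cf="$1"
    # Match an uncommented "bayes_auto_learn 0" line, tolerating extra
    # whitespace and a trailing comment.
    grep -Eq '^[[:space:]]*bayes_auto_learn[[:space:]]+0[[:space:]]*(#.*)?$' "$cf"
}
```

Usage, assuming the config lives in /etc/mail/spamassassin/local.cf: `autolearn_disabled /etc/mail/spamassassin/local.cf || echo "auto-learning is not explicitly disabled"`.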

The ham folder just contains a lot of my own legitimate mail, as long as it doesn't contain sensitive data.

The machine itself is inbound-only, with Postfix transport tables after the filters, so it should match your scenario.

So far the results are impressive.

The first script is a wrapper that runs as root, takes care of permissions, and removes duplicates to optimize training in case of a complete rebuild. The sample .eml files are renamed with Konqueror to "YYYY-mm-dd-#" and so get an automatic number, which makes it easy to remove outdated spam samples and rebuild in a year or two.
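
The date-based naming makes pruning straightforward. The following is only a sketch of how outdated samples could be listed by their filename prefix; the function name and the cutoff parameter are hypothetical and not part of the scripts in this mail:

```shell
# List sample files whose "YYYY-mm-dd-#" name prefix is older than a cutoff.
# Usage: list_outdated_samples <dir> <YYYY-mm-dd>
list_outdated_samples() {
    local dir="$1" cutoff="$2" f name stamp
    for f in "$dir"/*.eml; do
        [ -e "$f" ] || continue      # directory has no .eml files
        name=${f##*/}
        stamp=${name:0:10}           # the YYYY-mm-dd part of the name
        # Skip files that do not follow the naming convention.
        [[ $stamp =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue
        # Dates in this format sort lexicographically, so a plain
        # string comparison is enough.
        if [[ $stamp < $cutoff ]]; then
            printf '%s\n' "$f"
        fi
    done
}
```

The output can be reviewed first and then piped to `xargs -r rm -v` (assuming filenames without whitespace), followed by a rebuild.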

The second script does the training itself. It runs as the milter user and is called with "su" from the wrapper; the milter user has /bin/dash as its shell instead of /sbin/nologin.

[root@mail-gw:~]$ cat /scripts/sa-learn.sh
#!/usr/bin/bash
# Home directory and name of the milter user
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"
# Ensure correct permissions on the training files
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/ham/
chown root:$SA_MILTER_USER -R $SA_MILTER_HOME/training/spam/
chmod 750 $SA_MILTER_HOME/training/ham/
chmod 750 $SA_MILTER_HOME/training/spam/
chmod 640 $SA_MILTER_HOME/training/ham/*.eml
chmod 640 $SA_MILTER_HOME/training/spam/*.eml
# Remove duplicates in both folders
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/ham/ | grep -v '^$' | xargs rm -v 2> /dev/null
/usr/bin/fdupes -r -f $SA_MILTER_HOME/training/spam/ | grep -v '^$' | xargs rm -v 2> /dev/null
# Run the worker script as the milter user
/usr/bin/su -c "$SA_MILTER_HOME/training/learn.sh $1" $SA_MILTER_USER
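
For systems without fdupes, the duplicate removal above could be approximated with checksums. This is a sketch under assumptions, not what the wrapper does: the function name is hypothetical, it needs bash 4 for associative arrays, and unlike `fdupes -r` it does not descend into subdirectories.

```shell
# Print every .eml file in <dir> whose content matches an earlier file
# (in glob order); the first file of each group is kept off the list.
list_duplicate_samples() {
    local dir="$1" f sum
    local -A seen=()
    for f in "$dir"/*.eml; do
        [ -f "$f" ] || continue
        sum=$(sha256sum < "$f") || continue
        sum=${sum%% *}               # keep only the hex digest
        if [[ -n ${seen[$sum]:-} ]]; then
            printf '%s\n' "$f"       # same content as an earlier file
        else
            seen[$sum]=$f
        fi
    done
}
```

As with the fdupes pipeline, the resulting list can be handed to `xargs -r rm -v` once it looks right.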

[root@mail-gw:~]$ cat /var/lib/spamass-milter/training/learn.sh
#!/usr/bin/bash
SA_MILTER_HOME="/var/lib/spamass-milter"
SA_MILTER_USER="sa-milt"
# Refuse to run unless invoked as the milter user
if test "$(whoami)" != "$SA_MILTER_USER"
then
 /bin/echo "The script 'learn.sh' must be run as user '$SA_MILTER_USER'"
 exit 1
fi
cd $SA_MILTER_HOME
SHOW_HELP="0"
if [ "$1" == "rebuild" ] || [ "$1" == "" ] || [ `echo $((($1*2)/2))` == "$1" ]; then
 # Full rebuild requested
 if [ "$1" == "rebuild" ]; then
  # Reset the Bayes database
  /usr/bin/sa-learn --clear
  # Spam training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing spam samples"
  nice -n 19 /usr/bin/sa-learn --progress --spam $SA_MILTER_HOME/training/spam/*.eml
  echo ""
  # Ham training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing ham samples"
  nice -n 19 /usr/bin/sa-learn --progress --ham $SA_MILTER_HOME/training/ham/*.eml
  echo ""
 else
  # Default to the current day, or use the parameter
  if [ "$1" == "" ]; then
   TRAIN_DAYS="1"
  else
   TRAIN_DAYS="$1"
  fi
  # Spam training (nice is applied to sa-learn, the expensive step, not to find)
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing spam samples"
  /usr/bin/find $SA_MILTER_HOME/training/spam/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r nice -n 19 /usr/bin/sa-learn --spam
  echo ""
  # Ham training
  MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
  echo "$MY_TIME: Processing ham samples"
  /usr/bin/find $SA_MILTER_HOME/training/ham/ -type f -name \*.eml -mtime -$TRAIN_DAYS | xargs -r nice -n 19 /usr/bin/sa-learn --ham
  echo ""
 fi
else
 SHOW_HELP="1"
fi
if [ "$1" == "--help" ] || [ "$1" == "-h" ] || [ "$SHOW_HELP" == "1" ]; then
 echo "Bayes maintenance script"
 echo "Usage:"
 echo "  rebuild: reset Bayes completely and rebuild it from the samples"
 echo "  <days>:  age in days of the samples to train (default: 1)"
 exit
fi
MY_TIME=$(/usr/bin/date "+%d-%m-%Y %H:%M:%S")
echo "$MY_TIME: Done"
echo ""
nice -n 19 /usr/bin/sa-learn --dump magic
echo ""
/usr/bin/ls -l -h --time-style=long-iso $SA_MILTER_HOME/.spamassassin/
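
The numeric check in learn.sh compares the argument with the result of `$((($1*2)/2))`. That works for ordinary day counts, but it also lets negative numbers through, and a value such as 08 makes bash's arithmetic expansion fail because a leading zero is read as octal. A more explicit check could look like the sketch below; the function name is hypothetical and it is not a drop-in part of the script above.

```shell
# Return 0 only if the argument is a non-empty string of decimal digits.
is_day_count() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;     # empty, or contains a non-digit
        *) return 0 ;;
    esac
}
```

With that, the outer condition could be written as `[ "$1" == "rebuild" ] || [ "$1" == "" ] || is_day_count "$1"`.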
