Re: R: learn ham

Bill Shirley Fri, 06 Jan 2017 11:43:40 -0800


On 1/6/2017 6:36 AM, Marc Stürmer wrote:

On 05.01.2017 at 17:38, Nicola Piazzi wrote:
Each minute it learns the messages of the last minute, so it reads and learns 
each message only once.
The messages are the ones sent from inside, so it learns that their words are not spam

Internal messages are not spam
You'll never know if internal messages are ever spam or not; your script is the 
best way to

a) poison your bayes database through unsupervised user interaction and
b) put unnecessary load on your server.

Bayes is just one of the many factors Spamassassin takes into account for 
computing the spam score.
Usually you would first train your Spamassassin on a good ham and spam corpus with enough messages that Spamassassin's autolearn feature is enabled afterwards - I guess the threshold is 200 messages for each category before autolearn will start to work.
After that you would normally only train Spamassassin on its errors from time to time, nothing more, nothing less, maybe once a week or so, because a well trained and maintained Spamassassin behaves well enough and doesn't need more maintenance than this. You can even make it more comfortable by using stuff like the antispam plugin from Dovecot if you want to.
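
As a rough sketch of that occasional retraining (not taken from the original script): it assumes misclassified mail has been sorted by hand into two folders. Both folder paths and the function name learn_errors are made up for illustration; sa-learn's --spam, --ham and --dump magic options are the standard ones.

```shell
#!/bin/sh
# Hypothetical folders holding hand-sorted mistakes; adjust to your setup.
MISSED_SPAM="${MISSED_SPAM:-$HOME/Maildir/.MissedSpam/cur}"
FALSE_POSITIVES="${FALSE_POSITIVES:-$HOME/Maildir/.NotSpam/cur}"

# learn_errors KIND DIR: feed DIR to sa-learn as "spam" or "ham".
# Prints a "skip" line when the folder or sa-learn itself is missing.
learn_errors() {
    if [ -d "$2" ] && command -v sa-learn >/dev/null 2>&1; then
        sa-learn "--$1" "$2"
    else
        echo "skip $1: $2"
    fi
}

learn_errors spam "$MISSED_SPAM"
learn_errors ham  "$FALSE_POSITIVES"

# Show the Bayes counters, including how many spam and ham messages
# have been learned so far (the nspam and nham lines).
if command -v sa-learn >/dev/null 2>&1; then
    sa-learn --dump magic
fi
```

Run by hand or from a weekly cron job, this only ever sees messages a human has already judged, which is the point of the advice above.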
BTW, this line in the script is a security nightmare:

mysql -N -u root -p<mypwd> -D mailscanner
This means that any user on the machine is able to read the root password by using ps, e.g. "ps max | less", while it's running. Not good at all.
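
To see the effect without touching MySQL, here is a small sketch that uses a harmless stand-in process. The name fake-mysql and the password are made up, and the /proc fallback is a Linux-only assumption for machines without a suitable ps.

```shell
#!/bin/sh
# show_args PID: print a process's command line as any local user sees it.
show_args() {
    ps -o args= -p "$1" 2>/dev/null ||
        tr '\0' ' ' < "/proc/$1/cmdline"   # Linux-only fallback
}

# Stand-in for the mysql call: a shell that just sleeps, started with a
# made-up password among its arguments. The trailing ":" keeps the shell
# itself alive instead of letting it exec sleep directly.
sh -c 'sleep 3; :' fake-mysql -pNotARealPassword &
pid=$!
sleep 1

seen=$(show_args "$pid")
echo "$seen"    # the -p argument is in plain sight
wait "$pid"
```

Nothing here needs root: the listing works for any account on the box, which is why passwords do not belong on a command line.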

Set up /root/.my.cnf:
[client]
user=root
password=SuperDuperSecret


Then use the HOME trick (from my crontab):
HOME=/root mysqldump --opt --order-by-primary -f -R -r /home/webmaster/mysql.backups/pkg_phpmyadmin.sql pkg_phpmyadmin
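
Not spelled out above, but presumably intended: the option file holds the password in clear text, so it only helps if other users cannot read it. A minimal sketch of creating it with tight permissions; the function name write_my_cnf is made up, and the password is the placeholder from the example file.

```shell
#!/bin/sh
# write_my_cnf FILE USER PASSWORD: create a [client] option file that only
# its owner can read. printf is a builtin in common shells, so the password
# does not show up in the process list while the file is written.
write_my_cnf() {
    (
        umask 077    # a newly created file gets mode 0600
        printf '[client]\nuser=%s\npassword=%s\n' "$2" "$3" > "$1"
    )
    chmod 600 "$1"   # also tighten a file that already existed
}

# Example (as root): write_my_cnf /root/.my.cnf root 'SuperDuperSecret'
```

With the file in place, something like "HOME=/root mysql -N -D mailscanner" should pick up user and password from it instead of from the command line.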

Bill
