> On Jan 5, 2017, at 8:54 AM, Dave Funk <dbf...@engineering.uiowa.edu> wrote:
> 
> On Thu, 5 Jan 2017, Nicola Piazzi wrote:
> 
>> Each minute it learns the messages of the last minute, so it reads and 
>> learns each message only once.
>> The messages are ones sent from internal, so it learns that those words are not spam
>> 
>> Internal messages are not spam
> 
> Until one of your users gets their account hacked/phished and spammers then 
> use it to abuse your server to send out megabytes of spam.
> (or they may have had an account on Yahoo that used the same password).
> 
> Careless users happen to the best of us. ;(
> 
> John's point is still valid; blind un-vetted automated Bayes learning is 
> asking for trouble.

I have to agree and reinforce the point here: automated learning of 
SPAM/HAM is not a good idea. I have users who drop emails THEY SUBSCRIBED TO, 
and then forgot about, into their SPAM folder, and I would argue those are 
NOT SPAM. They actually contain a LOT of standard industry terminology, and 
tokens learned from them as SPAM would not be reliable spam indicators.

Think about it: the best machine for telling whether something is SPAM or not 
is the human machine. Learning, in this regard, is telling SA "emails like 
this one, which I have specifically identified as SPAM, are the ones you 
should look out for." The learning step (in and of itself) does not make a 
judgement call on what is or is not SPAM. You need to do that.

Keep teaching and pretty soon everything is in every pool (there is such a 
thing as knowing too much, so much so that you are left indecisive and 
perplexed at even the simplest problem). I think it's far better to have a 
smaller pool of tokens keyed with precision than a lot of tokens that, well, 
frankly could go either way.
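
To put a rough number on "could go either way", here is a minimal sketch. It 
uses a simplified Graham-style per-token estimate, which is an assumption for 
illustration and NOT the formula SA's Bayes code actually uses; the function 
name and the counts are made up.

```python
def token_spam_probability(spam_hits, ham_hits, spam_msgs, ham_msgs):
    """Rough per-token spam estimate (simplified; not SpamAssassin's formula).

    spam_hits / ham_hits: trained messages in each pool containing the token.
    spam_msgs / ham_msgs: total messages trained into each pool.
    """
    spam_rate = spam_hits / spam_msgs
    ham_rate = ham_hits / ham_msgs
    if spam_rate + ham_rate == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_rate / (spam_rate + ham_rate)


# Token trained only from hand-picked spam: a strong, precise signal.
precise = token_spam_probability(spam_hits=40, ham_hits=0,
                                 spam_msgs=100, ham_msgs=100)

# Same token after careless training has put it in both pools equally.
muddled = token_spam_probability(spam_hits=40, ham_hits=40,
                                 spam_msgs=100, ham_msgs=100)

print(precise, muddled)
```

Under these assumed counts the second token lands right in the middle and 
tells the classifier nothing, which is the indecision I mean.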



> 
> -- 
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
