> On Jan 5, 2017, at 8:54 AM, Dave Funk <dbf...@engineering.uiowa.edu> wrote:
>
> On Thu, 5 Jan 2017, Nicola Piazzi wrote:
>
>> Each minute it learn messages of the last minute so it read and learn one
>> time only for each message
>> Messages are that it sends from internal, so il learn that words are not spam
>>
>> Internal messages are not spam
>
> Until one of your users gets their account hacked/phished and spammers then
> use it to abuse your server to send out megabytes of spam.
> (or they may have had an account on Yahoo that used the same password).
>
> Careless users happen to the best of us. ;(
>
> John's point is still valid; blind un-vetted automated Bayes learning is
> asking for trouble.
I have to agree and reinforce the message here: automated learning of SPAM/HAM is not a good idea. I have users who drop emails THEY HAVE SUBSCRIBED TO (and forgotten about) into their SPAM folder, and I would argue those are NOT SPAM. They actually contain a LOT of industry-standard nomenclature, and those tokens would not be valid spam indicators if they were trained as SPAM.

Think about it: the best machine for telling whether something is SPAM or not is the human machine. Learning, in this regard, is telling SA that emails like this one, which I have specifically identified as SPAM, are the ones it should look out for. SA does not (in and of itself) make a judgement call on what is or is not SPAM. You need to do that. Keep teaching and pretty soon everything is in every pool (there is such a thing as knowing too much, so much so that you are left indecisive and perplexed at even the simplest problem). I think it's far better to have a smaller pool of tokens keyed with precision than a lot of tokens that, frankly, could go either way.

>
> --
> Dave Funk                                  University of Iowa
> <dbfunk (at) engineering.uiowa.edu>        College of Engineering
> 319/335-5751   FAX: 319/384-0549           1256 Seamans Center
> Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
> #include <std_disclaimer.h>
> Better is not better, 'standard' is better. B{
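
P.S. To put rough numbers on the "everything is in every pool" point, here is a minimal sketch. It assumes a deliberately simplified per-token estimate (relative frequency in spam versus ham). This is not SpamAssassin's actual Bayes formula, and the function name and all counts are made up for illustration.

```python
def token_spam_probability(spam_hits, ham_hits, n_spam, n_ham):
    """Simplified per-token spam estimate (NOT SpamAssassin's real formula).

    spam_hits / ham_hits: learned messages of each class containing the token.
    n_spam / n_ham: total messages learned as spam / ham.
    """
    if n_spam <= 0 or n_ham <= 0:
        raise ValueError("need at least one learned message of each class")
    spam_freq = spam_hits / n_spam
    ham_freq = ham_hits / n_ham
    if spam_freq + ham_freq == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / (spam_freq + ham_freq)


# Hypothetical counts for one token of industry jargon.
# Vetted training: the token shows up almost only in real spam.
precise = token_spam_probability(spam_hits=90, ham_hits=2, n_spam=1000, n_ham=1000)

# Blind training: the same token is now learned heavily in both pools.
muddled = token_spam_probability(spam_hits=90, ham_hits=80, n_spam=1000, n_ham=1000)

print(f"vetted pool: {precise:.3f}   blind pool: {muddled:.3f}")
```

Under these assumed counts the vetted token stays a strong indicator, while the blindly trained one drifts toward 0.5, which is the "could go either way" case.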