Re: Bayes, Manual and Auto Learning Strategies

Steve Bergman Wed, 02 Jul 2014 01:25:25 -0700


On 07/02/2014 02:39 AM, Dave Funk wrote:

Steve,
For some reason you seem to be hung-up on Bayes "autolearning".


Skip down the thread. I was demonstrated to be wrong. :-)

Is it possible that you're confusing it with "Auto-White listing"? (which is now
deprecated and has -nothing- to do with Bayes).

No. I know the difference. AWL, planned to be replaced with TxRep and all that. (I'd mention that TxRep has problems, but it's too late at night for me to engage in yet another argument.)


SA's Bayesian scorer is a system based upon a method that parses a
message, extracts 'tokens' from it and uses an algorithm to calculate a
score for the message based upon a dictionary of previously seen tokens
and their relative merit.


Yeah. Bayesian statistics is pretty cool.
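
The token-and-dictionary scoring described in the quoted paragraph can be sketched in a few lines of Python. This is only a toy illustration of the general idea, not SA's actual tokenizer or combining algorithm; the names `tokenize` and `TokenDictionary`, the smoothing, and the log-odds combination are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into a set of lowercase tokens (deliberately crude)."""
    return set(re.findall(r"[a-z0-9$!']+", text.lower()))


class TokenDictionary:
    """Toy dictionary: counts how many spam/ham messages contained each token."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.spam_msgs = 0
        self.ham_msgs = 0

    def learn(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def token_spamminess(self, token):
        """Smoothed estimate of how spammy a token is; near 0.5 means little evidence."""
        p_tok_spam = (self.spam_tokens[token] + 1) / (self.spam_msgs + 2)
        p_tok_ham = (self.ham_tokens[token] + 1) / (self.ham_msgs + 2)
        return p_tok_spam / (p_tok_spam + p_tok_ham)

    def score(self, text):
        """Combine per-token estimates in log-odds space; returns a value in 0..1."""
        log_odds = 0.0
        for token in tokenize(text):
            p = self.token_spamminess(token)
            log_odds += math.log(p) - math.log(1 - p)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


d = TokenDictionary()
d.learn("cheap pills buy now", is_spam=True)
d.learn("buy cheap watches now", is_spam=True)
d.learn("meeting notes attached", is_spam=False)
d.learn("lunch meeting tomorrow", is_spam=False)
print(d.score("cheap pills"), d.score("meeting tomorrow"))
```

Whether `learn` is called by a person (manual) or by the filter itself (auto), the dictionary is updated the same way.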

or via an automated process from within SA as it scores messages
(known as 'auto' learning). So regardless of whether manual or auto
learning is utilized, tokens are added to the dictionary.

See, that's where things stop making sense to me. I would not expect the Bayesian filter to do any better than its training. And if its training is via input from static rules (plus DNSBLs and DCC), I would not expect it to be able to do any better. And it's not hard to imagine pathological behavior developing. But people are telling me different. And I'm open to considering alternative possibilities.
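
To make the question concrete, here is a minimal Python sketch of an auto-learning loop in which static rules decide what the Bayes dictionary gets trained on. Everything in it is hypothetical: the rules in `STATIC_RULES`, the two thresholds, and the crude count-based `TinyBayes` are stand-ins for illustration and are not SA's rules, defaults, or algorithm.

```python
from collections import Counter

# Hypothetical static rules as (substring, points); not real SpamAssassin rules.
STATIC_RULES = [("viagra", 8.0), ("click here", 5.0), ("unsubscribe", 1.0)]

# Illustrative thresholds chosen for this sketch.
LEARN_AS_SPAM_AT = 12.0
LEARN_AS_HAM_AT = 0.1


def static_score(text):
    """Sum the points of every static rule whose substring appears in the text."""
    lowered = text.lower()
    return sum(points for needle, points in STATIC_RULES if needle in lowered)


class TinyBayes:
    """Crude stand-in for a Bayes dictionary: per-token spam and ham counts."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def learn(self, text, is_spam):
        (self.spam if is_spam else self.ham).update(set(text.lower().split()))

    def leans_spam(self, text):
        tokens = set(text.lower().split())
        spam_hits = sum(self.spam[t] for t in tokens)
        ham_hits = sum(self.ham[t] for t in tokens)
        return spam_hits > ham_hits


def process(text, bayes):
    """Score with static rules, then auto-learn only from confident verdicts."""
    score = static_score(text)
    if score >= LEARN_AS_SPAM_AT:
        bayes.learn(text, is_spam=True)
        return "learned-spam"
    if score <= LEARN_AS_HAM_AT:
        bayes.learn(text, is_spam=False)
        return "learned-ham"
    return "not-learned"


bayes = TinyBayes()
print(process("viagra click here cheap meds discount", bayes))
print(process("agenda for the budget meeting", bayes))
print(process("please unsubscribe me from this list", bayes))
print(static_score("cheap meds discount today"), bayes.leans_spam("cheap meds discount today"))
```

In this toy, the last message hits no static rule, yet the dictionary leans spam on it because "cheap", "meds" and "discount" were learned from a message the rules did catch. The same sketch also shows the risk: if that message were passed through `process`, its static score of 0 would get it auto-learned as ham.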

It's also
possible to employ both auto & manual learning methods in the same
installation.


That would be the scenario I am considering.

There can be one dictionary used for scoring all messages processed (called
"site wide Bayes") or many separate dictionaries, one used for each
recognized user ("per user Bayes"). Either way, the dictionary(s) need to
be updated (and the update process could be either manual, auto, or both).

Yes. I've been devoted to individual file DBs, each individually trained for a particular user's spam^Wemail stream. People are telling me that system-wide databases work well.
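
The difference between the two layouts can be shown with a small Python sketch. `BayesStore` and its methods are hypothetical names for illustration only; this says nothing about how SA actually stores its databases.

```python
from collections import Counter, defaultdict


class BayesStore:
    """Toy store: one shared token dictionary, or one dictionary per user."""

    def __init__(self, per_user):
        self.per_user = per_user
        self._site_wide = Counter()
        self._by_user = defaultdict(Counter)

    def dictionary_for(self, user):
        """Return the dictionary that would be used to score this user's mail."""
        return self._by_user[user] if self.per_user else self._site_wide

    def learn(self, user, text):
        """Add the message's tokens to whichever dictionary applies to the user."""
        self.dictionary_for(user).update(set(text.lower().split()))


site = BayesStore(per_user=False)
site.learn("alice", "cheap meds")
print(site.dictionary_for("bob")["cheap"])   # bob sees alice's training

users = BayesStore(per_user=True)
users.learn("alice", "cheap meds")
print(users.dictionary_for("bob")["cheap"])  # bob's dictionary is untouched
```

With the shared layout every user benefits from (or is hurt by) everyone else's training; with the per-user layout each dictionary reflects only that user's mail stream.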

It's been this way for the past 10+ years AFAIK (well, maybe 10 years
ago it didn't have as many options for back-end database storage, mostly
limited to Berkeley-DB type methods).

I think it was around 2003, in SA 2.5(?), that SA got a Bayesian classifier. IIRC, there was a project called dspam (which I think is still around). For a while the dspam guys were pushing the fact that *dspam* was a modern spam filter, and SA was old, clunky, and too outdated to use.

Anyway, in the very early versions of SA Bayes, everything was system-wide. Later they added the option to use individual user files. And the only info I've seen that described autolearn and how it worked was a mailing list post from 2004 which specifically stated that it was system-wide, in memory, and was lost upon restart. Maybe that's correct and maybe it's not.

But today, it looks to be user-specific, if configured that way. I'm still working out whether I want to use it, and if so, how.


-Steve
