On 07/02/2014 02:39 AM, Dave Funk wrote:

Steve,
For some reason you seem to be hung-up on Bayes "autolearning".

Skip down the thread. I was demonstrated to be wrong. :-)


Is it possible that you're confusing it with "Auto-White listing"? (which is now
deprecated and has -nothing- to do with Bayes).

No. I know the difference. AWL, planned to be replaced with TxRep and all that. (I'd mention that TxRep has problems, but it's too late at night for me to engage in yet another argument.)


SA's Bayesian scorer is a system based upon a method that parses a
message, extracts 'tokens' from it and uses an algorithm to calculate a
score for the message based upon a dictionary of previously seen tokens
and their relative merit.

Yeah. Bayesian statistics is pretty cool.
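For readers who want the token-and-dictionary idea in concrete form, here is a minimal sketch. It is a plain naive-Bayes toy with add-one smoothing, not SpamAssassin's actual tokenizer or probability-combining code; the class name, method names, and tokenizing regex are all made up for illustration.

```python
import math
import re
from collections import Counter


class TokenBayes:
    """Toy token-based Bayesian scorer (hypothetical; not SA's implementation)."""

    def __init__(self):
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.ham_tokens = Counter()   # token -> number of ham messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    @staticmethod
    def tokenize(text):
        # Lowercased word-like tokens, counted at most once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def learn(self, text, is_spam):
        # "Learning" just adds the message's tokens to the dictionary.
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.spam_msgs += 1
        else:
            self.ham_tokens.update(tokens)
            self.ham_msgs += 1

    def spam_probability(self, text):
        # Sum per-token log-likelihood ratios, with add-one smoothing so
        # unseen tokens do not produce a division by zero.
        log_odds = math.log((self.spam_msgs + 1) / (self.ham_msgs + 1))
        for tok in self.tokenize(text):
            p_spam = (self.spam_tokens[tok] + 1) / (self.spam_msgs + 2)
            p_ham = (self.ham_tokens[tok] + 1) / (self.ham_msgs + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1 + e)


bayes = TokenBayes()
bayes.learn("cheap pills buy now", is_spam=True)
bayes.learn("buy cheap watches now", is_spam=True)
bayes.learn("meeting agenda for monday", is_spam=False)
bayes.learn("lunch on monday?", is_spam=False)
print(bayes.spam_probability("buy cheap pills"))
print(bayes.spam_probability("monday meeting"))
```

The point is only that the score depends entirely on what has been fed to learn(), whether a human or an automated process does the feeding.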

or via an automated process from within SA as it scores messages
(known as 'auto' learning). So regardless of whether manual or auto
learning is utilized, tokens are added to the dictionary.

See, that's where things stop making sense to me. I would not expect the Bayesian filter to do any better than its training. And if its training is via input from static rules (plus DNSBLs and DCC) I would not expect it to be able to do any better. And it's not hard to imagine pathological behavior developing. But people are telling me differently. And I'm open to considering alternative possibilities.
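A rough sketch of how a threshold-gated autolearner limits that feedback loop, as I understand it: only messages the non-Bayes rules score as clearly ham or clearly spam are learned, and the uncertain middle band is left alone. The function names are hypothetical, and the 0.1 and 12.0 values are assumptions modeled on the documented defaults of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam; check the documentation for your version, since the real autolearn logic applies further conditions that this ignores.

```python
def autolearn_label(rule_score, ham_below=0.1, spam_above=12.0):
    """Decide whether a message is confident enough to be auto-learned.

    rule_score is assumed to be the score from the non-Bayes rules only.
    Returns "ham", "spam", or None (meaning: do not learn from it).
    """
    if rule_score < ham_below:
        return "ham"
    if rule_score > spam_above:
        return "spam"
    return None


def autolearn_batch(scored_messages):
    """Split (text, rule_score) pairs into learned ham, learned spam, skipped."""
    result = {"ham": [], "spam": [], None: []}
    for text, score in scored_messages:
        result[autolearn_label(score)].append(text)
    return result


batch = autolearn_batch([
    ("obvious junk", 15.3),
    ("note from a colleague", -1.2),
    ("borderline newsletter", 5.0),
])
print(batch)
```

Under these assumptions the borderline message is never used for training, so the dictionary is built only from cases where the static rules were far from the decision boundary. Whether that is enough to avoid the pathological behavior is exactly the open question.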

It's also
possible to employ both auto & manual learning methods in the same
installation.

That would be the scenario I am considering.

There can be one dictionary used for scoring all messages processed (called
"site wide Bayes") or many separate dictionaries, one used for each
recognized user ("per user Bayes"). Either way, the dictionary(s) need to
be updated (and the update process could be either manual, auto, or both).

Yes. I've been devoted to individual file DBs, each individually trained for a particular user's spam^Wemail stream. People are telling me that system-wide databases work well.

It's been this way for the past 10+ years AFAIK (well, maybe 10 years
ago it didn't have as many options for back-end database storage, mostly
limited to Berkeley-DB type methods).


I think it was around 2003, in SA 2.5(?) that SA got a Bayesian classifier. IIRC, there was a project called dspam (which I think is still around). For a while the dspam guys were pushing the fact that *dspam* was a modern spam filter, and SA was old, clunky, and too outdated to use.

Anyway, in the very early versions of SA Bayes, everything was system-wide. Later they added the option to use individual user files. And the only info I've seen that described autolearn and how it worked was a mailing list post from 2004 which specifically stated that it was system-wide, in memory, and was lost upon restart. Maybe that's correct and maybe it's not.

But today, it looks to be user-specific, if configured that way. I'm still working out whether I want to use it, and if so, how.

-Steve
