Re: Bayes, Manual and Auto Learning Strategies

Dave Funk Wed, 02 Jul 2014 00:40:27 -0700

On Wed, 2 Jul 2014, Steve Bergman wrote:

On 07/01/2014 11:49 PM, Karsten Bräckelmann wrote:
Those do not tell you about using file or SQL based databases?
They do. But not specifically with respect to autolearn.

You never
thought about googling for "spamassassin per user" and friends? You
never checked the SA wiki?
I have, indeed. No reference to autolearn and persistent storage. The lack of mention is notable.
I'd expect people to be lining up to tell me I'm mistaken if I absolutely were.
Can you point me to a change log somewhere documenting autolearn moving from in-memory and system-wide to per user and persistent?
I don't hold a strong opinion on this. It would be nice if I were wrong. It would open more options.
I'm just waiting for evidence that it's the case. My perception is that it's not.
-Steve


Steve,
For some reason you seem to be hung up on Bayes "autolearning". Is it
possible that you're confusing it with "Auto-Whitelisting"? (which is now
deprecated and has -nothing- to do with Bayes).

SA's Bayesian scorer is a system based upon a method that parses a
message, extracts 'tokens' from it and uses an algorithm to calculate a
score for the message based upon a dictionary of previously seen tokens
and their relative merit.

The dictionary is created and updated by a process called 'learning'
wherein already-classified messages are tokenized and their tokens are
stored in the dictionary along with a merit value derived from their
instance count and a factor taken from being classified as spam or ham.
This learning process can be either externally driven (known as 'manual'
learning) or via an automated process from within SA as it scores messages
(known as 'auto' learning). So regardless of whether manual or auto
learning is utilized, tokens are added to the dictionary. It's also
possible to employ both auto & manual learning methods in the same
installation.
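
To make that concrete, here is a toy sketch in Python. It is not SA's
actual code or algorithm; the class name, the tokenizer, the smoothing and
the threshold values are all made up for illustration. The point is only
that manual and auto learning end up calling the same learn() step on the
same dictionary.

```python
import math
import re
from collections import defaultdict


class ToyBayes:
    """Hypothetical toy dictionary: token -> [spam_count, ham_count]."""

    def __init__(self):
        self.tokens = defaultdict(lambda: [0, 0])
        self.nspam = 0  # number of messages learned as spam
        self.nham = 0   # number of messages learned as ham

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase words, each counted once per message.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def learn(self, text, is_spam):
        # The single learning step, shared by manual and auto learning.
        idx = 0 if is_spam else 1
        for tok in self.tokenize(text):
            self.tokens[tok][idx] += 1
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def score(self, text):
        # Sum per-token log-odds (add-one smoothing), map to 0..1.
        log_odds = 0.0
        for tok in self.tokenize(text):
            if tok not in self.tokens:
                continue  # unseen tokens carry no evidence
            spam, ham = self.tokens[tok]
            p_spam = (spam + 1) / (self.nspam + 2)
            p_ham = (ham + 1) / (self.nham + 2)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


def scan_message(bayes, text, other_rules_score,
                 spam_threshold=12.0, ham_threshold=0.1):
    """'Auto' learning: while scoring, feed confidently classified mail
    back into the same dictionary. Threshold values are illustrative."""
    probability = bayes.score(text)
    if other_rules_score >= spam_threshold:
        bayes.learn(text, is_spam=True)
    elif other_rules_score <= ham_threshold:
        bayes.learn(text, is_spam=False)
    return probability
```

'Manual' learning is then just something external calling
bayes.learn(text, is_spam=...) on mail a person has already sorted, while
scan_message() is the 'auto' path. Both update the very same token counts.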

There can be one dictionary used for scoring all messages processed (called
"site wide Bayes") or many separate dictionaries, one used for each
recognized user ("per user Bayes"). Either way, the dictionary(s) need to
be updated (and the update process could be either manual, auto, or both).
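
Again as a rough Python sketch (the class and function names are
hypothetical, not anything from SA), the site-wide versus per-user choice
is just a question of which dictionary gets picked for a given recipient.
The learning step itself does not change.

```python
from collections import defaultdict


class BayesRegistry:
    """Hypothetical holder for one site-wide dictionary or one per user."""

    def __init__(self, per_user):
        self.per_user = per_user
        self._dicts = defaultdict(dict)  # key -> {token: [spam, ham]}

    def dictionary_for(self, user):
        # Site-wide mode ignores the user and always uses one shared key.
        key = user if self.per_user else "__site__"
        return self._dicts[key]


def learn(dictionary, tokens, is_spam):
    # Same update whether it was triggered manually or automatically.
    for tok in tokens:
        counts = dictionary.setdefault(tok, [0, 0])
        counts[0 if is_spam else 1] += 1
```

With BayesRegistry(per_user=False), tokens learned from alice's mail are
visible when scoring bob's mail. With per_user=True they are not.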

The Bayes dictionary(s) need to be stored somehow; the usual method is
via some kind of database. It could be a simple file-based DB, some kind
of fancy SQL-server-based system, or something else. This is a DBA'ish kind
of choice as to what particular technology is used to store the
dictionary DB (usually on disk in some way, could be in some kind of
memory-resident set of tables, or something else???).
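
A minimal sketch of the storage side, using Python's sqlite3 module purely
as a stand-in for whatever back-end is chosen. The table name, column
names and function names are invented for this example and are not SA's
schema. Whatever calls learn() (a manual run or the auto-learner), the
counts land in the same persistent table, keyed by user.

```python
import sqlite3


def open_store(path):
    # Assumed toy schema: one row per (user, token).
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS toy_tokens ("
        " username TEXT NOT NULL,"
        " token TEXT NOT NULL,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0,"
        " PRIMARY KEY (username, token))"
    )
    return db


def learn(db, username, tokens, is_spam):
    # Column name comes from a fixed pair, never from user input.
    column = "spam_count" if is_spam else "ham_count"
    with db:  # commit on success, roll back on error
        for tok in tokens:
            db.execute(
                "INSERT OR IGNORE INTO toy_tokens (username, token)"
                " VALUES (?, ?)",
                (username, tok),
            )
            db.execute(
                f"UPDATE toy_tokens SET {column} = {column} + 1"
                " WHERE username = ? AND token = ?",
                (username, tok),
            )


def counts(db, username, token):
    row = db.execute(
        "SELECT spam_count, ham_count FROM toy_tokens"
        " WHERE username = ? AND token = ?",
        (username, token),
    ).fetchone()
    return row if row else (0, 0)
```

Close the connection, reopen the same file later, and the learned counts
are still there. Swapping the file for an SQL server or an in-memory
table changes where the rows live, not how learning works.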

So you have a multi-dimensional matrix WRT your Bayes system
configuration, and manual VS auto learning is just one factor.

It's been this way for the past 10+ years AFAIK (well, maybe 10 years
ago it didn't have as many options for back-end database storage, mostly
limited to Berkeley-DB type methods).

I hope this helps you.


--
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
