Re: Usage of journal in Bayesian Filtering.

Matt Kettler Thu, 30 Aug 2007 06:48:05 -0700

Srilatha wrote:
> Hi,
>
> I am trying to understand the usage of journal in Bayesian Filtering.
>
> If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens
> in the journal.
Correct.
>
>
> When bayesian filter is activated, while scanning a message
> SA reads tokens from BOTH 'bayes_tokens' database and 'bayes_journal'
No, it only reads bayes_tokens.

If it read bayes_journal while scanning, it would defeat the purpose of
the journal.

The journal exists to be more readily writable, which is possible only
because it is rarely read from. If scans read from the journal, its
write lock would be no more available than the write lock on the main
tokens database, so you might as well send all your writes to the main
database instead.

Data is merged from the journal into the tokens database as part of
SA's automatic sync process (once a day), or whenever you run
sa-learn --sync or sa-learn --force-expire.

In general, this means data in the journal doesn't "go live" until a
sync kicks off, which is why bayes_learn_to_journal defaults to 0.
Journaling improves learning performance, but it introduces a lag:
newly learned tokens don't affect scan results until the next sync.
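To make the read/write split and the sync lag concrete, here is a
minimal Python sketch. It is a toy in-memory model, not SpamAssassin's
implementation: ToyBayesStore and its learn(), lookup() and sync()
methods are hypothetical names, and real token records carry more than
a pair of spam/ham counts.

```python
from collections import defaultdict


class ToyBayesStore:
    """Toy model of a token database plus a journal (hypothetical, not SA code)."""

    def __init__(self, learn_to_journal: bool):
        # Mirrors the idea of bayes_learn_to_journal: where do learned tokens go?
        self.learn_to_journal = learn_to_journal
        # token -> [spam_count, ham_count]; this is the only store scans read
        self.tokens = defaultdict(lambda: [0, 0])
        # Append-only list of pending (token, is_spam) updates
        self.journal = []

    def _apply(self, token: str, is_spam: bool) -> None:
        self.tokens[token][0 if is_spam else 1] += 1

    def learn(self, token: str, is_spam: bool) -> None:
        if self.learn_to_journal:
            # Cheap append; the main database is not touched
            self.journal.append((token, is_spam))
        else:
            self._apply(token, is_spam)

    def lookup(self, token: str) -> tuple:
        # Scans consult only the main database, never the journal
        counts = self.tokens.get(token)
        return tuple(counts) if counts else (0, 0)

    def sync(self) -> None:
        # Merge pending journal entries into the main database, then clear it
        for token, is_spam in self.journal:
            self._apply(token, is_spam)
        self.journal.clear()


store = ToyBayesStore(learn_to_journal=True)
store.learn("lottery", is_spam=True)
print(store.lookup("lottery"))  # (0, 0): learned, but not live yet
store.sync()
print(store.lookup("lottery"))  # (1, 0): live after the sync
```

Because lookup() never consults the journal, learning only ever appends,
and the learned token stays invisible to scans until sync() runs.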
>
> While scanning a message, tokens found in bayes_tokens database are
> written to bayes_journal with modified timestamp
Correct. Timestamp updates are always written to the journal, largely
because they only matter during expiry scans, and SA always does a
sync before it scans for expiry. There's no sense holding up scanners
to update timestamps, since doing so has no effect at all on the scan
results, so writing them to the journal is ideal.
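A similar toy sketch shows why journaling timestamp updates is safe:
scans only append access-time bumps, and expiry syncs before deciding
what to delete. ToyTimestampStore and its scan(), sync() and expire()
methods are hypothetical names, and SA's real expiry logic is
considerably more involved than a single cutoff.

```python
class ToyTimestampStore:
    """Toy model of journaled access-time updates (hypothetical, not SA code)."""

    def __init__(self):
        self.atime = {}    # token -> last access time in the main database
        self.journal = []  # pending (token, access_time) updates from scans

    def add_token(self, token: str, when: float) -> None:
        self.atime[token] = when

    def scan(self, tokens, now: float) -> None:
        # Scans read the main database; timestamp bumps are only appended
        # to the journal, so the scanner never writes to the main database.
        for tok in tokens:
            if tok in self.atime:
                self.journal.append((tok, now))

    def sync(self) -> None:
        for tok, when in self.journal:
            if tok in self.atime:
                # Keep the newest access time seen for each token
                self.atime[tok] = max(self.atime[tok], when)
        self.journal.clear()

    def expire(self, cutoff: float) -> list:
        # Sync first so journaled access times count before anything is deleted
        self.sync()
        expired = [tok for tok, when in self.atime.items() if when < cutoff]
        for tok in expired:
            del self.atime[tok]
        return expired


store = ToyTimestampStore()
store.add_token("old", when=100.0)
store.add_token("recent", when=100.0)
store.scan(["recent"], now=500.0)  # "recent" seen again at t=500
print(store.expire(cutoff=300.0))  # ['old']
```

Without the sync() call at the top of expire(), the journaled access
time for "recent" would be ignored and it would be expired along with
"old".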
>
>
> Is my understanding correct ?
> Please correct me if my understanding is wrong 
Corrected where appropriate.
