Srilatha wrote:
> Hi,
>
> I am trying to understand the usage of journal in Bayesian Filtering.
>
> If bayes_learn_to_journal is set to 1, SA stores newly learnt tokens
> in the journal.

Correct.

> When bayesian filter is activated, while scanning a message
> SA reads tokens from BOTH 'bayes_tokens' database and 'bayes_journal'

No, it only reads bayes_tokens.
If it read bayes_journal while scanning, it would defeat the purpose of the journal. The journal exists to be more readily writable, which is possible only because it is rarely read from. If you read from the journal during scans, its write lock wouldn't be any more available than the write lock for the main tokens database, so you might as well use that for all your writes.

Data is merged from the journal into the tokens database at regular intervals as part of SA's automatic sync process (once a day), or when you run sa-learn --sync or sa-learn --force-expire. In general, this means data in the journal doesn't "go live" until a sync kicks off.

This is why bayes_learn_to_journal defaults to 0. Setting it to 1 improves learning performance, but also introduces a "lag" where the results don't take effect until there's a sync.

> While scanning a message, tokens found in bayes_tokens database are
> written to bayes_journal with modified timestamp

Correct. Timestamp updates are always written to the journal, largely because they're only relevant during expiry scans, and SA always does a sync before it scans for expiry. There's no sense holding up scanners in order to update timestamps, as it has no effect at all on the scan results, so dumping them into the journal is ideal.

> Is my understanding correct ?
> Please correct me if my understanding is wrong

Corrected where appropriate.
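
If it helps to see the behaviour described above in one place, here is a rough toy model in Python. It is not SA's code or its storage format: the class ToyBayesStore, its methods, and the tuple layout are all made up for illustration. It only mimics the three rules from this thread: scans read the main tokens table only, learned tokens and timestamp updates are appended to the journal, and nothing in the journal takes effect until a sync.

```python
# Toy model only. Names and data layout are hypothetical, not SpamAssassin internals.
class ToyBayesStore:
    def __init__(self):
        # Stands in for the main tokens database: token -> (spam_count, ham_count, atime)
        self.tokens = {}
        # Stands in for the journal: an append-only list of pending changes
        self.journal = []

    def learn(self, token, is_spam, now, to_journal=True):
        """Record a learned token, either in the journal or directly in the table."""
        if to_journal:
            self.journal.append(("count", token, is_spam, now))
        else:
            self._apply_count(token, is_spam, now)

    def scan(self, message_tokens, now):
        """Look up tokens in the main table only; queue timestamp updates in the journal."""
        hits = {}
        for token in message_tokens:
            if token in self.tokens:
                spam, ham, _atime = self.tokens[token]
                hits[token] = (spam, ham)
                self.journal.append(("touch", token, None, now))
        return hits

    def sync(self):
        """Merge every pending journal entry into the main table, then empty the journal."""
        for kind, token, is_spam, when in self.journal:
            if kind == "count":
                self._apply_count(token, is_spam, when)
            elif token in self.tokens:
                spam, ham, atime = self.tokens[token]
                self.tokens[token] = (spam, ham, max(atime, when))
        self.journal.clear()

    def _apply_count(self, token, is_spam, now):
        spam, ham, atime = self.tokens.get(token, (0, 0, 0))
        if is_spam:
            spam += 1
        else:
            ham += 1
        self.tokens[token] = (spam, ham, max(atime, now))


store = ToyBayesStore()
store.learn("cheapmeds", is_spam=True, now=100)   # goes to the journal
before = store.scan(["cheapmeds"], now=101)       # empty: the journal is not read
store.sync()                                      # learned token goes live
after = store.scan(["cheapmeds"], now=102)        # found; atime update is journaled
```

In this sketch `before` is empty and `after` contains the token, which is the "lag" mentioned above. The timestamp written during the second scan stays in the journal until the next sync, which is harmless because it does not change scan results.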