> On Wed, 3 Aug 2011 00:55:51 +0200
> "Giampaolo Tomassoni" <giampa...@tomassoni.biz> wrote:
>
>> SA not only reads tokens from bayes: it also inserts them and (even
>> worse) updates their ham/spam occurrence counters.
>
> This is why (IMO) you should journal Bayes updates and run them in
> batches periodically, and you should use a non-locking form of update
> such as MVCC as provided by PostgreSQL or rewritten-and-renamed
> database files.  If you have only one process doing the update, it
> makes locking much simpler because you only ever have one writer at a
> time.

I think your suggestion is overkill. If I'm right that the problem is
transaction contention, there is a much simpler solution. The way bayes
uses and updates token data allows reading in one transaction and
writing in a new one: you always increase the token occurrence counters
by a value that doesn't depend on the value they had at read time.

The problem, if any, is in the latency between the read and the commit...
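
To make the idea concrete, here is a rough sketch in Python with sqlite3.
The table layout and the function names (open_db, read_counts, learn) are
made up for illustration; they are not SpamAssassin's actual schema or
code. The point is only that the write step uses a relative increment
(count = count + 1), so it never needs the values fetched by the read
step and the two can live in separate, short transactions.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open a database with a hypothetical token table (not SA's real schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bayes_token ("
        " token TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def read_counts(conn, tokens):
    """Read step: fetch the counters, then end the transaction right away."""
    counts = {}
    for tok in tokens:
        row = conn.execute(
            "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?",
            (tok,),
        ).fetchone()
        # Unknown tokens are reported as (0, 0).
        counts[tok] = row if row else (0, 0)
    conn.commit()
    return counts


def learn(conn, tokens, is_spam):
    """Write step: a new transaction that only applies relative increments."""
    column = "spam_count" if is_spam else "ham_count"
    with conn:  # commits on success, rolls back on an exception
        for tok in tokens:
            # Create the row if the token is new; do nothing otherwise.
            conn.execute(
                "INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (tok,)
            )
            # The increment does not depend on anything read earlier.
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1"
                " WHERE token = ?",
                (tok,),
            )
```

Because learn() never writes back a value computed from an earlier read,
an update committed by another process between read_counts() and learn()
is not lost. It is simply added to.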


>
> Regards,
>
> David.
>

