> On Wed, 3 Aug 2011 00:55:51 +0200
> "Giampaolo Tomassoni" <giampa...@tomassoni.biz> wrote:
>
>> SA not only reads tokens from bayes: it also inserts them and (even
>> worse) updates their ham/spam occurrence counters.
>
> This is why (IMO) you should journal Bayes updates and run them in
> batches periodically, and you should use a non-locking form of update
> such as MVCC as provided by PostgreSQL or rewritten-and-renamed
> database files. If you have only one process doing the update, it
> makes locking much simpler because you only ever have one writer at a
> time.
I think your suggestion is overkill. If I'm right that the problem is contention between transactions, a much simpler solution exists. The way Bayes uses and updates token data would allow reading in one transaction and writing in a separate one: token occurrence counters are always incremented by an amount that doesn't depend on the value they had at read time. If anything, the remaining problem lies in the latency between the read and the commit...

>
> Regards,
>
> David.
>
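A minimal sketch of the split-transaction idea, assuming a hypothetical, simplified SQLite table named `bayes_token` (this is not SpamAssassin's actual Bayes schema or code). Scoring reads the counters in one short transaction; learning applies blind `+ 1` increments in a separate one. Because the increment never uses the value read earlier, two writers that both started from the same stale read still end up with the correct total.

```python
import os
import sqlite3
import tempfile

# Hypothetical, simplified token table; NOT SpamAssassin's real Bayes schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bayes_token (
    token      TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0
)
"""


def read_tokens(conn, tokens):
    """Transaction 1: read-only lookup, as used for scoring a message."""
    with conn:  # commits any open transaction on exit, releasing locks
        placeholders = ",".join("?" * len(tokens))
        rows = conn.execute(
            "SELECT token, spam_count, ham_count FROM bayes_token "
            f"WHERE token IN ({placeholders})",
            tokens,
        ).fetchall()
    return {tok: (spam, ham) for tok, spam, ham in rows}


def learn(conn, tokens, is_spam):
    """Transaction 2: blind increments, independent of anything read earlier."""
    column = "spam_count" if is_spam else "ham_count"
    with conn:  # one short write transaction, committed on exit
        for tok in tokens:
            # Create the row if the token is new...
            conn.execute(
                "INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (tok,)
            )
            # ...then increment in place; the new value is computed by the
            # database from the current row, not from our earlier read.
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?",
                (tok,),
            )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "bayes.db")
    a = sqlite3.connect(path)  # simulates one scanning process
    b = sqlite3.connect(path)  # simulates another scanning process
    with a:
        a.execute(SCHEMA)

    # Both processes read the same snapshot (token not yet known)...
    print(read_tokens(a, ["viagra"]), read_tokens(b, ["viagra"]))  # {} {}
    # ...then each learns its message as spam in its own short transaction.
    learn(a, ["viagra"], is_spam=True)
    learn(b, ["viagra"], is_spam=True)
    print(read_tokens(a, ["viagra"]))  # {'viagra': (2, 0)}
```

By contrast, a read-modify-write such as `SET spam_count = ?` with a value computed from the earlier read would let the second writer overwrite the first, leaving the counter at 1 instead of 2.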