Roy Klein wrote:

Here's the scenario that I can't guarantee won't happen:

There might be 3 transactions in a very short time span (for example, 1
second), here's what they are:

1) update doc1 (DEL doc1, ADD doc1)
2) update doc2 (DEL doc2, ADD doc2)
3) delete doc1

If I process these in order, then at the end of the 3 transactions, my index
will only have one document in it, doc2.


When I did something similar a while back(*) I added a little intelligence to the insertion of jobs into the queue. Essentially, if I'm about to update a document that's already scheduled for updating, don't update it twice (delete the first job, ignore the new job, which ever is appropriate). Similarly, if I'm going to delete a document that's awaiting update, don't bother with the update (but let the delete in so that the old document is deleted. You can handle new documents and updates to them in a similar vein.

For your particular example, the first job would be discarded when the third job is added to the queue.

jch


(*) Updates to a folder in an IMAP server if you're interested. Although I'm dealing with only a few bytes at a time, the time-based batch is much the same, with the time being dictated by the IMAP client rather than any sort of clock.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Reply via email to