Hey devs,

I am looking at making use of the two-phase commit approach available
in IndexWriter, but the current architecture there does not quite fit
with what we want to achieve. It seems that I could build this atop
IndexWriter; however, I wonder whether there is an existing
alternative that I have not discovered, or whether it would be better
to contribute a patch to the Lucene project itself?

In my system I have many concurrent transactions, and each of them
needs to make modifications to a single Lucene index in an atomic and
consistent manner.
If I had only a single thread (i.e. one transaction at a time), I
could simply access the IndexWriter instance, call addDocument (or
similar) as many times as I like, then call prepareCommit followed by
commit.

The main issue that I have is that if I let all concurrent
transactions use the same IndexWriter then I lose isolation, as a
commit by one transaction may also write the partial, pending updates
of another transaction.

Now I can see a naive solution for my application: I could add all
the updates that I want to make to the index to a `pending list`,
then take an exclusive lock on the index writer, apply my pending
list to the index writer, commit, and finally release the exclusive
lock. Whilst I could get this working, the downside is that I have to
implement and manage this `pending list`, and its application to the
index, myself, and it comes at the cost of memory (or even paging it
to disk).
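
To make the naive approach concrete, here is a minimal sketch of what
I have in mind. Note that DocumentSink and PendingUpdates are names I
have made up for illustration; DocumentSink just stands in for the
addDocument/commit part of IndexWriter (a real version would delegate
to IndexWriter and deal with its IOExceptions). Error recovery when
applying the list fails halfway is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the small part of IndexWriter used here.
// A Lucene-backed implementation would delegate to IndexWriter and
// wrap or propagate its IOExceptions.
interface DocumentSink<D> {
    void addDocument(D doc);
    void commit();
}

// One instance per transaction. Updates are buffered in memory and only
// reach the shared sink while the exclusive writer lock is held.
final class PendingUpdates<D> {
    private final DocumentSink<D> sink;
    private final ReentrantLock writerLock; // shared by all transactions
    private final List<D> pending = new ArrayList<>();

    PendingUpdates(DocumentSink<D> sink, ReentrantLock writerLock) {
        this.sink = sink;
        this.writerLock = writerLock;
    }

    // Record an update without touching the shared sink.
    void add(D doc) {
        pending.add(doc);
    }

    // Apply the whole pending list and commit under the exclusive lock,
    // so no other transaction's updates are interleaved with ours.
    void commit() {
        writerLock.lock();
        try {
            for (D doc : pending) {
                sink.addDocument(doc);
            }
            sink.commit();
            pending.clear();
        } finally {
            writerLock.unlock();
        }
    }
}
```

Each transaction would own one PendingUpdates, and all of them would
share the same sink and the same lock. The buffered list is exactly
the memory cost that I mentioned above.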

It seems likely to me that others before me have also had such a
requirement. Does anything like this already exist in Lucene, or would
it be desirable for me to contribute something? At a rough guess I
would imagine separating the IndexWriter from transaction content,
something like:

try (Transaction txn = indexWriter.beginTransaction()) {
    indexWriter.addDocument(txn, doc1);
    indexWriter.addDocument(txn, doc2);
    indexWriter.addDocument(txn, doc3);

    indexWriter.commit(txn);
}

The transaction could be automatically rolled back in the close()
method called by try-with-resources if it has not been committed,
which would allow any exceptions to be cleanly handled by the caller.
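
As a rough illustration of the rollback-on-close behaviour, here is a
minimal sketch. This is not an existing Lucene API: the Transaction
class and its applyAtomically hook are hypothetical, and I am assuming
that updates are simply buffered in memory so that rollback amounts to
discarding the buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical transaction handle; not part of Lucene.
final class Transaction<D> implements AutoCloseable {
    // Hypothetical hook that applies a whole batch to the index in one
    // isolated step (e.g. under an exclusive lock, then commit).
    private final Consumer<List<D>> applyAtomically;
    private final List<D> buffered = new ArrayList<>();
    private boolean committed = false;
    private boolean closed = false;

    Transaction(Consumer<List<D>> applyAtomically) {
        this.applyAtomically = applyAtomically;
    }

    void addDocument(D doc) {
        ensureOpen();
        buffered.add(doc);
    }

    void commit() {
        ensureOpen();
        applyAtomically.accept(new ArrayList<>(buffered));
        committed = true;
    }

    // Called by try-with-resources. If commit() was never reached, for
    // example because an exception was thrown, the buffer is discarded.
    @Override
    public void close() {
        if (!committed) {
            buffered.clear();
        }
        closed = true;
    }

    private void ensureOpen() {
        if (committed || closed) {
            throw new IllegalStateException("transaction already finished");
        }
    }
}
```

With this shape, an exception thrown between the first addDocument and
commit leaves the index untouched, and the caller only has to deal with
the exception itself.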

Does that make any sense, or am I way off?

Cheers Adam.

-- 
Adam Retter

skype: adam.retter
tweet: adamretter
http://www.adamretter.org.uk
