Re: Transactions multiplexing IndexWriter

Michael McCandless Tue, 16 Dec 2014 11:08:20 -0800

Lucene's IndexWriter only allows one transaction at a time.  Fixing
this would be challenging, I think.


One workaround might be to let your separate transactions write into
private directories, and then when complete, use
IndexWriter.addIndexes (on the main writer) to fold those changes in.
That part would still be single-transaction, but the addIndexes call
should be faster than doing N separate indexing ops.
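
A rough sketch of the shape of this workaround, using plain Java
collections as stand-ins. None of the classes below are Lucene APIs:
MainIndex, Txn and foldIn are hypothetical names. In real code the
per-transaction buffer would be a private Directory with its own
IndexWriter, and foldIn would correspond to calling addIndexes and then
commit on the main writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative stand-ins only: none of these classes are Lucene APIs.
final class PrivateBufferSketch {

    /** Plays the role of the main index and its single IndexWriter. */
    static final class MainIndex {
        private final List<String> committed = new ArrayList<>();
        private final ReentrantLock lock = new ReentrantLock();

        /** Folds one finished transaction in; only this step is serialized. */
        void foldIn(List<String> privateDocs) {
            lock.lock();
            try {
                committed.addAll(privateDocs);
            } finally {
                lock.unlock();
            }
        }

        /** Returns a copy of everything committed so far. */
        List<String> snapshot() {
            lock.lock();
            try {
                return new ArrayList<>(committed);
            } finally {
                lock.unlock();
            }
        }
    }

    /** One transaction; its buffer plays the role of a private directory. */
    static final class Txn implements AutoCloseable {
        private final MainIndex main;
        private final List<String> buffer = new ArrayList<>();
        private boolean finished;

        Txn(MainIndex main) {
            this.main = main;
        }

        void add(String doc) {
            if (finished) throw new IllegalStateException("transaction already finished");
            buffer.add(doc);
        }

        void commit() {
            if (finished) throw new IllegalStateException("transaction already finished");
            main.foldIn(buffer);
            finished = true;
        }

        @Override
        public void close() {
            // Uncommitted work is simply dropped, which acts as a rollback.
            buffer.clear();
            finished = true;
        }
    }

    public static void main(String[] args) {
        MainIndex main = new MainIndex();
        try (Txn txn = new Txn(main)) {
            txn.add("doc1");
            txn.add("doc2");
            txn.commit();
        }
        try (Txn txn = new Txn(main)) {
            txn.add("doc3"); // never committed, so it is discarded on close()
        }
        System.out.println(main.snapshot()); // prints [doc1, doc2]
    }
}
```

Each transaction only touches its own buffer until commit, so
concurrent transactions never see or flush each other's pending
changes. Only the fold-in step takes the shared lock.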

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 16, 2014 at 8:54 AM, Adam Retter <[email protected]> wrote:
> Hey devs,
>
> I am looking at making use of the two-phase commit approach available
> in IndexWriter but the current architecture there does not quite fit
> with what we want to achieve. It seems that I could build this atop
> IndexWriter, however I wonder if there is either an existing
> alternative that I have not discovered or whether it would be better
> to contribute a patch to the Lucene project itself?
>
> In my system I have many concurrent transactions and each of them
> needs to make modifications to a single Lucene index in an atomic and
> consistent manner.
> If I had a single thread (i.e. one concurrent transaction), for
> example, I could access the IndexWriter instance, call addDocument or
> whatever as many times as I like, call prepareCommit and then
> commit.
>
> The main issue that I have is that if I let all concurrent
> transactions use the same IndexWriter then I lose isolation, as a
> commit of one transaction may write the partial pending updates of
> another transaction.
>
> Now I can see a naive solution for my application where I could add
> all updates that I want to make to the index to a `pending list`, I
> could then take an exclusive lock for the index writer, apply my
> pending list to the index writer and then commit, finally releasing
> the exclusive lock. Whilst I could get this working, the down-side is
> that I have to implement and manage this `pending list` and applying
> it to the index myself, and it comes at the cost of memory (or even
> paging it to disk).
>
> It seems likely to me that others before me have also had such a
> requirement. Does anything like this already exist in Lucene, or would
> it be desirable for me to contribute something? At a rough guess I
> would imagine separating the IndexWriter from transaction content,
> something like:
>
> try (Transaction txn = indexWriter.beginTransaction()) {
>     indexWriter.addDocument(txn, doc1);
>     indexWriter.addDocument(txn, doc2);
>     indexWriter.addDocument(txn, doc3);
>
>     indexWriter.commit(txn);
> }
>
> The transaction could be automatically rolled-back in the close()
> method called by try-with-resources if it has not been committed,
> which would allow any exceptions to be cleanly handled by the caller.
>
> Does that make any sense, or am I way off?
>
> Cheers Adam.
>
> --
> Adam Retter
>
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>

