Hi,

I have a multi-threaded indexing application that indexes documents into a set of Lucene index databases (I have millions of documents to index, hence the split across several DBs). When a thread gets an index request, it determines which index DB the data belongs in and grabs the IndexWriter for that database.

My question is this: suppose several threads want to index data into the same DB concurrently, while other threads want to delete documents and others are searching. Does anyone know the benefits and drawbacks of the following approaches with respect to the performance characteristics of the Lucene internals?

a) Serialisation of writes, i.e. multiple IndexWriter.close() calls. Each thread in turn blocks waiting for the writer and then does

new IndexWriter()
addDocuments()
close IndexWriter

or

b) Parallelisation of writes with a single IndexWriter.close(). All threads share the same IndexWriter instance. LIA (Lucene in Action) says that IndexWriter is thread-safe, so several threads can use it at once. The first thread requesting the writer creates a new instance, all subsequent threads add documents to that same instance, and the last user closes the writer, e.g.

First thread - new IndexWriter()
2..n threads - inc use_count + get existing IndexWriter
all threads - addDocuments()
n..2 threads - dec use_count
Last thread - close IndexWriter

The middle 3 steps will of course happen in random order, not as defined above.
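
To make (b) concrete, here is a minimal Java sketch of the use-count bookkeeping I have in mind. It is only an illustration under assumptions: SharedWriter, acquire() and release() are hypothetical names of mine, not Lucene API, and the type parameter W stands in for IndexWriter so that the sketch compiles without Lucene on the classpath. Requiring W to be AutoCloseable is likewise an assumption about the writer type.

```java
import java.util.concurrent.Callable;

/**
 * Sketch of approach (b): a reference-counted holder for one shared writer.
 * W is a stand-in for the real writer type; all names here are hypothetical.
 */
final class SharedWriter<W extends AutoCloseable> {
    private final Callable<W> factory; // opens a new writer on demand
    private W writer;                  // null while nobody holds the writer
    private int useCount;              // number of threads currently using it

    SharedWriter(Callable<W> factory) {
        this.factory = factory;
    }

    /** The first caller creates the writer; later callers share the same instance. */
    synchronized W acquire() throws Exception {
        if (useCount == 0) {
            writer = factory.call();
        }
        useCount++;
        return writer;
    }

    /** The last caller out closes the writer. */
    synchronized void release() throws Exception {
        if (useCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        useCount--;
        if (useCount == 0) {
            W w = writer;
            writer = null;
            w.close();
        }
    }
}
```

Each indexing thread would call acquire(), add its documents, and call release() in a finally block, so the writer is closed only when the count drops back to zero.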

Thanks
Antony

