2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
> It sounds like you might have some thread synchronization issues outside of
> Lucene. To simplify things a bit, you might try just using one IndexWriter.
> If I remember right, the IndexWriter is now pretty efficient, and there
> isn't much need to index to smaller indexes and then merge. There is a lot
> of juggling to get wrong with that approach.

While I agree that a single IndexWriter is easier, if you have
multiple cores you will get significant speed-ups from multiple
IndexWriters, even after paying the cost of merging at the end.
One IndexWriter per physical core is a reasonable rule of thumb.
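
To make the shape of that concrete, here is a rough sketch in plain Java
of "one worker per core, merge at the end". It is not Lucene code: the
class and method names (PartitionedIndexing, partition, indexAll) are
made up for illustration, and each task simply returns its file names
where a real worker would open its own IndexWriter, add documents, and
close it. Note that availableProcessors() counts logical processors, so
on hyper-threaded machines it may overstate the physical core count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedIndexing {

    // Split the input round-robin into `parts` lists, one per worker.
    static <T> List<List<T>> partition(List<T> items, int parts) {
        List<List<T>> out = new ArrayList<List<T>>();
        for (int i = 0; i < parts; i++) {
            out.add(new ArrayList<T>());
        }
        for (int i = 0; i < items.size(); i++) {
            out.get(i % parts).add(items.get(i));
        }
        return out;
    }

    // Run one task per partition, wait for all of them, then combine.
    static List<String> indexAll(List<String> files, int writers) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Callable<List<String>>> tasks =
                    new ArrayList<Callable<List<String>>>();
            for (final List<String> part : partition(files, writers)) {
                // Stand-in (assumption) for: open this worker's own
                // IndexWriter, add each document, close the writer.
                tasks.add(() -> new ArrayList<String>(part));
            }
            List<String> merged = new ArrayList<String>();
            // invokeAll returns only after every task has completed.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                // get() surfaces a worker's exception instead of hiding it.
                merged.addAll(f.get());
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int writers = Runtime.getRuntime().availableProcessors();
        List<String> files = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            files.add("doc" + i + ".pdf");
        }
        System.out.println(indexAll(files, writers).size() + " files indexed");
    }
}
```

In a real setup the final loop would be the index merge step rather than
a list concatenation. Checking each Future also means a failed worker
shows up as an exception rather than a silently missing partial index.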

General speed-up estimate: (# cores) * (0.6 to 0.8) over a single IndexWriter.
YMMV.
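
As a quick worked version of that estimate, the snippet below just
multiplies a core count by the 0.6 and 0.8 factors. The class and method
names (SpeedupEstimate, range) are made up for illustration, and the
numbers are only the rule of thumb above, not measurements.

```java
class SpeedupEstimate {

    // Rule of thumb from this post: the speed-up over a single
    // IndexWriter is roughly 0.6 to 0.8 times the number of cores.
    static double[] range(int cores) {
        return new double[] { cores * 0.6, cores * 0.8 };
    }

    public static void main(String[] args) {
        for (int cores : new int[] { 2, 4, 8 }) {
            double[] r = range(cores);
            System.out.printf("%d cores: about %.1fx to %.1fx%n",
                    cores, r[0], r[1]);
        }
    }
}
```

So on a 4-core machine the estimate works out to roughly 2.4x to 3.2x.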

When I get around to it, I'll re-run my tests varying the number of
IndexWriters and post the results.

-Glen

>
> - Mark
>
> Sudarsan, Sithu D. wrote:
>>
>> Hi,
>>
>> We are trying to index a large collection of PDF documents, with sizes
>> varying from a few KB to a few GB, using Lucene 2.3.2 with JDK 1.6.0_01
>> (and PDFBox for text extraction) on Windows as well as CentOS Linux. We
>> set the java -Xms and -Xmx options both to 1080m, even though we have
>> 4 GB on Windows and 32 GB on Linux with sufficient swap space.
>>
>> With just one thread, indexing completes, though it takes time. To
>> speed it up, we tried a multi-threaded approach with one IndexWriter per
>> thread. After all the threads finish indexing, their indexes are merged.
>> With about 100 sample files and 10 threads, the program works pretty
>> well and it does speed up. But when we run it on a document collection
>> of about 25 GB, a couple of threads just hang while the rest complete
>> their indexing. The program never exits gracefully, and the threads that
>> seem to have died prevent the final index merge from taking place. The
>> program has to be terminated manually.
>> We tried both the simple analyzer and the standard analyzer, with
>> similar results.
>>
>> Any useful tips / solutions welcome.
>>
>> Thanks in advance,
>> Sithu Sudarsan
>> Graduate Research Assistant, UALR
>> & Visiting Researcher, CDRH/OSEL
>>
>> [EMAIL PROTECTED]
>> [EMAIL PROTECTED]
>>
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>


