Glen Newton wrote:
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
It sounds like you might have some thread synchronization issues outside of
Lucene. To simplify things a bit, you might try just using one IndexWriter.
If I remember right, the IndexWriter is now pretty efficient, and there
isn't much need to index to smaller indexes and then merge. There is a lot
of juggling to get wrong with that approach.

While I agree it is easier to have a single IndexWriter, if you have
multiple cores you will get significant speed-ups with multiple
IndexWriters, even with the impact of merging at the end.
# IndexWriters = # physical cores is a reasonable rule of thumb.

General speed-up estimate: (# cores) * (0.6 to 0.8) over a single IndexWriter
YMMV
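
As a rough illustration of the rule of thumb above, the sketch below turns a
core count into the estimated speed-up range. The class and method names are
made up for this example, and the 0.6 and 0.8 factors are simply the figures
quoted above, not measured values. Note that availableProcessors() reports
logical processors, which can exceed the number of physical cores.

```java
// Hypothetical helper: applies the 0.6 to 0.8 per-core factor quoted above.
final class SpeedupEstimate {
    static final double LOW_FACTOR = 0.6;   // pessimistic end of the quoted range
    static final double HIGH_FACTOR = 0.8;  // optimistic end of the quoted range

    // Estimated lower bound on speed-up over a single IndexWriter.
    static double low(int cores) {
        return cores * LOW_FACTOR;
    }

    // Estimated upper bound on speed-up over a single IndexWriter.
    static double high(int cores) {
        return cores * HIGH_FACTOR;
    }

    public static void main(String[] args) {
        // Logical processor count; an assumption standing in for physical cores.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.printf("writers=%d, estimated speed-up %.1fx to %.1fx%n",
                cores, low(cores), high(cores));
    }
}
```

With 4 cores this gives a range of roughly 2.4x to 3.2x, which is only as
good as the quoted factors.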

When I get around to it, I'll re-run my tests varying the # of
IndexWriters & post.

-Glen
Hey Mr McCandless, what's up with that? Can a single IndexWriter be made as
efficient as multiple writers? Where do you suppose the holdup is: the number
of threads doing merges, or sync contention? I hate the idea of multiple
IndexWriters/Readers being more efficient than a single instance. In an ideal
Lucene world, a single instance would hide the complexity and use as many
threads as needed to match multi-instance performance.
- Mark

Sudarsan, Sithu D. wrote:
Hi,

We are trying to index a large collection of PDF documents, with sizes
varying from a few KB to a few GB. We use Lucene 2.3.2 with JDK 1.6.0_01
(and PDFBox for text extraction) on Windows as well as CentOS Linux. We set
the java -Xms and -Xmx options both to 1080m, even though we have 4 GB on
Windows and 32 GB on Linux with sufficient swap space.

With just one thread, indexing completes, though it takes time. To
speed things up, we tried a multi-threaded approach with one IndexWriter
per thread. After all the threads finish indexing, their indexes are merged.
With about 100 sample files and 10 threads, the program works pretty
well and it does speed up. But when we run it on a document collection of
about 25 GB, a couple of threads just hang, while the rest complete
their indexing. The program never exits gracefully, and the threads that
seem to have died prevent the final index merge from taking
place. The program has to be terminated manually.
We tried both the simple analyzer and the standard analyzer, with
similar results.
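
One way to find out which workers are stuck, and to keep them from blocking
the final merge, is to wait on them with a deadline rather than indefinitely.
The sketch below uses only java.util.concurrent; the class name, the method
name, and the idea of one Callable per indexing worker are assumptions made
for illustration, not the code from the program described above. It cannot
rescue a thread that ignores interruption (for example one stuck in a tight
loop inside a parser), but it does report which tasks did not finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs indexing tasks with an overall deadline.
final class BoundedIndexingRun {

    // Returns the positions of tasks that timed out or threw an exception.
    static List<Integer> unfinished(List<Callable<Void>> tasks, int threads,
                                    long timeout, TimeUnit unit)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // invokeAll blocks until all tasks are done or the timeout
            // expires; tasks still running at that point are cancelled.
            List<Future<Void>> futures = pool.invokeAll(tasks, timeout, unit);
            List<Integer> bad = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                Future<Void> f = futures.get(i);
                if (f.isCancelled()) {
                    bad.add(i);              // did not finish before the deadline
                    continue;
                }
                try {
                    f.get();                 // already done, returns immediately
                } catch (ExecutionException e) {
                    bad.add(i);              // task failed with an exception
                }
            }
            return bad;
        } finally {
            pool.shutdownNow();              // interrupt anything still running
        }
    }
}
```

If the returned list is empty, every worker finished and the merge can
proceed; otherwise the listed workers point at the documents worth
inspecting, for instance the very large PDFs.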

Any useful tips / solutions welcome.

Thanks in advance,
Sithu Sudarsan
Graduate Research Assistant, UALR
& Visiting Researcher, CDRH/OSEL

[EMAIL PROTECTED]
[EMAIL PROTECTED]



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]