2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
> It sounds like you might have some thread synchronization issues outside of
> Lucene. To simplify things a bit, you might try just using one IndexWriter.
> If I remember right, the IndexWriter is now pretty efficient, and there
> isn't much need to index to smaller indexes and then merge. There is a lot
> of juggling to get wrong with that approach.
While I agree it is easier to have a single IndexWriter, if you have
multiple cores you will get significant speed-ups with multiple
IndexWriters, even with the impact of merging at the end.

#IndexWriters = # physical cores is a reasonable rule of thumb.

General speed-up estimate over a single IndexWriter:
(# cores) * 0.6 to (# cores) * 0.8
YMMV

When I get around to it, I'll re-run my tests varying the # of
IndexWriters & post.

-Glen

>
> - Mark
>
> Sudarsan, Sithu D. wrote:
>>
>> Hi,
>>
>> We are trying to index large collection of PDF documents, sizes varying
>> from few KB to few GB. Lucene 2.3.2 with jdk 1.6.0_01 (with PDFBox for
>> text extraction) and on Windows as well as CentOS Linux. Used java -Xms
>> and -Xmx options, both at 1080m, even though we have 4GB on Windows and
>> 32 GB on Linux with sufficient swap space.
>>
>> With just one thread, though it takes time, the indexing happens. To
>> speed up, we tried multi-threaded approach with one Indexwriter for each
>> thread. After all the threads finish their indexing, they are merged.
>> With about 100 sample files and 10 threads, the program works pretty
>> well and it does speed up. But, when we run on document collection of
>> about 25GB, couple of threads just hang, while the rest have completed
>> their indexing. The program never gracefully exits, and the threads that
>> seem to have died ensure that the final index merging does not take
>> place. The program needs to be manually terminated.
>> Tried both with simple analyzer as well as standard analyzer, with
>> similar results.
>>
>> Any useful tips / solutions welcome.
>>
>> Thanks in advance,
>> Sithu Sudarsan
>> Graduate Research Assistant, UALR
>> & Visiting Researcher, CDRH/OSEL
>>
>> [EMAIL PROTECTED]
>> [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
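
For reference, Glen's rule of thumb (one IndexWriter per physical core, speed-up of roughly 0.6 to 0.8 times the core count) and the hang described in the original question could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the thread: it uses plain java.util.concurrent and no Lucene calls, the names ParallelIndexSketch, writerCount, estimatedSpeedup and runWorkers are hypothetical, and each task is a placeholder for "index one partition with its own IndexWriter". The bounded wait is one possible way to keep a stuck worker from blocking the final merge; it is not something proposed in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from Lucene or from the thread.
class ParallelIndexSketch {

    // Rule of thumb from the thread: one IndexWriter per physical core (at least one).
    static int writerCount(int physicalCores) {
        return Math.max(1, physicalCores);
    }

    // Rough speed-up range over a single IndexWriter: cores * 0.6 up to cores * 0.8.
    static double[] estimatedSpeedup(int physicalCores) {
        return new double[] { physicalCores * 0.6, physicalCores * 0.8 };
    }

    // Runs the tasks on a fixed pool and waits at most timeoutMillis.
    // Returns how many tasks finished normally. Tasks still running at the
    // deadline are cancelled by invokeAll, which only helps if the task
    // responds to thread interruption (an assumption of this sketch).
    static int runWorkers(List<Callable<Integer>> tasks, int writers, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        try {
            List<Future<Integer>> results =
                    pool.invokeAll(tasks, timeoutMillis, TimeUnit.MILLISECONDS);
            int completed = 0;
            for (Future<Integer> f : results) {
                if (f.isCancelled()) {
                    continue; // timed out: this partition is incomplete
                }
                try {
                    f.get(); // already done, so this does not block
                    completed++;
                } catch (ExecutionException e) {
                    // the worker threw; treat its partition as incomplete
                }
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return 0;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Assumption: availableProcessors() reports logical processors,
        // which can be higher than the physical core count.
        int cores = Runtime.getRuntime().availableProcessors();
        int writers = writerCount(cores);
        double[] range = estimatedSpeedup(cores);
        System.out.printf("writers=%d, estimated speed-up %.1fx to %.1fx%n",
                writers, range[0], range[1]);

        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < writers; i++) {
            final int id = i;
            tasks.add(() -> id); // placeholder for indexing one partition
        }
        int ok = runWorkers(tasks, writers, 5000);
        System.out.println(ok == tasks.size()
                ? "all partitions done, safe to merge"
                : "some partitions did not finish, skip the merge");
    }
}
```

With 4 cores the estimate works out to roughly 2.4x to 3.2x. The point of the design is only that the merge step is gated on the count returned by runWorkers rather than on every thread exiting by itself.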