Sudarsan, Sithu D. wrote:


Hi Glen, Mike, Grant & Mark

Thank you for the quick responses.

1. Yes, I'm looking at ThreadPoolExecutor now, and for sample code to
improve the multi-threaded code.

2. We'll first try using as many IndexWriters as there are cores
(2 CPUs x 4 cores = 8).

You could also try multiple threads against a single IndexWriter. It's simpler, and you don't have to merge indices at the end. It'd be great if you could post back the net throughput, because I'd really like to understand whether there is some sort of threading issue with sharing a single IndexWriter.
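
A minimal sketch of the shared-writer approach with a fixed thread pool, assuming a recent Java version. DocSink is a hypothetical stand-in for the single shared IndexWriter (so the example runs without Lucene), and the pool size and one-hour timeout are illustrative choices only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class SharedWriterSketch {

    /** Hypothetical stand-in for one thread-safe writer shared by all workers. */
    interface DocSink {
        void add(String doc) throws Exception;
    }

    /** Sends every doc to the one shared sink from nThreads workers; returns how many succeeded. */
    static long indexAll(List<String> docs, DocSink sink, int nThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicLong added = new AtomicLong();
        for (String doc : docs) {
            pool.execute(() -> {
                try {
                    sink.add(doc);
                    added.incrementAndGet();
                } catch (Exception e) {
                    // a real run would log or quarantine the failing document here
                }
            });
        }
        pool.shutdown(); // stop accepting tasks, let queued ones finish
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
        }
        return added.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            docs.add("doc-" + i);
        }
        AtomicLong sinkCount = new AtomicLong(); // dummy sink just counts documents
        int threads = Runtime.getRuntime().availableProcessors();
        long start = System.nanoTime();
        long added = indexAll(docs, d -> sinkCount.incrementAndGet(), threads);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d docs, %d threads, %.0f docs/sec%n", added, threads, added / secs);
    }
}
```

Running the same harness with different thread counts gives comparable docs/sec numbers, which is the kind of net throughput figure asked for above.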

3. Yes, PDFBox exceptions have been independently checked. We have a
prototype module that checks for PDF files containing errors. Generally
they are few, less than 1% of the total number of files. The PDFs have
all been OCRed. Also, any file that throws an exception is quarantined
in a separate folder so that the document itself can be analyzed
further.
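
The quarantine step described in item 3 might look roughly like the following sketch. Extractor is a hypothetical interface standing in for the PDFBox text extraction call, and the directory layout is assumed rather than taken from the actual setup:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class QuarantineSketch {

    /** Hypothetical extraction step; in this thread it would wrap the PDFBox call. */
    interface Extractor {
        String extract(Path file) throws Exception;
    }

    /**
     * Returns the extracted text, or null if extraction threw an exception,
     * in which case the file is moved into quarantineDir for later analysis.
     */
    static String extractOrQuarantine(Path file, Path quarantineDir, Extractor extractor)
            throws IOException {
        try {
            return extractor.extract(file);
        } catch (Exception e) {
            Files.createDirectories(quarantineDir);
            Files.move(file, quarantineDir.resolve(file.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
            return null;
        }
    }
}
```

Note that this catches Exception only, so an OutOfMemoryError raised while parsing a large PDF would still propagate to the caller.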

4. We've tried a larger JVM heap by setting -Xms1800m and -Xmx1800m,
but it runs out of memory. Only -Xms1080m and -Xmx1080m seem stable.
That is strange, as we have 32 GB of RAM and 34 GB of swap space.
Typically no other application is running. However, the CentOS version
is 32-bit. The Ungava project seems to be using 64-bit.
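
On the heap limit in item 4: a 32-bit process has at most 4 GB of virtual address space regardless of installed RAM, and the usable contiguous heap is typically well below that, which may be what is happening here. A small sketch for checking what the running JVM actually reports; the sun.arch.data.model property is HotSpot-specific and may be null on other JVMs:

```java
class HeapInfoSketch {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("os.arch             = " + System.getProperty("os.arch"));
        // HotSpot-specific property: "32" or "64" when present, otherwise null
        System.out.println("sun.arch.data.model = " + System.getProperty("sun.arch.data.model"));
        System.out.println("max heap (MiB)      = " + maxHeapMiB());
    }
}
```

Running it with the same -Xms/-Xmx flags as the indexer shows whether the requested heap was actually granted.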

5. The -QUIT option on Linux does print stack traces, but after a few
threads it hangs. I don't know why; we need to look at that.

Can you post the stack traces that you did see? (Do you think those threads are hung?)
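
If the -QUIT dump itself keeps stalling, one possible alternative is to collect the traces from inside the JVM with Thread.getAllStackTraces(). This is only a sketch; the class name and output format are made up for illustration, and how it gets triggered (a timer, a signal file, and so on) is left open:

```java
import java.util.Map;

class ThreadDumpSketch {

    /** Builds a text dump with the name, state, and stack frames of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Threads that show state=BLOCKED or state=WAITING on the same frames across several dumps are the ones worth posting.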

Mike

