Sudarsan, Sithu D. wrote:
Hi Glen, Mike, Grant & Mark
Thank you for the quick responses.
1. Yes, I'm looking at ThreadPoolExecutor now, and for sample code to
improve the multi-threaded code.
2. We'll first try using as many IndexWriters as there are cores
(2 CPUs x 4 cores = 8).

You could also try multiple threads against a single IndexWriter.
It's simpler, and you don't have to merge indices at the end. It'd be
great if you could post back the net throughput, because I'd really like
to understand whether there is some sort of threading issue when sharing
a single IndexWriter.

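Here is a minimal sketch of what I mean, using only the JDK. The class
name SharedWriterSketch and the method indexAll are made up for
illustration, and the Consumer is a stand-in for the shared writer: in
real code it would parse the file and add the resulting document to the
one IndexWriter. Treat it as a starting point under those assumptions,
not as tested indexing code.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class SharedWriterSketch {

    // Feeds every item to ONE shared sink from a fixed pool of worker threads.
    // Returns how many items the sink accepted without throwing.
    // The sink is a hypothetical stand-in for the shared index writer.
    static <T> int indexAll(List<T> items, int threads, Consumer<T> sharedSink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger ok = new AtomicInteger();
        for (T item : items) {
            pool.execute(() -> {
                try {
                    sharedSink.accept(item);
                    ok.incrementAndGet();
                } catch (RuntimeException e) {
                    // One bad document should not take the worker down.
                    System.err.println("failed: " + item + ": " + e);
                }
            });
        }
        pool.shutdown(); // no new tasks; queued ones still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return ok.get();
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.pdf", "b.pdf", "c.pdf", "d.pdf");
        Set<String> indexed = ConcurrentHashMap.newKeySet(); // placeholder sink
        long start = System.nanoTime();
        int n = indexAll(files, 8, indexed::add);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println(n + " docs in " + seconds + " s");
    }
}
```

Timing the whole run like this, once with one shared writer and once
with one writer per core, would give the net throughput comparison.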
3. Yes, the PDFBox exceptions have been checked independently. We have a
prototype module that checks for PDF files containing errors. Generally
there are few, less than 1% of the total number of files. All the PDFs
have been OCRed. Also, any file that throws an exception is quarantined
in a separate folder so the document itself can be analyzed further.
4. We've tried a larger JVM heap by setting -Xms1800m and -Xmx1800m,
but it runs out of memory. Only -Xms1080m and -Xmx1080m seem stable.
That is strange, as we have 32 GB of RAM and 34 GB of swap space.
Typically no other application is running. However, the CentOS version
is 32-bit. The Ungava project seems to be using 64-bit.
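
The 32-bit OS is a likely suspect here: a 32-bit process has at most
4 GB of virtual address space no matter how much RAM the machine has,
and the Java heap has to share that space with thread stacks and native
memory. A small check like the one below shows what the JVM itself
reports. The class name HeapCheck is made up for illustration, and
sun.arch.data.model is a HotSpot-specific property that other JVMs may
not set, so this is a diagnostic sketch rather than a definitive test.

```java
class HeapCheck {

    // Maximum heap the JVM will try to use, in MiB (reflects -Xmx).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // One-line summary of JVM bitness, OS architecture, and max heap.
    static String describe() {
        // HotSpot-specific; falls back to "unknown" on other JVMs.
        String bits = System.getProperty("sun.arch.data.model", "unknown");
        String arch = System.getProperty("os.arch", "unknown");
        return "JVM data model: " + bits + "-bit, os.arch: " + arch
                + ", max heap: " + maxHeapMiB() + " MiB";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it with the same -Xms/-Xmx flags as the indexer would confirm
whether the JVM is 32-bit and how much heap it actually got.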
5. The -QUIT option on Linux does produce a stack trace, but after a few
threads it hangs. I don't know why; we need to look into that.

Can you post the stack traces that you did see? (Do you think those
threads are hung?)
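
If the signal-based dump keeps stalling, another option is to collect
the traces from inside the JVM with the standard java.lang.management
API. The sketch below is only an illustration (the class name
ThreadDumpSketch is made up), and it would have to be called from a
thread that is still responsive, for example a timer thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class ThreadDumpSketch {

    // Builds a text dump of all live threads plus a deadlock summary.
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock or condition this thread is blocked or waiting on.
                sb.append(" on ").append(info.getLockName());
            }
            sb.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        // Returns null when no threads are deadlocked.
        long[] deadlocked = mx.findDeadlockedThreads();
        sb.append(deadlocked == null
                ? "no deadlock detected"
                : deadlocked.length + " deadlocked thread(s)").append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Threads that show up as BLOCKED or WAITING on the same lock across
several dumps are the ones worth posting.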
Mike