You may want to use something like pdftotext, which is part of Xpdf
(http://www.foolabs.com/xpdf/download.html). It produces a plain-text
extract of a PDF. Indexing that text should then be fast, and you avoid
the memory consumption of PDFBox.
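
As a rough sketch of what I mean, something like the following could run
pdftotext from Java and hand you the text to index. It assumes pdftotext is
installed and on the PATH, and Java 9 or later. The class and method names
(PdfTextExtractor, buildCommand, extractText) are made up for illustration;
they are not part of Xpdf or Lucene.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.List;

public class PdfTextExtractor {

    // Build the pdftotext command line. "-enc UTF-8" selects the output
    // encoding, and "-" as the output file sends the text to stdout.
    static List<String> buildCommand(Path pdf) {
        return List.of("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Run pdftotext as an external process and return the extracted text.
    // Assumption: the pdftotext binary is on the PATH.
    static String extractText(Path pdf) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(pdf));
        // Discard stderr so a chatty warning stream cannot block the process.
        pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        String text = new String(process.getInputStream().readAllBytes(),
                                 StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: PdfTextExtractor <file.pdf>");
            return;
        }
        // The returned string is what you would add to your Lucene document.
        System.out.println(extractText(Path.of(args[0])));
    }
}
```

Because the PDF parsing happens in a separate process, its memory use stays
outside the JVM heap that is doing the indexing.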
Regards,
Ioan
spinergywmy wrote:
Hi,
I am having a performance issue when indexing PDF files. It took more
than 10 seconds to index a PDF file of about 200 KB. Is it because I only
have one segment file? How can I improve the indexing performance?
Thanks
regards,
Wooi Meng