PDFBox 0.6 is quite old and there have been many improvements since. You should look at moving to the newest version, 0.7.3, although from the description of your problem, upgrading probably would not resolve it.

If there are a large number of temp files with "pdfbox" in the name, then you are most likely not calling close() on the PDDocument object. How are you adding the documents to the index? There is a simple helper class called org.pdfbox.searchengine.lucene.LucenePDFDocument that you may find useful.
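
To illustrate the close() point, here is a minimal, self-contained sketch of the pattern. ScratchDocument is a hypothetical stand-in and not the PDFBox API: it only mimics an object that keeps a temp file on disk until close() is called. In real indexing code the same try/finally shape would wrap whatever opens the PDDocument, so that close() runs even if text extraction throws.

```java
import java.io.File;
import java.io.IOException;

public class CloseDemo {
    // Hypothetical stand-in (NOT the PDFBox API) for a document object
    // that holds a scratch file on disk until close() is called.
    static class ScratchDocument {
        private final File scratch;

        ScratchDocument() throws IOException {
            // The "pdfbox" prefix is only there to mirror the temp file names above.
            scratch = File.createTempFile("pdfbox", ".tmp");
        }

        String extractText() {
            return "text";
        }

        File scratchFile() {
            return scratch;
        }

        void close() {
            // Releasing the document removes its scratch file.
            scratch.delete();
        }
    }

    // Processes one document and always releases its scratch file,
    // even if extraction fails part way through.
    static File indexOne() throws IOException {
        ScratchDocument doc = new ScratchDocument();
        try {
            doc.extractText(); // hand the text to the indexer here
            return doc.scratchFile();
        } finally {
            doc.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File scratch = indexOne();
        System.out.println("scratch file still exists: " + scratch.exists());
    }
}
```

Without the finally block, every document that fails part way through would leave its scratch file behind, and over a large indexing run that adds up quickly.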

Ben


Ariel Isaac Romero Cartaya wrote:
Hi everybody,

  I am having a problem during the indexing process. I am indexing large amounts of text, most of it in PDF format, using PDFBox version 0.6. Before the indexing process begins, there is around 120 GB of free space on the hard disk. Incredibly, even though my Lucene index has not yet reached 300 MB, the disk runs out of free space. Even more incredible, when I stop the indexing process, the free disk space quickly rises back to 120 GB. How can this happen if I am not copying the documents to the disk? I have a Linux machine for the indexing process. I have been thinking that it could be temporary files from something, maybe PDFBox?
Could you help me, please?
Greetings


