PDFBox version 0.6 is quite old and there have been many improvements
since. You should look at moving to the newest version, 0.7.3, although
from the description of your problem an upgrade probably would not
resolve it.
If there are a large number of temp files with "pdfbox" in the name then
you are most likely not calling close() on the PDDocument object. How
are you adding the documents to the index? There is a simple helper
class called org.pdfbox.searchengine.lucene.LucenePDFDocument that you
may find useful.
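
To check the temp-file theory, something like the following could be run
while the indexer is active. It is only a rough sketch that uses nothing
but the standard Java library: the class name TempFileCheck and the
method countMatching are made up for illustration, and it assumes the
temp files are written to the directory named by the java.io.tmpdir
system property, which may not be the case on your machine.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of PDFBox or Lucene.
public class TempFileCheck {

    /** Counts regular files directly inside dir whose name contains marker. */
    static long countMatching(Path dir, String marker) throws IOException {
        long count = 0;
        // try-with-resources closes the directory handle when the loop ends
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)
                        && entry.getFileName().toString().contains(marker)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: temp files live in the JVM's default temp directory.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(countMatching(tmp, "pdfbox")
                + " files with \"pdfbox\" in the name under " + tmp);
    }
}
```

If the count keeps growing while documents are being indexed, the code
that loads each PDF is the place to look: the close() call on the
PDDocument should happen in a finally block so that it also runs when
parsing or text extraction throws an exception.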
Ben
Ariel Isaac Romero Cartaya wrote:
Hi everybody,
I am having a problem during the indexing process. I am indexing large
amounts of text, most of it in PDF format, using PDFBox version 0.6.
Before the indexing process begins there is around 120 GB of free space
on the hard disk, but even when my Lucene index is not yet 300 MB the
disk has no free space left. Stranger still, when I stop the indexing
process the free disk space quickly rises back to 120 GB. How can this
happen if I don't copy the documents to the disk? I am using a Linux
machine for the indexing process. I have been thinking that it could be
temporary files from something, maybe PDFBox?
Could you help me please?
Greetings