Ariel Isaac Romero Cartaya wrote:
Hi everybody,
I am having a problem during the indexing process. I am indexing large
amounts of text, most of it in PDF format, using PDFBox version 0.6.
Free hard disk space before the indexing process begins is around 120
GB, but incredibly even when my Lucene index doesn't have y
Here is the source code I use to convert PDF files to text for indexing. I
took it from the Lucene in Action examples and adapted it to my needs.
I hope you can help me fix this problem. If you know a more efficient way
to do it, please tell me how:
import java.i
PDFBox version 0.6 is quite old and there have been many improvements
since. You should look at moving to the newest version, 0.7.3, although from
the description of your problem it probably would not resolve it.
If there are a large number of temp files with "pdfbox" in the name then
you are most li
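
To check for leftover temp files quickly, something like the following might help. It is only a sketch using the standard Java library: it counts files whose name contains "pdfbox" in a given directory. The class and method names are made up for illustration, and looking in the JVM's default temp directory (`java.io.tmpdir`) is an assumption, since the temp files may be written somewhere else on your system.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox or Lucene.
public class PdfBoxTempFileCheck {

    // Counts regular files directly inside dir whose name contains
    // marker, ignoring case. Subdirectories are not searched.
    static long countTempFiles(Path dir, String marker) throws IOException {
        String needle = marker.toLowerCase(Locale.ROOT);
        long count = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase(Locale.ROOT);
                if (Files.isRegularFile(entry) && name.contains(needle)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the temp files live in the JVM's default temp directory.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        System.out.println(countTempFiles(tmp, "pdfbox") + " pdfbox temp files in " + tmp);
    }
}
```

Running it before and after an indexing pass and comparing the two counts should show whether temp files are piling up.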