Re: Indexing is hung or doesn't complete

2010-10-13 Thread Bill Janssen

Ching wrote:
> I use PDFBox version 1.1.0; I did find a workaround now. Just wondering
> which tools do you use to extract text from pdf? Thanks.

Ching, in UpLib I use a patched version of xpdf which reports the bounding box and font information for each word (as well as the Unicode characters o

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Ching

I use PDFBox version 1.1.0; I did find a workaround now. Just wondering which tools do you use to extract text from pdf? Thanks.

On Wed, Oct 13, 2010 at 11:36 AM, Fabiano Nunes wrote:
> What version of PDFBox are you running?
> PDFBox 0.72 does not work properly with some pdf documents. See more

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Fabiano Nunes

What version of PDFBox are you running? PDFBox 0.72 does not work properly with some PDF documents. See more in https://issues.apache.org/jira/browse/PDFBOX-361. So, I wrote an extractor (a copy of the original, in fact) based on the trunk version (1.2.1, actually). Furthermore, this version is faster e

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Ching

Hi, Thank you for your suggestions. I found the reason: PDFBox seems to have a problem parsing large documents (20MB). I have a few of them among those 2000 docs, and those are the ones throwing OutOfMemory errors. The app does exit, and the JVM dies. I am running on a 32-bit machine. -- Ching On
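
Since the failures reported above come from a handful of large files, one way to find them before indexing is to split the input folder by file size. The following is only a minimal sketch, not something posted in the thread: the function name `split_by_size`, the 20 MB default threshold, and the `.pdf` suffix filter are all assumptions chosen to match the figures Ching mentions.

```python
from pathlib import Path


def split_by_size(folder, limit_bytes=20 * 1024 * 1024, suffix=".pdf"):
    """Return (small, large) lists of files under `folder`.

    Files whose size is at least `limit_bytes` go to `large`; the rest go
    to `small`. The default threshold (20 MB) is an assumption taken from
    the document size mentioned in the thread, not a known PDFBox limit.
    """
    small, large = [], []
    for path in sorted(Path(folder).rglob("*")):
        # Skip directories and anything that is not a PDF by extension.
        if not path.is_file() or path.suffix.lower() != suffix:
            continue
        if path.stat().st_size >= limit_bytes:
            large.append(path)
        else:
            small.append(path)
    return small, large


if __name__ == "__main__":
    import sys

    small, large = split_by_size(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{len(small)} files under the threshold, {len(large)} at or above it")
    for path in large:
        print(f"  {path} ({path.stat().st_size} bytes)")
```

The large files could then be indexed in a separate run, with a bigger heap, or skipped and logged, depending on what the workaround requires.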

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Senthil

Hi Ching, I do not think the issue is with Lucene for 2000 documents. As Anshum mentioned, give more details about your environment. Also check the CPU usage and the index .fdt file timestamp while it hangs. Logs would help to identify the real cause. I used to work with Lucene 2.4 and recently 3.0.2. No sim
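
The timestamp check suggested above can be done from outside the JVM while the indexer appears hung. This is a rough sketch under stated assumptions: the helper names `stored_fields_ages` and `looks_stalled` and the 300-second idle threshold are hypothetical, and it assumes the index directory contains loose `.fdt` (stored fields) files. If the index uses compound files, no `.fdt` files may be visible and the check reports nothing.

```python
import time
from pathlib import Path


def stored_fields_ages(index_dir, now=None):
    """Map each *.fdt file in `index_dir` to seconds since it was last modified."""
    now = time.time() if now is None else now
    return {p.name: now - p.stat().st_mtime for p in Path(index_dir).glob("*.fdt")}


def looks_stalled(index_dir, idle_seconds=300, now=None):
    """Return True if .fdt files exist and none was written in `idle_seconds`.

    The threshold is an arbitrary assumption; a quiet index is only a hint
    that the writer is stuck, not proof.
    """
    ages = stored_fields_ages(index_dir, now)
    return bool(ages) and min(ages.values()) > idle_seconds


if __name__ == "__main__":
    import sys

    index_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, age in sorted(stored_fields_ages(index_dir).items()):
        print(f"{name}: last modified {age:.0f} s ago")
    print("stalled?", looks_stalled(index_dir))
```

If the timestamps keep advancing, the writer is still making progress and the slowdown is more likely in text extraction; combined with CPU usage, this helps separate a true hang from a slow run.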

Re: Indexing is hung or doesn't complete

2010-10-12 Thread Anshum

Hi Ching, Does the app exit, or hang and stay there? That is, does the JVM stay alive and idle? Also, can you make sure that it's not PDFBox? For example, try commenting out the IndexWriter part and just reading the PDFs; does that work fine? Can you also post info on your environment? Index size? Lucene version

6 matches
