Re: Indexing is hung or doesn't complete

2010-10-13 Thread Ching
use
> PDFBox to extract text. I prefer poppler tools.
>
> On Wed, Oct 13, 2010 at 2:22 PM, Ching wrote:
>
> > Hi,
> >
> > Thank you for your suggestions. I found the reason which is that PDFBox
> > seems having problem parsing large document (20MB), I have a

Re: Indexing is hung or doesn't complete

2010-10-13 Thread Ching
Hi, Thank you for your suggestions. I found the cause: PDFBox seems to have trouble parsing large documents (20MB). A few of the 2000 docs are that large, and those are the ones throwing OutOfMemory errors. The app does exit, and the JVM dies. I am running on a 32-bit machine. -- Ching
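One way to keep a few oversized PDFs from killing the whole batch is to guard each document separately. The sketch below is only an illustration under stated assumptions: `GuardedBatch`, `Result`, and the `extractor` function are hypothetical names, and `extractor` stands in for whatever PDFBox text-extraction call the application already makes. Recovering from an OutOfMemoryError is not guaranteed to work, so the size check is the main safeguard and the catch is a last resort.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper; not part of PDFBox or Lucene.
class GuardedBatch {

    /** Outcome of one pass over the input files. */
    static final class Result {
        final List<String> texts = new ArrayList<>();   // text of each file that was extracted
        final List<Path> skipped = new ArrayList<>();   // files that were too large or failed
    }

    /**
     * Extracts text from each file, skipping files larger than maxBytes and
     * recording (instead of propagating) per-file failures.
     */
    static Result run(List<Path> files, long maxBytes, Function<Path, String> extractor) {
        Result result = new Result();
        for (Path file : files) {
            try {
                if (Files.size(file) > maxBytes) {
                    result.skipped.add(file);           // too large: do not even try to parse
                    continue;
                }
                result.texts.add(extractor.apply(file));
            } catch (IOException | RuntimeException | OutOfMemoryError e) {
                // One bad document should not stop the loop; note it and move on.
                result.skipped.add(file);
            }
        }
        return result;
    }
}
```

The skipped files can then be logged and handled separately, for example with a larger heap (the JVM's `-Xmx` option) or an external extraction tool.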

Indexing is hung or doesn't complete

2010-10-12 Thread Ching
Hi All, Can anyone help with this issue? I have about 2000 PDF files. I use PDFBox to extract their text, then index them in a for loop. The indexing stopped after the fdt file reached 7,061 KB in size. There is no error; the indexing simply stopped. Thanks in advance for any help. Ching

Re: Numeric range query not returning results

2010-10-05 Thread Ching
More problems with NumericRangeQuery when combining it with BooleanQuery. Here is the problem, please help. 1. I have a Date field that is indexed as a long. 2. In the search, I need to exclude some time periods; I used a BooleanQuery to combine those excluded time periods like below, BooleanQuer

Help wanted with Indexing PDF Documents

2010-03-02 Thread Ching Zheng
Hi, I have about 50 PDF documents, each around 10MB in size. I am using PDFBox for parsing, and I am wondering how I can index bookmarks with their corresponding page information. I use PDDocumentOutline to get each bookmark's title, but I only have a PDNamedDestination, which offers no page number info. C

How to do refined search based on attributes and never return zero results

2005-12-07 Thread Ching-Pei Hsing
at the indexing time? Or, is there any technology we need to integrate, like those for data warehousing? Any comments or pointers will be greatly appreciated. Thanks Ching-pei