Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
2 Thanks, Ben On Wed, 25 May 2005, Thomas X Hoban wrote: Thanks for replying. When I run the command, it generates a file with a "txt" extension. The text in this file has spaces interspersed in odd spots. Here is output from a file I ran the command on... Marc h 29, 2005 Hello
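A minimal sketch of why stray spaces like "Marc h" would defeat a phrase query, assuming a simplified tokenizer that lowercases and splits on non-alphanumeric characters. This is a hypothetical stand-in, not Lucene's analyzer or PDFBox's extractor; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class PhraseMatchSketch {
    // Hypothetical stand-in for an analyzer: lowercase the text and
    // split on any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True when the phrase tokens occur consecutively in the document tokens.
    static boolean containsPhrase(String document, String phrase) {
        List<String> phraseTokens = tokenize(phrase);
        return !phraseTokens.isEmpty()
                && Collections.indexOfSubList(tokenize(document), phraseTokens) >= 0;
    }

    public static void main(String[] args) {
        String extracted = "Marc h 29, 2005"; // text as reported in the message above
        String intended = "March 29, 2005";   // assumed original text

        System.out.println(tokenize(extracted));                    // [marc, h, 29, 2005]
        System.out.println(containsPhrase(extracted, "March 29"));  // false
        System.out.println(containsPhrase(intended, "March 29"));   // true
    }
}
```

Under this assumption, the extra space turns "march" into two tokens, "marc" and "h", so a phrase that expects "march" followed by "29" never finds a consecutive match.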

Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
en indexing your documents? On 5/25/05, Ben Litchfield <[EMAIL PROTECTED]> wrote: Can you run the following command line application on the PDF to verify that the extracted text is correct: java org.pdfbox.ExtractText Ben On Wed, 25 May 2005, Thomas X Hoban wrote: First, I

Re: Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
run the following command line application on the PDF to verify that the extracted text is correct: java org.pdfbox.ExtractText Ben On Wed, 25 May 2005, Thomas X Hoban wrote: First, I am new to Lucene. Is there anyone out there who has had trouble getting hits when running phrase que

Lucene - PDFBox

2005-05-25 Thread Thomas X Hoban
First, I am new to Lucene. Is there anyone out there who has had trouble getting hits when running phrase queries against an index that contains content from PDF files? For PDF documents, I create the document using LucenePDFDocument.getDocument(file) and then add it to the index. For n