Re: Searching pdf, getting page number

2006-10-16 Thread Steven Rowe
Hi Bill,

Bill Taylor wrote:
> On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote:
>> I know that I can index pdf-files (using a third-party library).
>
> Could you please tell me where to find this library?

There are several PDF extraction packages listed here (look under the "Lucene Document

Re: Searching pdf, getting page number

2006-10-16 Thread Bill Taylor
On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote:
> Hi,
> I know that I can index pdf-files (using a third-party library).

Could you please tell me where to find this library?

> Is it possible to search the index for a phrase, getting not only the document, but also the page number in the (pd

Re: Searching pdf, getting page number

2006-10-16 Thread Erick Erickson
Well, anything's possible. There's nothing magic about Lucene and its interaction with, say, a PDF document. What you put into the index is all you can get out. So... You could index the PDF document by pages. That is, each page is a Lucene "document", related by some ID (NOT the Lucene doc_id,
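
The page-per-document idea above can be sketched without any Lucene code at all. What follows is a toy in-memory index in Python, not Lucene's API: `PageIndex`, `index_pdf`, the file names, and the page texts are all made up for illustration, and the per-page text is assumed to have been extracted already by whatever PDF library you use. Each page is stored as its own entry carrying the file's ID and the page number, so a phrase hit can report both.

```python
class PageIndex:
    """Toy stand-in for a search index: one entry per PDF page (hypothetical, not Lucene)."""

    def __init__(self):
        # Each entry is (file_id, page_number, lowercased tokens of that page).
        self.pages = []

    def add_page(self, file_id, page_number, text):
        self.pages.append((file_id, page_number, text.lower().split()))

    def search_phrase(self, phrase):
        """Return (file_id, page_number) for every page containing the exact phrase."""
        words = phrase.lower().split()
        if not words:
            return []
        n = len(words)
        hits = []
        for file_id, page_number, tokens in self.pages:
            # Slide a window of n tokens across the page and compare to the phrase.
            if any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1)):
                hits.append((file_id, page_number))
        return hits


def index_pdf(index, file_id, page_texts):
    """Index one PDF as several entries, one per page, all sharing file_id."""
    for page_number, text in enumerate(page_texts, start=1):
        index.add_page(file_id, page_number, text)


# Hypothetical extracted page texts for two PDFs.
index = PageIndex()
index_pdf(index, "manual.pdf", [
    "Installing the software",
    "Searching the index for a phrase",
    "Appendix",
])
index_pdf(index, "notes.pdf", ["a phrase appears here too"])

hits = index.search_phrase("a phrase")
for file_id, page_number in hits:
    print(file_id, "page", page_number)
```

One consequence of this layout is that a phrase straddling a page break is not found, because each page is indexed in isolation.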