Hi Bill,
Bill Taylor wrote:
> On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote:
>> I know that I can index pdf-files (using a third-party library).
>
> Could you please tell me where to find this library?
There are several PDF extraction packages listed here (look under the "Lucene Document
On Oct 16, 2006, at 5:44 AM, Christoph Pächter wrote:
> Hi,
> I know that I can index pdf-files (using a third-party library).
> Is it possible to search the index for a phrase, getting not only the
> document, but also the page number in the (pd
Well, anything's possible.
There's nothing magic about Lucene and its interaction with, say, a PDF
document. What you put into the index is all you can get out. So...
You could index the PDF document by pages. That is, each page is a Lucene
"document", related by some ID (NOT the Lucene doc_id,
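To make the page-per-document idea concrete, here is a minimal sketch in plain Java. It is NOT Lucene code and uses no Lucene API: a tiny in-memory list stands in for the index, each `PageDoc` plays the role of one Lucene "document" (one PDF page), and `pdfId` is a hypothetical application-level key shared by all pages of a file (deliberately not Lucene's internal doc id). The class names, the naive tokenizer and the linear phrase scan are all illustrative assumptions; records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative stand-in for a per-page index (not Lucene). Each PageDoc is one
// page of one PDF; "pdfId" is a hypothetical key shared by all pages of a file.
class PageIndex {

    // One indexed unit: a single page of a single PDF.
    record PageDoc(String pdfId, int page, List<String> tokens) {}

    // A search hit: which PDF and which page matched.
    record Hit(String pdfId, int page) {}

    private final List<PageDoc> docs = new ArrayList<>();

    // Naive tokenizer: lower-case, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Index one page as its own "document", tagged with the PDF id and page number.
    void addPage(String pdfId, int page, String text) {
        docs.add(new PageDoc(pdfId, page, tokenize(text)));
    }

    // Phrase search: a page matches if the phrase tokens appear contiguously on it.
    // Because the indexed unit is a page, every hit already carries its page number.
    List<Hit> searchPhrase(String phrase) {
        List<String> q = tokenize(phrase);
        List<Hit> hits = new ArrayList<>();
        if (q.isEmpty()) return hits;
        for (PageDoc d : docs) {
            List<String> toks = d.tokens();
            for (int i = 0; i + q.size() <= toks.size(); i++) {
                if (toks.subList(i, i + q.size()).equals(q)) {
                    hits.add(new Hit(d.pdfId(), d.page()));
                    break; // one hit per page is enough
                }
            }
        }
        return hits;
    }
}

class PageIndexDemo {
    public static void main(String[] args) {
        PageIndex idx = new PageIndex();
        idx.addPage("manual.pdf", 1, "Installing the index server");
        idx.addPage("manual.pdf", 2, "Searching for a phrase returns page numbers");
        idx.addPage("faq.pdf", 1, "Can I search for a phrase?");
        // Prints "manual.pdf page 2" then "faq.pdf page 1".
        for (PageIndex.Hit h : idx.searchPhrase("for a phrase")) {
            System.out.println(h.pdfId() + " page " + h.page());
        }
    }
}
```

One consequence of this design: because each page is indexed on its own, a phrase that straddles a page break will not match either page. In an actual Lucene index, the shared ID and the page number would typically be kept as stored fields on each per-page document so they can be read back from a hit.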