Well, anything's possible <G>. There's nothing magic about Lucene and its interaction with, say, a PDF document. What you put into the index is all you can get out of it. So...
You could index the PDF document by pages. That is, each page is a Lucene "document", and the pages are tied together by some ID of your own (NOT the Lucene doc_id, since that can change). You could index the whole PDF as one Lucene document, give the first term of each page a large position increment gap, and reconstruct the page from the position of the match. You could store meta-data in a field of the document giving the term offset of each page start, and use it to work out which page a hit came from. You could insert a special token at the beginning of each page, though you'd then have to count those tokens to get the page. And so on.
The take-away here is that Lucene is a search *engine*, not a complete package. You have to construct your application carefully around Lucene to get this kind of meta-data out of it... That said, there may already be a contribution and/or package out there that does much of this for you, but I'm not aware of one.
Hope this helps at least a little.
Erick
On 10/16/06, Christoph Pächter <[EMAIL PROTECTED]> wrote:
Hi,
I know that I can index PDF files (using a third-party library). Is it possible to search the index for a phrase and get back not only the document but also the page number within the PDF document? Or is it even possible to get a bookmark leading to that page?
I would be thankful for any information you can provide, either on how to do this indexing and searching, or on where I can find further information or example code.
Kind regards
Christoph
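For illustration, here is a rough plain-Java sketch of the bookkeeping behind two of the options in the reply above: jumping the token position at each page start, and storing page-start offsets as meta-data. It makes no Lucene calls. It assumes you can get the token position of a hit back from the index, it uses naive whitespace tokenization, and the class name `PageLookup`, its method names, and the `GAP` value are all hypothetical. Pages are numbered from 0.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (plain Java, no Lucene calls) showing the bookkeeping
 * needed to map a token position back to a page number. Pages are 0-based.
 */
public class PageLookup {

    // Option A: jump the position at each page start.
    // Assumption: no page has GAP or more tokens, so positions never spill
    // into the next page's range (and page * GAP stays within int range).
    public static final int GAP = 100_000;

    /** Position of the i-th token on a page when page k starts at k * GAP. */
    public static int positionWithGap(int page, int tokenIndexOnPage) {
        return page * GAP + tokenIndexOnPage;
    }

    /** Recovers the page from a position produced by positionWithGap. */
    public static int pageFromGapPosition(int position) {
        return position / GAP;
    }

    // Option B: store the position of each page's first token as meta-data.

    /** Page-start offsets when positions run contiguously through the document. */
    public static int[] pageStartOffsets(List<String> pages) {
        int[] starts = new int[pages.size()];
        int position = 0;
        for (int k = 0; k < pages.size(); k++) {
            starts[k] = position;
            String text = pages.get(k).trim();
            // Naive whitespace tokenization; a real analyzer may split (or drop
            // stop words) differently, so store the offsets it actually produced.
            position += text.isEmpty() ? 0 : text.split("\\s+").length;
        }
        return starts;
    }

    /**
     * Binary search for the largest page k with starts[k] <= position; returns
     * -1 if the position precedes the first page. Taking the largest such k
     * skips over empty pages that share the same start offset.
     */
    public static int pageFromOffsets(int[] starts, int position) {
        int lo = 0, hi = starts.length - 1, page = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (starts[mid] <= position) {
                page = mid;       // candidate; keep looking further right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
                "lucene is a search engine",
                "not a package",
                "build your application around it");
        int[] starts = pageStartOffsets(pages);
        System.out.println(Arrays.toString(starts));                   // [0, 5, 8]
        System.out.println(pageFromOffsets(starts, 6));                 // 1 (second page)
        System.out.println(pageFromGapPosition(positionWithGap(2, 3))); // 2 (third page)
    }
}
```

Add one to the result for a human-readable page number. Indexing each page as its own Lucene document (the first option in the reply) avoids this arithmetic entirely, at the cost of tying the pages together with your own ID field.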