On Jul 8, 2005, at 2:57 AM, Daniel Moldovan wrote:
My application must index a lot of books that are stored in xml files.

Each XML file represents a page of the book, and this way each page becomes a Lucene Document.

Each page is organized in different sections, and each section contains lines.

What I need to do is give the user the possibility to search for a phrase that starts at the end of a page and continues on the next page. The span should have some limits, let's say, 6 words on each page.

Has anyone implemented this kind of search? Please share your knowledge if you have.

You're lucky you get to represent your data so hierarchically! Try getting scholars to represent a book in such a fashion!!! (I'm dealing with scholarly works in XML format, and sections do not fall _within_ pages; they can span across pages.)

In this case, one field of your document should probably index the page plus 6 words on either side of it, taken from the previous and next pages. Maybe you also have a field that represents only the page. Perhaps something at query time decides which field to search? Maybe all phrase queries use the overlapped field and other query types use the single-page field?
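
A minimal Java sketch of the overlapped-field idea, assuming whitespace tokenization and a window of 6 words per neighbouring page. The field names "overlapped" and "page" and the helpers overlappedText and fieldFor are hypothetical. The sketch only builds the strings you would store as field values and picks a field name at query time; it does not call any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PageOverlap {
    // Assumed window: borrow this many words from each neighbouring page.
    static final int WINDOW = 6;

    // Splits page text into words on whitespace (a stand-in for a real analyzer).
    static List<String> words(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return new ArrayList<>();
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Text for the hypothetical "overlapped" field of page i: the last WINDOW
    // words of page i-1, all of page i, then the first WINDOW words of page i+1.
    static String overlappedText(List<String> pages, int i) {
        List<String> out = new ArrayList<>();
        if (i > 0) {
            List<String> prev = words(pages.get(i - 1));
            out.addAll(prev.subList(Math.max(0, prev.size() - WINDOW), prev.size()));
        }
        out.addAll(words(pages.get(i)));
        if (i + 1 < pages.size()) {
            List<String> next = words(pages.get(i + 1));
            out.addAll(next.subList(0, Math.min(WINDOW, next.size())));
        }
        return String.join(" ", out);
    }

    // Hypothetical query-time routing: phrase queries search the overlapped
    // field, everything else searches the single-page field.
    static String fieldFor(boolean isPhraseQuery) {
        return isPhraseQuery ? "overlapped" : "page";
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
            "one two three four five six seven eight",
            "nine ten eleven",
            "twelve thirteen fourteen fifteen sixteen seventeen eighteen");
        for (int i = 0; i < pages.size(); i++) {
            System.out.println(i + ": " + overlappedText(pages, i));
        }
    }
}
```

With these three pages, the overlapped text for page 1 is "three four five six seven eight nine ten eleven twelve thirteen fourteen fifteen sixteen seventeen", so a phrase such as "eight nine" crossing the page 0/1 boundary lands in a single field. Because adjacent overlapped fields share words, a phrase lying entirely inside an overlap region can match two neighbouring page documents.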

    Erik