Since you own Lucene in Action, look at this: http://www.lucenebook.com/search?query=highlighter+context
See section 8.7, Highlighting query terms. Otis --- Andy Roberts <[EMAIL PROTECTED]> wrote: > Hi, > > I've been using Lucene for a few months now, although not in a > typical > "building a search engine" kind of way*. Basically, I have some large > > documents. I would like a system whereby I search for a term, and > then I > receive a hit for each match, with its context, e.g., ten words > either side > of the match. > > I've been looking through the API, and there's a SpanNearQuery (or > something > similar) which looked at first glance to be what I want, but after > second > thoughts, it appears it's searching for a set of words within a given > > proximity. > > I know Lucene stores TermPositions: I know how to get the position of > a match. > Is there a way of doing a reverse lookup, i.e., given a position, > return the > term. Because if that were the case, I could easily build a context > up by > finding pos, then looping to get all terms +- a specified window. > > Either way, I can't see a way forward. Simply being able to find > which > documents the terms are in isn't very helpful, because let's say I > find the > docs that match, I then have to open up each one, tokenise and > re-search for > my term,etc, etc. All this info is there, somewhere in the index, I > just want > to get to it so that I can benefit from the many speed benefits of > Lucene. > > I don't expect sample code or anything, just a pointer to the right > direction. > I do own the LIA book, but haven't read it all yet - so if there's > anything > in there which could be relevant, please let me know :) > > Much help appreciated, > Andrew Roberts > > * for those who are interested, I'm a computer scientist doing > research which > is basically a cross between computational linguistics and machine > learning. > I work with large text corpora, gather information about how words > behave > relative to each and try to infer word class and grammatical > structure from > it. I'm using Lucene to try and speed up text processing overheads, > by having > my corpora indexed. > > -- > Computer Vision and Language Reseearch Group > School of Computing > Univeristy of Leeds > Leeds, UK > LS2 9JT > http://www.comp.leeds.ac.uk/andyr > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]