Re: Obtaining the contexts of hits

Otis Gospodnetic Wed, 09 Mar 2005 13:42:24 -0800

Since you own Lucene in Action, look at this:
http://www.lucenebook.com/search?query=highlighter+context


See section 8.7, Highlighting query terms.

Otis


--- Andy Roberts <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I've been using Lucene for a few months now, although not in a
> typical 
> "building a search engine" kind of way*. Basically, I have some large
> 
> documents. I would like a system whereby I search for a term, and
> then I 
> receive a hit for each match, with its context, e.g., ten words
> either side 
> of the match.
> 
> I've been looking through the API, and there's a SpanNearQuery (or
> something 
> similar) which looked at first glance to be what I want, but after
> second 
> thoughts, it appears it's searching for a set of words within a given
> 
> proximity. 
> 
> I know Lucene stores TermPositions: I know how to get the position of
> a match. 
> Is there a way of doing a reverse lookup, i.e., given a position,
> return the 
> term. Because if that were the case, I could easily build a context
> up by 
> finding pos, then looping to get all terms +- a specified window.
> 
> Either way, I can't see a way forward. Simply being able to find
> which 
> documents the terms are in isn't very helpful, because let's say I
> find the 
> docs that match, I then have to open up each one, tokenise and
> re-search for 
> my term,etc, etc. All this info is there, somewhere in the index, I
> just want 
> to get to it so that I can benefit from the many speed benefits of
> Lucene.
> 
> I don't expect sample code or anything, just a pointer to the right
> direction. 
> I do own the LIA book, but haven't read it all yet - so if there's
> anything 
> in there which could be relevant, please let me know :)
> 
> Much help appreciated,
> Andrew Roberts
> 
> * for those who are interested, I'm a computer scientist doing
> research which 
> is basically a cross between computational linguistics and machine
> learning. 
> I work with large text corpora, gather information about how words
> behave 
> relative to each and try to infer word class and grammatical
> structure from 
> it. I'm using Lucene to try and speed up text processing overheads,
> by having 
> my corpora indexed.
> 
> -- 
> Computer Vision and Language Reseearch Group
> School of Computing
> Univeristy of Leeds
> Leeds, UK
> LS2 9JT
> http://www.comp.leeds.ac.uk/andyr
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Obtaining the contexts of hits

Reply via email to