Hi, I've been using Lucene for a few months now, although not in a typical "building a search engine" kind of way*. Basically, I have some large documents. I would like a system whereby I search for a term, and then I receive a hit for each match, with its context, e.g., ten words either side of the match.
I've been looking through the API, and there's a SpanNearQuery (or something similar) which looked at first glance to be what I want, but after second thoughts, it appears it's searching for a set of words within a given proximity. I know Lucene stores TermPositions: I know how to get the position of a match. Is there a way of doing a reverse lookup, i.e., given a position, return the term. Because if that were the case, I could easily build a context up by finding pos, then looping to get all terms +- a specified window. Either way, I can't see a way forward. Simply being able to find which documents the terms are in isn't very helpful, because let's say I find the docs that match, I then have to open up each one, tokenise and re-search for my term,etc, etc. All this info is there, somewhere in the index, I just want to get to it so that I can benefit from the many speed benefits of Lucene. I don't expect sample code or anything, just a pointer to the right direction. I do own the LIA book, but haven't read it all yet - so if there's anything in there which could be relevant, please let me know :) Much help appreciated, Andrew Roberts * for those who are interested, I'm a computer scientist doing research which is basically a cross between computational linguistics and machine learning. I work with large text corpora, gather information about how words behave relative to each and try to infer word class and grammatical structure from it. I'm using Lucene to try and speed up text processing overheads, by having my corpora indexed. -- Computer Vision and Language Reseearch Group School of Computing Univeristy of Leeds Leeds, UK LS2 9JT http://www.comp.leeds.ac.uk/andyr --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]