Hello, I am looking for the right approach to finding the frequency (and, at slightly lower priority, the location) of search hits in a document. I am working through the online documentation and the helpful "Lucene in Action" book. Several examples and explanations come close, but none is quite what I am looking for. Can anyone point me in the right direction?
I have a set of queries, say 10,000 different 'things' I want to find. They range from single-word matches (e.g. Lucene) to prefix queries (e.g. index*) to phrases (e.g. "Lucene in Action"; the stopword is irrelevant, so presumably this becomes "Lucene Action"?). I will also be delving into more advanced topics like proximity, fuzzy, Snowball and such. For the moment, though, I will stick with the first three I mentioned, which I believe translate to TermQuery, PrefixQuery, and PhraseQuery.

How do I find how many hits occur in a document? I've seen this FAQ entry:

  Is there a way to retrieve the original term positions during the search?
  Yes, see the Javadoc for IndexReader.termPositions().

I'm probably missing the obvious here, but I assume this refers to the analyzed terms (i.e. individual words, possibly transmogrified by the analyzer). I further assume that it does not relate directly to the results of a search for "Lucene in Action". Where do I find information about the search hits? Have I skimmed over this part of the API?

Thanks in advance,
Sean
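To make the question concrete, here is a toy, stdlib-only model of the per-document information the FAQ answer points at. It is a sketch under stated assumptions, not Lucene code: the class HitCounter and its methods are hypothetical, and the text is assumed to be already analyzed into lowercase tokens (no stopword removal or position gaps are modelled). For each term it stores, per document, the list of token positions, which is roughly what IndexReader.termPositions() enumerates through doc(), freq() and nextPosition() on the returned TermPositions. From that, a term's hit count is the length of its position list, a prefix's hit count is the sum over all indexed terms sharing the prefix, and exact phrase hits are the start positions where each phrase term appears at the expected offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy positional index (hypothetical, NOT Lucene code):
 * term -> (docId -> positions of that term in the document).
 * Assumes tokens are already analyzed (lowercased; stopwords not modelled).
 */
public class HitCounter {
    // TreeMap keeps terms sorted, so all terms sharing a prefix form one contiguous run.
    private final TreeMap<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    /** Record every token of a document together with its position. */
    public void addDocument(int docId, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            postings.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    /** Positions of a term in one document (empty if the term does not occur there). */
    public List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term);
        if (byDoc == null) return Collections.emptyList();
        return byDoc.getOrDefault(docId, Collections.emptyList());
    }

    /** Hits for a single term: the length of its position list (cf. TermPositions.freq()). */
    public int termFreq(String term, int docId) {
        return positions(term, docId).size();
    }

    /** Hits for a prefix: sum over every indexed term that starts with the prefix. */
    public int prefixFreq(String prefix, int docId) {
        int total = 0;
        for (String term : postings.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) break; // sorted order: no later term can match
            total += termFreq(term, docId);
        }
        return total;
    }

    /** Exact phrase hits: start positions p where phrase[i] occurs at p + i for every i. */
    public List<Integer> phraseStarts(List<String> phrase, int docId) {
        List<Integer> starts = new ArrayList<>();
        if (phrase.isEmpty()) return starts;
        List<Set<Integer>> sets = new ArrayList<>();
        for (String term : phrase) sets.add(new HashSet<>(positions(term, docId)));
        for (int p : positions(phrase.get(0), docId)) {
            boolean match = true;
            for (int i = 1; i < phrase.size() && match; i++) {
                match = sets.get(i).contains(p + i);
            }
            if (match) starts.add(p);
        }
        return starts;
    }

    public static void main(String[] args) {
        HitCounter idx = new HitCounter();
        idx.addDocument(0, Arrays.asList("lucene in action covers lucene indexing and the index format".split(" ")));
        idx.addDocument(1, Arrays.asList("action in lucene".split(" ")));
        System.out.println(idx.termFreq("lucene", 0));                                    // 2
        System.out.println(idx.prefixFreq("index", 0));                                   // 2 ("index" + "indexing")
        System.out.println(idx.phraseStarts(Arrays.asList("lucene", "in", "action"), 0)); // [0]
    }
}
```

The phrase check is a plain intersection of position lists at fixed offsets, so it only counts exact (zero-slop) matches; it is an illustration of the idea and is not taken from Lucene's PhraseQuery implementation.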