"Samuru Jackson" <[EMAIL PROTECTED]> wrote on 27/02/2006 01:50:11 PM:

> Is there a way to retrieve a List of the matching words for a Hit?
> For example I create a query like this one:
> "Paris London -Stockholm"
> ...
> How do I know which words have been found in a document? In one it could be
> Paris, in another it could be London or both!
> I would need this information in order to highlight those words if I display
> the search results to the user.
For the purpose of highlighting, you don't necessarily need to know in
advance which word matched: you can just highlight any occurrence of either
Paris or London, wherever you find them, in the original text.

You might want to take a look at the Highlighter class in the contrib
directory of Lucene's distribution, which might do what you want.

Here is some example code. It creates a Highlighter object for highlighting
the given query "q". Then, for each of the results, it retrieves the full
content of the document from the stored "storedContent" field (which I added
to the index), finds the 2 most relevant sentences in the content, and
highlights q's words in them. This is similar to the summaries you see in
Google and its likes:

    Highlighter highlighter = new Highlighter(new QueryScorer(q));
    // Cap on how much of each document's text is analyzed.
    highlighter.setMaxDocBytesToAnalyze(ArbitraryLimits.DocumentToSaveCutOff);
    for(... i iterates over the relevant hits...){
        Document doc = hits.doc(i);
        // Re-analyze the stored text so the highlighter sees the same tokens
        // the query was matched against.
        TokenStream tokenStream = analyzer.tokenStream("storedContent",
            new StringReader(doc.get("storedContent")));
        // Up to 2 best fragments, joined with " ... ".
        String summary = highlighter.getBestFragments(tokenStream,
            doc.get("storedContent"), 2, " ... ");
    }

--
Nadav Har'El
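To illustrate the first point (marking every occurrence of the positive query terms without asking Lucene which of them matched), here is a minimal plain-Java sketch. It does not use Lucene at all: the class name TermHighlighter, the method highlight, and the <b>...</b> markers are hypothetical choices made for this example, and the list of terms is assumed to have been extracted from the query by the caller (for the query above, Paris and London, but not the excluded Stockholm).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Lucene.
final class TermHighlighter {

    // Wraps every whole-word, case-insensitive occurrence of any term in <b>...</b>.
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // Quote each term so regex metacharacters in it are matched literally.
        String alternation = terms.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b",
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps '$' and '\' in the matched text literal.
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement("<b>" + matcher.group() + "</b>"));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling highlight("From Paris to london, not Parisian.", Arrays.asList("Paris", "London")) marks "Paris" and "london" and leaves "Parisian" alone. Note that this sketch compares raw words, so it will miss matches that only exist after analysis (stemming, for instance); the contrib Highlighter runs the text through the analyzer and so avoids that limitation.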