Hi Sabeer,

I used Lucene 3.3.0 to test your code. (I doubt Lucene 4.0 has been released yet, since version 3.3.0 only came out in July.)
In the second case there is no output because the match is exact: sourceText contains "transplantation", not the term "transplant", so nothing matches. One could try changing the query to "transplant*", as I did, but then I got this error:

*MemoryIndex class-not-found error (Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/lucene/index/memory/MemoryIndex)*

Also, regarding highlighting and regular expressions, I found this bug report (I'm not sure it relates exactly to the problem you asked about):
http://exist.2174344.n4.nabble.com/exist-Bugs-3038780-match-highlighting-for-lucene-wildcard-and-regex-search-td2317647.html

I'm pretty much stuck after this :(

Govind

On Mon, Jul 18, 2011 at 4:50 PM, Sabeer Hussain <shuss...@del.aithent.com> wrote:
> I am using Lucene 4.0 and trying to use its highlighting feature. I am not
> getting the desired result due to some mistake that I am not able to
> identify. My source code looks like
>
> String sourceText = "liver disease kidney transplant";
> String termString ="\"transplant\"";
>
> SimpleAnalyzer simpleAnalyzer = new SimpleAnalyzer(Version.LUCENE_40);
> Query query = new QueryParser(Version.LUCENE_40,"contents", simpleAnalyzer).parse(termString);
>
> TokenStream tokenStream = simpleAnalyzer.tokenStream("contents", new StringReader(sourceText));
> QueryScorer scorer = new QueryScorer(query,"contents");
> scorer.setExpandMultiTermQuery(true);
> Fragmenter fragmenter = new SimpleSpanFragmenter(scorer);
>
> SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter( "*", "*") ;
> Highlighter highlighter = new Highlighter(simpleHTMLFormatter, scorer );
> highlighter.setTextFragmenter(fragmenter);
> highlighter.setMaxDocCharsToAnalyze(10000);
> String resultString = highlighter.getBestFragments(tokenStream,sourceText,1000, "...");
> System.out.println("Source Text1 = "+sourceText);
> System.out.println("Result Text1 = "+resultString);
>
> sourceText = "for liver transplantation.";
> tokenStream = simpleAnalyzer.tokenStream("contents", new StringReader(sourceText));
> resultString = highlighter.getBestFragments(tokenStream,sourceText,1000, "...");
>
> System.out.println("Source Text2 = "+sourceText);
> System.out.println("Result Text2 = "+resultString);
>
> For the first text, I am getting the result properly but not for the second
> one
>
> Source Text1 = liver disease kidney transplant
> Result Text1 = liver disease kidney *transplant*
>
> Source Text2 = for liver transplantation.
> Result Text2 =
>
> I am expecting the result for second one like
> for liver *transplant*ation
>
> or
> for liver *transplantation*
>
> What is wrong in my code?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/highlighting-tp542569p3178841.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

-- 
No trees were harmed in the creation of this message, but several thousand electrons were mildly inconvenienced.
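A follow-up on the exact-match explanation and the NoClassDefFoundError. In the 3.x contrib highlighter, a QueryScorer with setExpandMultiTermQuery(true) expands wildcard/prefix queries against a MemoryIndex, and that class ships in the separate contrib "memory" jar, so the error most likely just means that jar (something like lucene-memory-3.3.0.jar) is missing from the classpath. Treat that as an educated guess, not a confirmed fix. To show why the quoted "transplant" query misses "transplantation" while "transplant*" should catch it, here is a toy, stdlib-only model. It is NOT Lucene's API; PrefixHighlightDemo and highlight are made-up names. It splits text into lowercased runs of letters, roughly what SimpleAnalyzer does, and stars each token that equals the term or, in prefix mode, starts with it. Because highlighting works on whole tokens with an analyzer like SimpleAnalyzer, the best a prefix query can be expected to give is "for liver *transplantation*", not "*transplant*ation".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Toy model of term-level highlighting (hypothetical names, not Lucene's API).
 * Tokens are maximal runs of letters, lowercased, roughly like SimpleAnalyzer.
 */
final class PrefixHighlightDemo {

    // Runs of letters; SimpleAnalyzer's LetterTokenizer also splits on non-letters.
    private static final Pattern LETTERS = Pattern.compile("\\p{L}+");

    /**
     * Wraps every token matching the term in '*' markers.
     * prefix == false: token must equal the term (like the quoted "transplant" query).
     * prefix == true:  token only has to start with it (like "transplant*").
     * Returns "" when nothing matches, mirroring the empty "Result Text2".
     */
    static String highlight(String text, String term, boolean prefix) {
        String needle = term.toLowerCase(Locale.ROOT);
        Matcher m = LETTERS.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;        // end of the text already copied to out
        boolean hit = false; // whether any token matched
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT);
            boolean matches = prefix ? token.startsWith(needle) : token.equals(needle);
            if (matches) {
                // Copy the untouched text before the token, then the starred token.
                out.append(text, last, m.start())
                   .append('*').append(m.group()).append('*');
                last = m.end();
                hit = true;
            }
        }
        if (!hit) {
            return "";
        }
        out.append(text.substring(last)); // trailing text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "for liver transplantation.";
        System.out.println("exact : [" + highlight(text, "transplant", false) + "]");
        System.out.println("prefix: [" + highlight(text, "transplant", true) + "]");
    }
}
```

Running main should print an empty exact result and `for liver *transplantation*.` for the prefix case, which lines up with the empty Result Text2 and the second form you were expecting.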