Greetings! Does anybody have input on this?

Best regards,
Duke
If not now, when? If not me, who?

On Fri, Aug 7, 2015 at 10:58 AM, Duke DAI <duke.dai....@gmail.com> wrote:

> Hi experts,
>
> I'm trying to reproduce a bug on the Lucene side, and found something.
>
> In the latest codeline, 5.2.1, I modified the test case
> HighlighterTest.testSimpleQueryTermScorerHighlighter slightly, as shown
> below, mainly to use SimpleSpanFragmenter to get only one fragment with
> length 64.
>
>   public void testSimpleQueryTermScorerHighlighter() throws Exception {
>     doSearching(new SpanTermQuery(new Term(FIELD_NAME, "cats")));
>     QueryScorer queryScorer = new QueryScorer(query, FIELD_NAME);
>     Highlighter highlighter = new Highlighter(queryScorer);
>     // Highlighter highlighter = new Highlighter(new QueryTermScorer(query));
>     highlighter.setTextFragmenter(new SimpleSpanFragmenter(queryScorer, 64));
>     int maxNumFragmentsRequired = 1; // only need one fragment
>     for (int i = 0; i < hits.totalHits; i++) {
>       final int docId = hits.scoreDocs[i].doc;
>       final Document doc = searcher.doc(docId);
>       String text = doc.get(FIELD_NAME);
>       TokenStream tokenStream = getAnyTokenStream(FIELD_NAME, docId);
>
>       String result = highlighter.getBestFragments(tokenStream, text,
>           maxNumFragmentsRequired, "...");
>       if (true) System.out.println("\t" + result);
>     }
>     // Not sure we can assert anything here - just running to check we
>     // don't throw any exceptions
>   }
>
> With two documents:
> 1. "The word content does not contain the stem that we are looking for but
> the metadata cats does. Do you think fragmenter work well? Do you think
> fragmenter work well?"
> 2. "The word content does not contain the stem that we are looking for but
> the metadata cats does. "
>
> I got the corresponding fragments:
> 1. "for but the metadata <B>cats</B> does. Do you think fragmenter work",
> no problem, it's exactly what I expected.
> 2. "The word content does not contain the stem that we are looking for but
> the metadata <B>cats</B> does. ", whose length is clearly more than 64.
> That's the problem reported by my colleague.
>
> More specifically, the problem is caused by the following code snippet in
> SimpleSpanFragmenter.isNewFragment:
>
>   boolean isNewFrag = offsetAtt.endOffset() >= (fragmentSize * currentNumFrags)
>       && (textSize - offsetAtt.endOffset()) >= (fragmentSize >>> 1);
>
> Near the end of the text, the fragmenter never starts a new fragment, and
> the subsequent logic does not trim the result either.
>
> Is it possible to handle this corner case in the standard highlighter code?
>
> Best regards,
> Duke
> If not now, when? If not me, who?
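
To make the corner case concrete, the size-based condition quoted above can be replayed outside Lucene. The following is only a rough sketch under stated assumptions: the class FragmentBoundarySketch and the method fragmentStarts are hypothetical names, tokens are split on whitespace rather than produced by the analyzer the test uses, and the span-position logic that isNewFragment also performs is left out. Only the isNewFrag expression is taken from the quoted snippet.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch, NOT Lucene code: it replays only the size-based
// boundary condition quoted from SimpleSpanFragmenter.isNewFragment.
final class FragmentBoundarySketch {

    // Returns the start offset of every token at which the quoted condition
    // would open a new fragment. Whitespace tokenization is an assumption.
    static List<Integer> fragmentStarts(String text, int fragmentSize) {
        List<Integer> starts = new ArrayList<>();
        int textSize = text.length();
        int currentNumFrags = 1;
        int i = 0;
        while (i < textSize) {
            // Skip whitespace between tokens.
            while (i < textSize && Character.isWhitespace(text.charAt(i))) i++;
            if (i >= textSize) break;
            int startOffset = i;
            while (i < textSize && !Character.isWhitespace(text.charAt(i))) i++;
            int endOffset = i;

            // Same expression as in the quoted snippet.
            boolean isNewFrag = endOffset >= fragmentSize * currentNumFrags
                    && (textSize - endOffset) >= (fragmentSize >>> 1);
            if (isNewFrag) {
                currentNumFrags++;
                starts.add(startOffset);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String doc2 = "The word content does not contain the stem that we are "
                + "looking for but the metadata cats does. ";
        String doc1 = doc2
                + "Do you think fragmenter work well? Do you think fragmenter work well?";
        System.out.println(fragmentStarts(doc1, 64)); // prints [63, 124]
        System.out.println(fragmentStarts(doc2, 64)); // prints []
    }
}
```

For document 2 the text is 95 characters long. The first token whose end offset reaches 64 is "for" (end offset 66), but only 29 characters remain after it, which is less than fragmentSize >>> 1 = 32. The second half of the condition therefore fails for that token and for every later one, so the whole text stays in a single fragment. For document 1 (164 characters) the same token leaves 98 characters, so a boundary opens at offset 63, which matches the fragment starting with "for but the metadata".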