Hi experts,

While trying to reproduce a bug on the Lucene side, I found the following.
In the latest code line, 5.2.1, I slightly modified the test case HighlighterTest.testSimpleQueryTermScorerHighlighter as shown below, mainly to use SimpleSpanFragmenter so that I get only one fragment of length 64.

    public void testSimpleQueryTermScorerHighlighter() throws Exception {
      doSearching(new SpanTermQuery(new Term(FIELD_NAME, "cats")));
      QueryScorer queryScorer = new QueryScorer(query, FIELD_NAME);
      Highlighter highlighter = new Highlighter(queryScorer);
      // Highlighter highlighter = new Highlighter(new QueryTermScorer(query));
      highlighter.setTextFragmenter(new SimpleSpanFragmenter(queryScorer, 64));
      int maxNumFragmentsRequired = 1; // only need one fragment

      for (int i = 0; i < hits.totalHits; i++) {
        final int docId = hits.scoreDocs[i].doc;
        final Document doc = searcher.doc(docId);
        String text = doc.get(FIELD_NAME);
        TokenStream tokenStream = getAnyTokenStream(FIELD_NAME, docId);

        String result = highlighter.getBestFragments(tokenStream, text,
            maxNumFragmentsRequired, "...");
        if (true) System.out.println("\t" + result);
      }
      // Not sure we can assert anything here - just running to check we don't
      // throw any exceptions
    }

I indexed two documents:

1. "The word content does not contain the stem that we are looking for but the metadata cats does. Do you think fragmenter work well? Do you think fragmenter work well?"
2. "The word content does not contain the stem that we are looking for but the metadata cats does. "

and got the corresponding fragments:

1. "for but the metadata <B>cats</B> does. Do you think fragmenter work"
   No problem here, this is exactly what I expected.
2. "The word content does not contain the stem that we are looking for but the metadata <B>cats</B> does. "
   This is clearly longer than 64 characters, and it is the problem reported by my colleague.

More specifically, the problem is caused by the following snippet in SimpleSpanFragmenter.isNewFragment:

    boolean isNewFrag = offsetAtt.endOffset() >= (fragmentSize * currentNumFrags)
        && (textSize - offsetAtt.endOffset()) >= (fragmentSize >>> 1);

Near the end of the text, fewer than fragmentSize / 2 characters remain after the current token, so the second condition is never true and the fragmenter never starts a new fragment. The logic that follows does not trim the oversized fragment either.

Is it possible to handle this corner case in the standard highlighter code?

Best regards,
Duke

If not now, when? If not me, who?
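
P.S. To make the behaviour easier to reason about without a Lucene checkout, here is a minimal standalone sketch of the boundary check quoted above. It is only a model under some assumptions: the class FragmentBoundaryDemo and the method fragmentStarts are hypothetical names of mine, tokens are split on spaces rather than by the analyzer used in the test, and the position-span handling that the real isNewFragment also performs is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentBoundaryDemo {

    // The condition quoted above, with the attribute and field reads
    // turned into plain parameters.
    static boolean isNewFragment(int endOffset, int fragmentSize,
                                 int currentNumFrags, int textSize) {
        return endOffset >= (fragmentSize * currentNumFrags)
            && (textSize - endOffset) >= (fragmentSize >>> 1);
    }

    // Returns the start offsets of the tokens that would open a new fragment.
    // Assumption: tokens are separated by spaces, which only approximates
    // the offsets produced by the real analyzer.
    static List<Integer> fragmentStarts(String text, int fragmentSize) {
        List<Integer> starts = new ArrayList<>();
        int currentNumFrags = 1;
        int pos = 0;
        while (pos < text.length()) {
            if (text.charAt(pos) == ' ') {
                pos++;
                continue;
            }
            int end = text.indexOf(' ', pos);
            if (end < 0) {
                end = text.length();
            }
            if (isNewFragment(end, fragmentSize, currentNumFrags, text.length())) {
                starts.add(pos);
                currentNumFrags++;
            }
            pos = end;
        }
        return starts;
    }

    public static void main(String[] args) {
        String doc2 = "The word content does not contain the stem that we are "
            + "looking for but the metadata cats does. ";
        String doc1 = doc2
            + "Do you think fragmenter work well? Do you think fragmenter work well?";

        System.out.println(fragmentStarts(doc1, 64)); // prints [63, 124]
        System.out.println(fragmentStarts(doc2, 64)); // prints []
    }
}
```

For document 1 the boundaries at offsets 63 and 124 line up with the fragment I got ("for ... work"). For document 2, the token "for" ends at offset 66, which satisfies the first condition, but only 29 characters remain, which is below fragmentSize >>> 1 = 32, so no boundary is ever produced and the whole text stays in one fragment.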