Hi, I have a problem when using the Highlighter with an NGramAnalyzer and a PhraseQuery: searching for a substring of length N (4 in my case) yields the following exception:
java.lang.IllegalArgumentException: Less than 2 subSpans.size():1
    at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:40)
    at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:56)
    at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:292)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:137)
    at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:506)
    at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
    at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)

Below is a JUnit test reproducing this behavior. When I search for a string with more than N characters, or when I use NGramPhraseQuery instead (see the P.S. below for that variant), the problem does not occur. Why is it that more than one subSpan is required?

public class HighlighterTest {

    @Rule
    public final ExpectedException exception = ExpectedException.none();

    @Test
    public void testHighlighterWithPhraseQueryThrowsException() throws IOException, InvalidTokenOffsetsException {
        final Analyzer analyzer = new NGramAnalyzer(4);
        final String fieldName = "substring";

        final List<BytesRef> list = new ArrayList<>();
        list.add(new BytesRef("uchu"));
        final PhraseQuery query = new PhraseQuery(fieldName, list.toArray(new BytesRef[list.size()]));

        final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
        final SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");

        exception.expect(IllegalArgumentException.class);
        exception.expectMessage("Less than 2 subSpans.size():1");

        final Highlighter highlighter = new Highlighter(formatter, TextEncoder.NONE.getEncoder(), fragmentScorer);
        highlighter.setTextFragmenter(new SimpleFragmenter(100));

        // This call throws the IllegalArgumentException above; the assertion shows the highlight I would expect.
        final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");
        assertEquals("B<b>uchu</b>ng", fragment);
    }

    public final class NGramAnalyzer extends Analyzer {

        private final int minNGram;

        public NGramAnalyzer(final int minNGram) {
            super();
            this.minNGram = minNGram;
        }

        @Override
        protected TokenStreamComponents createComponents(final String fieldName) {
            // Emits only grams of exactly minNGram characters.
            final Tokenizer source = new NGramTokenizer(minNGram, minNGram);
            return new TokenStreamComponents(source);
        }
    }
}

Thanks and cheers,
Eva
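
P.S. For completeness, this is roughly the NGramPhraseQuery variant that does not hit the exception for me. Treat it as a sketch only: it reuses fieldName, formatter and analyzer from the test above, and it assumes an NGramPhraseQuery constructor that takes the gram size plus the PhraseQuery to wrap, so please check it against the Lucene version you are on.

    // Sketch: wrap the same PhraseQuery in an NGramPhraseQuery (gram size 4, matching NGramAnalyzer).
    final PhraseQuery phrase = new PhraseQuery(fieldName, new BytesRef("uchu"));
    final Query query = new NGramPhraseQuery(4, phrase);

    final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
    final Highlighter highlighter = new Highlighter(formatter, TextEncoder.NONE.getEncoder(), fragmentScorer);
    highlighter.setTextFragmenter(new SimpleFragmenter(100));

    // With this query the exception does not occur and the fragment is highlighted as expected ("B<b>uchu</b>ng").
    final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");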