Hi Eva,

This looks like a bug in WeightedSpanTermExtractor, which rewrites your PhraseQuery into a SpanNearQuery without checking how many terms the phrase has. With a single n-gram term the resulting SpanNearQuery holds only one clause, which is exactly what NearSpansOrdered rejects with "Less than 2 subSpans". Could you open a JIRA ticket?
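For context, the extractor builds one SpanTermQuery per phrase term and wraps them all in a SpanNearQuery, so a one-term phrase ends up as a single-clause SpanNearQuery and the span construction fails. A rough sketch of the kind of guard that would avoid this; the class and method names below are illustrative only, not the actual extractor code:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

final class PhraseToSpanSketch {

    // Illustrative helper: convert a PhraseQuery into a SpanQuery,
    // special-casing the single-term phrase that currently trips up
    // the highlighter.
    static SpanQuery toSpanQuery(PhraseQuery phrase) {
        Term[] terms = phrase.getTerms();
        if (terms.length == 1) {
            // A one-term "phrase" carries no positional constraint, so a plain
            // SpanTermQuery avoids the "Less than 2 subSpans" check entirely.
            return new SpanTermQuery(terms[0]);
        }
        SpanQuery[] clauses = new SpanQuery[terms.length];
        for (int i = 0; i < terms.length; i++) {
            clauses[i] = new SpanTermQuery(terms[i]);
        }
        // The real extractor also has to honour the phrase's position gaps;
        // this sketch only carries over the slop and the in-order constraint.
        return new SpanNearQuery(clauses, phrase.getSlop(), true);
    }
}

The actual fix would of course live inside WeightedSpanTermExtractor itself, but the missing check is essentially the terms.length test above.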
Alan Woodward
www.flax.co.uk

> On 18 Apr 2016, at 16:27, Eva Popenda <eva.pope...@abas.de> wrote:
>
> Hi,
>
> I have a problem when using the Highlighter with an NGramAnalyzer and a
> PhraseQuery: searching for a substring of length N (4 in my case) yields the
> following exception:
>
> java.lang.IllegalArgumentException: Less than 2 subSpans.size():1
>     at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:40)
>     at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:56)
>     at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:292)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:137)
>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:506)
>     at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
>     at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
>
> Below is a JUnit test reproducing this behavior. When searching for a string
> with more than N characters, or when using NGramPhraseQuery, the problem does
> not occur.
> Why is it that more than one subSpan is required?
>
> public class HighlighterTest {
>
>     @Rule
>     public final ExpectedException exception = ExpectedException.none();
>
>     @Test
>     public void testHighlighterWithPhraseQueryThrowsException() throws IOException, InvalidTokenOffsetsException {
>
>         final Analyzer analyzer = new NGramAnalyzer(4);
>         final String fieldName = "substring";
>
>         final List<BytesRef> list = new ArrayList<>();
>         list.add(new BytesRef("uchu"));
>         final PhraseQuery query = new PhraseQuery(fieldName, list.toArray(new BytesRef[list.size()]));
>
>         final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
>         final SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
>
>         exception.expect(IllegalArgumentException.class);
>         exception.expectMessage("Less than 2 subSpans.size():1");
>
>         final Highlighter highlighter = new Highlighter(formatter, TextEncoder.NONE.getEncoder(), fragmentScorer);
>         highlighter.setTextFragmenter(new SimpleFragmenter(100));
>         final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");
>
>         assertEquals("B<b>uchu</b>ng", fragment);
>     }
>
>     public final class NGramAnalyzer extends Analyzer {
>
>         private final int minNGram;
>
>         public NGramAnalyzer(final int minNGram) {
>             super();
>             this.minNGram = minNGram;
>         }
>
>         @Override
>         protected TokenStreamComponents createComponents(final String fieldName) {
>             final Tokenizer source = new NGramTokenizer(minNGram, minNGram);
>             return new TokenStreamComponents(source);
>         }
>     }
> }
>
> Thanks and cheers,
> Eva
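For anyone who hits this before a fix lands, one possible workaround on the query-building side is to hand QueryScorer a plain TermQuery whenever the analyzed substring collapses to a single n-gram, since a one-term phrase matches the same documents. A minimal sketch along the lines of the test above; the helper class and method names are made up for illustration:

import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.util.BytesRef;

final class SingleTermPhraseWorkaround {

    // If the phrase holds a single term there is no positional constraint to
    // enforce, so a TermQuery is equivalent for highlighting purposes and
    // QueryScorer handles it without going through the span machinery.
    static Query unwrapSingleTermPhrase(PhraseQuery query) {
        Term[] terms = query.getTerms();
        return terms.length == 1 ? new TermQuery(terms[0]) : query;
    }

    // Build a scorer for a single n-gram, mirroring the query construction
    // in the test above.
    static QueryScorer scorerFor(String fieldName, BytesRef ngram) {
        PhraseQuery phrase = new PhraseQuery(fieldName, ngram);
        return new QueryScorer(unwrapSingleTermPhrase(phrase), fieldName);
    }
}

This keeps highlighting working for substrings of exactly N characters without touching the extractor, and it is consistent with your observation that longer substrings, which produce two or more n-grams, highlight fine.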