Hi Alan,

thank you, I have opened a JIRA ticket.
Cheers,
Eva

On 18.04.2016 19:01, Alan Woodward wrote:
> Hi Eva,
>
> This looks like a bug in WeightedSpanTermExtractor, which is rewriting your
> PhraseQuery into a SpanNearQuery without checking how many terms there are.
> Could you open a JIRA ticket?
>
> Alan Woodward
> www.flax.co.uk
>
>
>> On 18 Apr 2016, at 16:27, Eva Popenda <eva.pope...@abas.de> wrote:
>>
>> Hi,
>>
>> I have a problem when using the Highlighter with an NGramAnalyzer and a
>> PhraseQuery: searching for a substring whose length equals N (4 in my case)
>> throws the following exception:
>>
>> java.lang.IllegalArgumentException: Less than 2 subSpans.size():1
>>     at org.apache.lucene.search.spans.ConjunctionSpans.<init>(ConjunctionSpans.java:40)
>>     at org.apache.lucene.search.spans.NearSpansOrdered.<init>(NearSpansOrdered.java:56)
>>     at org.apache.lucene.search.spans.SpanNearQuery$SpanNearWeight.getSpans(SpanNearQuery.java:232)
>>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:292)
>>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:137)
>>     at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:506)
>>     at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:219)
>>     at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:187)
>>     at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:196)
>>
>> The JUnit test below reproduces this behavior. When searching for a string
>> with more than N characters, or when using NGramPhraseQuery, the problem
>> doesn't occur.
>> Why are at least two subSpans required?
>>
>> import static org.junit.Assert.assertEquals;
>>
>> import java.io.IOException;
>>
>> import org.apache.lucene.analysis.Analyzer;
>> import org.apache.lucene.analysis.Tokenizer;
>> import org.apache.lucene.analysis.ngram.NGramTokenizer;
>> import org.apache.lucene.search.PhraseQuery;
>> import org.apache.lucene.search.highlight.DefaultEncoder;
>> import org.apache.lucene.search.highlight.Highlighter;
>> import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
>> import org.apache.lucene.search.highlight.QueryScorer;
>> import org.apache.lucene.search.highlight.SimpleFragmenter;
>> import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
>> import org.apache.lucene.util.BytesRef;
>> import org.junit.Rule;
>> import org.junit.Test;
>> import org.junit.rules.ExpectedException;
>>
>> public class HighlighterTest {
>>
>>     @Rule
>>     public final ExpectedException exception = ExpectedException.none();
>>
>>     @Test
>>     public void testHighlighterWithPhraseQueryThrowsException()
>>             throws IOException, InvalidTokenOffsetsException {
>>
>>         final Analyzer analyzer = new NGramAnalyzer(4);
>>         final String fieldName = "substring";
>>
>>         // "uchu" is exactly 4 characters, so the phrase consists of a single term.
>>         final PhraseQuery query = new PhraseQuery(fieldName, new BytesRef("uchu"));
>>
>>         final QueryScorer fragmentScorer = new QueryScorer(query, fieldName);
>>         final SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("<b>", "</b>");
>>
>>         // The failure happens while the QueryScorer extracts weighted span terms.
>>         exception.expect(IllegalArgumentException.class);
>>         exception.expectMessage("Less than 2 subSpans.size():1");
>>
>>         // DefaultEncoder (no escaping) replaces TextEncoder.NONE.getEncoder(),
>>         // which appears to be an application class rather than Lucene API.
>>         final Highlighter highlighter =
>>                 new Highlighter(formatter, new DefaultEncoder(), fragmentScorer);
>>         highlighter.setTextFragmenter(new SimpleFragmenter(100));
>>         final String fragment = highlighter.getBestFragment(analyzer, fieldName, "Buchung");
>>
>>         // Desired highlight; not reached while the exception is thrown.
>>         assertEquals("B<b>uchu</b>ng", fragment);
>>     }
>>
>>     /** Emits only n-grams of exactly minNGram characters. */
>>     public static final class NGramAnalyzer extends Analyzer {
>>
>>         private final int minNGram;
>>
>>         public NGramAnalyzer(final int minNGram) {
>>             super();
>>             this.minNGram = minNGram;
>>         }
>>
>>         @Override
>>         protected TokenStreamComponents createComponents(final String fieldName) {
>>             final Tokenizer source = new NGramTokenizer(minNGram, minNGram);
>>             return new TokenStreamComponents(source);
>>         }
>>     }
>> }
>>
>> Thanks and cheers,
>> Eva
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
-- 
Eva Popenda | Software Developer | Technical Development
abas Software AG | Gartenstraße 67 | 76135 Karlsruhe | Germany
Web: http://www.abas-software.com | http://www.abas.de
Board of Directors / Vorstand: Michael Baier, Jürgen Nöding, Mario Raatz, Werner Strub
Chairman Board of Directors / Vorstandsvorsitzender: Werner Strub
Chairman Supervisory Board / Aufsichtsratsvorsitzender: Udo Stößer
Registered Office / Sitz der Gesellschaft: Karlsruhe
Commercial Register / Handelsregister: HRB 107644 Amtsgericht Mannheim
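Why only N-character search strings fail comes down to gram counting. With a fixed gram size N, a string of length L (L >= N) yields L - N + 1 grams. "uchu" (L = N = 4) therefore yields a single term, while "Buchung" yields Buch, uchu, chun, hung. A phrase with one term is then turned into a near query with one clause, which is exactly what the "Less than 2 subSpans" check rejects. The following is a minimal, Lucene-free Java sketch (Java 16+ for records) of that arithmetic and of the kind of guard Alan's diagnosis suggests: fall back to a single-term span when the phrase has only one term. All names here (SingleTermPhraseSketch, SpanLike, SpanTermLike, SpanNearLike, toSpanNaive, toSpanGuarded) are hypothetical stand-ins, not Lucene API, and this is not the actual Lucene fix. The eager size check only imitates the check that Lucene's ConjunctionSpans performs later, when spans are created.

```java
import java.util.ArrayList;
import java.util.List;

// Lucene-free sketch (Java 16+). All types and methods here are hypothetical stand-ins,
// not Lucene API; they only model the shape of the reported failure.
public class SingleTermPhraseSketch {

    // Fixed-size character n-grams (minGram == maxGram == n) over a sliding window.
    // Assumes BMP-only text; a string of length L yields L - n + 1 grams when L >= n.
    public static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Marker type for the two span-query stand-ins below.
    public interface SpanLike {}

    // Stand-in for a single-term span query.
    public record SpanTermLike(String field, String term) implements SpanLike {}

    // Stand-in for an ordered near query. The eager size check imitates the
    // "Less than 2 subSpans" check that Lucene performs later, when spans are created.
    public record SpanNearLike(List<SpanLike> clauses, int slop, boolean inOrder) implements SpanLike {
        public SpanNearLike {
            if (clauses.size() < 2) {
                throw new IllegalArgumentException("Less than 2 subSpans.size():" + clauses.size());
            }
            clauses = List.copyOf(clauses);
        }
    }

    // Naive conversion: every phrase becomes a near query, regardless of term count.
    public static SpanLike toSpanNaive(String field, List<String> terms, int slop) {
        List<SpanLike> clauses = new ArrayList<>();
        for (String term : terms) {
            clauses.add(new SpanTermLike(field, term));
        }
        return new SpanNearLike(clauses, slop, true);
    }

    // Guarded conversion: a one-term phrase becomes a plain term span instead.
    public static SpanLike toSpanGuarded(String field, List<String> terms, int slop) {
        if (terms.size() == 1) {
            return new SpanTermLike(field, terms.get(0));
        }
        return toSpanNaive(field, terms, slop);
    }

    public static void main(String[] args) {
        int n = 4;
        System.out.println(ngrams("uchu", n));    // [uchu]
        System.out.println(ngrams("Buchung", n)); // [Buch, uchu, chun, hung]
        System.out.println(toSpanGuarded("substring", ngrams("uchu", n), 0));
        try {
            toSpanNaive("substring", ngrams("uchu", n), 0);
        } catch (IllegalArgumentException e) {
            System.out.println("naive: " + e.getMessage());
        }
    }
}
```

Running main prints the single gram for "uchu", the four grams for "Buchung", the term-only result of the guarded path (SpanTermLike[field=substring, term=uchu]), and "naive: Less than 2 subSpans.size():1" from the unguarded path. The slop is simply passed through here; real phrase-to-span conversion also has to account for term positions.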