[jira] [Commented] (SOLR-17123) Highlighting error on repeated word in query

David Smiley (Jira) Fri, 26 Jan 2024 23:00:58 -0800


    [ https://issues.apache.org/jira/browse/SOLR-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811475#comment-17811475 ]


David Smiley commented on SOLR-17123:
-------------------------------------

I was able to reproduce this in TestUnifiedSolrHighlighter with the following 
code snippet.  Never mind the final XPath-based assertion; I just needed 
something to exhibit the behavior and a convenient means of using a debugger.

{code:java}
  @Test
  public void testRepeatedWord() {
    assertU(adoc("id", "1", "text", "this is a test with zero eight eight"));
    assertU(commit());
    assertQ(
        req(
            "q", "\"zero eight eight\"~5", // sloppy phrase with a repeated term
            "df", "text",
            "hl", "true",
            "hl.fl", "text",
            HighlightParams.FRAGSIZE, "0",
            HighlightParams.FIELD_MATCH, "true",
            // with "true" the problem does not appear
            HighlightParams.HIGHLIGHT_MULTI_TERM, "false"),
        // Placeholder assertion; it only exists so the highlighter runs under a debugger.
        "//lst[@name='1']/arr[@name='text']/str[.='bogus TODO']");
  }
{code}

I found that if you can set hl.highlightMultiTerm=true, the problem does not 
appear, because a rather different internal mechanism can be used: the "weight 
matches" API (in Lucene Core).  That modern API, however, doesn't support 
disabling highlighting of multi-term queries.  So for your scenario, the 
UnifiedHighlighter will use the older "PhraseHelper", which in turn consults 
the SpanQuery to collect the spans via Spans.collect.  All fine so far, but I 
see what looks like a bug: when NearSpansUnordered visits this doc, the Spans 
from the scorer has 3 child spans (good), but the latter 2 of them are 
identical (at the same position) instead of being one position apart.
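
To make "one position apart" concrete, here is a minimal plain-Java sketch of 
the token positions involved.  It is not Lucene code and does not reproduce 
the bug.  The class and method names (RepeatedTermPositions, positionsOf, 
assignDistinctPositions, allDistinct) are hypothetical, and it assumes 
whitespace tokenization, 0-based positions, and no stop-word removal, which 
may differ from the analyzer on the test schema's "text" field.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical toy model; NOT Lucene's span matching.
class RepeatedTermPositions {

  // 0-based positions of `term` in `text`, splitting on whitespace only (assumption).
  public static List<Integer> positionsOf(String text, String term) {
    String[] tokens = text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < tokens.length; i++) {
      if (tokens[i].equals(term)) {
        positions.add(i);
      }
    }
    return positions;
  }

  // Greedy illustration: give each phrase term the first position of that
  // term that no earlier phrase term has taken; -1 if none is left.
  public static List<Integer> assignDistinctPositions(String text, String... phraseTerms) {
    Set<Integer> used = new HashSet<>();
    List<Integer> assigned = new ArrayList<>();
    for (String term : phraseTerms) {
      int chosen = -1;
      for (int pos : positionsOf(text, term)) {
        if (!used.contains(pos)) {
          chosen = pos;
          break;
        }
      }
      if (chosen >= 0) {
        used.add(chosen);
      }
      assigned.add(chosen);
    }
    return assigned;
  }

  // True when no position occurs twice, i.e. no two child spans coincide.
  public static boolean allDistinct(List<Integer> positions) {
    return new HashSet<>(positions).size() == positions.size();
  }

  public static void main(String[] args) {
    String text = "this is a test with zero eight eight";
    System.out.println(positionsOf(text, "eight")); // prints [6, 7]
    System.out.println(assignDistinctPositions(text, "zero", "eight", "eight")); // prints [5, 6, 7]
  }
}
```

Under those assumptions, the three child spans would be expected at three 
distinct positions, with the two for "eight" adjacent.  A check like 
allDistinct would flag the symptom described above, where two of the three 
child spans report the same position.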

> Highlighting error on repeated word in query
> --------------------------------------------
>
>                 Key: SOLR-17123
>                 URL: https://issues.apache.org/jira/browse/SOLR-17123
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 9.4
>         Environment: Solr cluster hosted in Docker with Zookeeper
>            Reporter: Ralph Lacey
>            Priority: Minor
>
> When using a query consisting of a phrase with a repeated word, e.g. "zero 
> eight eight", with the words appearing in the text, and the query using some 
> degree of fuzziness, e.g. ~5, the second of the repeated words does not 
> always get highlighted.
>  
>  * "eight eight" - both get highlighted
>  * "eight eight one" - both get highlighted
>  * "one eight eight" - only the first gets highlighted



