[
https://issues.apache.org/jira/browse/LUCENE-7757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947451#comment-15947451
]
David Smiley commented on LUCENE-7757:
--------------------------------------
I got to the bottom of this one; it's tricky. I see two issues:
1. The UH's {{PhraseHelper}} uses {{WeightedSpanTermExtractor}} to convert the
query to a {{SpanQuery}}. WSTE has no knowledge of {{ComplexPhraseQuery}} so
it has some fallback logic. {{PhraseHelper}} overrides {{isQueryUnsupported}}
but it has a lingering TODO with a return true, thus any query not known in
advance is not going to be highlighted. I think this should be modified to
return false. I did that locally and I also found it then necessary to
override {{getLeafContext()}} to return a dummy context. {{PhraseHelper}} can't
produce a real leaf context here because this is the stage at which it is
merely analyzing the query; no wildcard expansion has been done yet. The query
worked in the original Highlighter because there is no split phase.
2. {{ComplexPhraseQueryParser}} produces a special Query subclass
{{ComplexPhraseQuery}}. CPQ implements rewrite() that also calls rewrite() on
the clauses. It expects a _real_ (not a dummy) leaf context. So this works
from a query execution standpoint, but I think it would be more friendly with
the UH if CPQ didn't cascade the rewrite. It's not a simple matter of
commenting out the cascaded rewrite though... I will investigate further when I
have more time.
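
To make issue 1 concrete, here is a rough toy model of the fallback behaviour
described above. It is only a sketch: every class in it ({{ToyQuery}},
{{ToyExtractor}}, and so on) is a hypothetical stand-in written for this
illustration, not Lucene's actual API, and it leaves out the leaf context
entirely. It only shows how a hook that returns true for every unknown query
type leaves nothing to highlight, while returning false lets the rewrite
fallback run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins for illustration only; none of these are Lucene classes.
abstract class ToyQuery {
    // Unknown query types may rewrite themselves into simpler queries.
    List<ToyQuery> rewrite() { return Collections.emptyList(); }
}

// The only query type the toy extractor understands directly.
final class ToyTermQuery extends ToyQuery {
    final String term;
    ToyTermQuery(String term) { this.term = term; }
}

// A query type the extractor has no built-in knowledge of.
final class ToyOpaqueQuery extends ToyQuery {
    private final List<String> terms;
    ToyOpaqueQuery(List<String> terms) { this.terms = terms; }

    @Override
    List<ToyQuery> rewrite() {
        List<ToyQuery> out = new ArrayList<>();
        for (String t : terms) out.add(new ToyTermQuery(t));
        return out;
    }
}

class ToyExtractor {
    // Hook: returning true means "skip this unknown query type entirely".
    protected boolean isQueryUnsupported(Class<? extends ToyQuery> clazz) {
        return false;
    }

    List<String> extract(ToyQuery q) {
        List<String> out = new ArrayList<>();
        if (q instanceof ToyTermQuery) {
            out.add(((ToyTermQuery) q).term);
        } else if (!isQueryUnsupported(q.getClass())) {
            // Fallback: rewrite the unknown query and extract from the result.
            for (ToyQuery sub : q.rewrite()) out.addAll(extract(sub));
        }
        return out;
    }
}

// Models the lingering "return true": every unknown query is dropped.
class GiveUpExtractor extends ToyExtractor {
    @Override
    protected boolean isQueryUnsupported(Class<? extends ToyQuery> clazz) {
        return true;
    }
}

class FallbackSketch {
    public static void main(String[] args) {
        ToyQuery q = new ToyOpaqueQuery(Arrays.asList("svensk", "norsk"));
        System.out.println(new GiveUpExtractor().extract(q)); // prints []
        System.out.println(new ToyExtractor().extract(q));    // prints [svensk, norsk]
    }
}
```

In the real code the fallback rewrite needs a reader from the leaf context,
which is why returning false alone was not enough and a dummy context had to
be supplied as well.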
> Unified highlighter does not highlight wildcard phrases correctly when
> ComplexPhraseQueryParser is used
> -------------------------------------------------------------------------------------------------------
>
> Key: LUCENE-7757
> URL: https://issues.apache.org/jira/browse/LUCENE-7757
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/highlighter
> Affects Versions: 6.4
> Reporter: Bjarke Mortensen
> Assignee: David Smiley
>
> Given the text:
> "Kontraktsproget vil være dansk og arbejdssproget kan være dansk, svensk,
> norsk og engelsk"
> and the query:
> \{!complexphrase df=content_da\}("sve* no*")
> the unified highlighter (hl.method=unified) does not return any highlights.
> For reference, the original highlighter returns a snippet with the expected
> highlights:
> Kontraktsproget vil være dansk og arbejdssproget kan være dansk,
> <em>svensk</em>, <em>norsk</em> og
> Is this expected behaviour with the unified highlighter?
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)