[ https://issues.apache.org/jira/browse/LUCENE-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Smiley resolved LUCENE-8848.
----------------------------------
Resolution: Fixed
Assignee: David Smiley
Fix Version/s: 8.2
Debatable, but I filed this as an improvement to help highlight queries better.
> UnifiedHighlighter should highlight all Query types that implement
> Weight.matches
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-8848
> URL: https://issues.apache.org/jira/browse/LUCENE-8848
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
> Fix For: 8.2
>
> Attachments: LUCENE-8848.patch
>
>
> The UnifiedHighlighter internally extracts terms and automata from the query.
> Usually this works perfectly, but a Query might be of a type the UH doesn't
> know: a leaf query that is in effect similar to a MultiTermQuery yet is
> either not a subclass of it, or is one but the UH doesn't know how to
> extract an automaton from it. The UH is oblivious to this and probably won't
> highlight this query. If re-analysis of the text is necessary, the UH will
> pre-filter all terms to only those it _thinks_ are pertinent. Or, if offsets
> are in the postings, the UH could perform very poorly by unleashing this
> query on the index for each highlighted document without recognizing that
> re-analysis is a more appropriate path.
>
> I think to solve this, UnifiedHighlighter.getFieldHighlighter needs to
> inspect the query (using a QueryVisitor) to see if it can find a leaf query
> that is not one it knows how to pull automata from, and is otherwise not in a
> special list (like MatchAllDocsQuery). If we find one, we avoid choosing
> OffsetSource.POSTINGS or OffsetSource.NONE_NEEDED since we might in effect
> have an MTQ-like query. If a MemoryIndex is needed, then we don't pre-filter
> the terms since we can't assume we know precisely which terms are pertinent.
> We needn't bother extracting terms & automata in this case either; it's
> wasted effort which can involve building a CharacterRunAutomaton (see
> MultiTermHighlighting.binaryToCharRunAutomaton). Speaking of which, it'd be
> nice to avoid that in other cases as well, like for WEIGHT_MATCHES when we
> aren't using MemoryIndex (thus no term pre-filtering).
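
The decision described in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the Node class, the EXTRACTABLE and SPECIAL lists, and the adjust method are hypothetical stand-ins that avoid any Lucene dependency, and falling back to ANALYSIS is an assumed choice (the description only says POSTINGS and NONE_NEEDED should be avoided).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class OffsetSourceSketch {
    // Same names as UnifiedHighlighter.OffsetSource, redeclared so the sketch stands alone.
    enum OffsetSource { POSTINGS, POSTINGS_WITH_TERM_VECTORS, TERM_VECTORS, ANALYSIS, NONE_NEEDED }

    // Hypothetical stand-in for a query tree; "type" plays the role of the Query class name.
    // A node without children is treated as a leaf query.
    static final class Node {
        final String type;
        final List<Node> children;

        Node(String type, Node... children) {
            this.type = type;
            this.children = Arrays.asList(children);
        }
    }

    // Assumed, illustrative lists: leaf types we can extract terms/automata from,
    // and leaf types that are harmless to ignore (the "special list").
    static final Set<String> EXTRACTABLE =
        new HashSet<>(Arrays.asList("TermQuery", "PhraseQuery", "PrefixQuery", "WildcardQuery"));
    static final Set<String> SPECIAL =
        new HashSet<>(Arrays.asList("MatchAllDocsQuery", "MatchNoDocsQuery"));

    // Walks the tree (the role a QueryVisitor would play) looking for a leaf
    // that is neither extractable nor in the special list.
    static boolean hasUnrecognizedLeaf(Node q) {
        if (q.children.isEmpty()) {
            return !EXTRACTABLE.contains(q.type) && !SPECIAL.contains(q.type);
        }
        for (Node child : q.children) {
            if (hasUnrecognizedLeaf(child)) {
                return true;
            }
        }
        return false;
    }

    // Avoids POSTINGS and NONE_NEEDED when an unrecognized leaf is present.
    // Falling back to ANALYSIS is an assumption made for this sketch.
    static OffsetSource adjust(OffsetSource preferred, boolean unrecognizedLeaf) {
        if (unrecognizedLeaf
            && (preferred == OffsetSource.POSTINGS || preferred == OffsetSource.NONE_NEEDED)) {
            return OffsetSource.ANALYSIS;
        }
        return preferred;
    }

    public static void main(String[] args) {
        Node query = new Node("BooleanQuery", new Node("TermQuery"), new Node("MyCustomQuery"));
        boolean unrecognized = hasUnrecognizedLeaf(query);
        System.out.println("unrecognized leaf: " + unrecognized);
        System.out.println("offset source: " + adjust(OffsetSource.POSTINGS, unrecognized));
    }
}
```

In the same spirit, the flag returned by hasUnrecognizedLeaf could also be used to skip term pre-filtering when a MemoryIndex is built, since the set of pertinent terms is not known in that case.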