[ https://issues.apache.org/jira/browse/LUCENE-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554735#comment-14554735 ]
David Smiley commented on LUCENE-6494:
--------------------------------------
bq. We could add a Collection<Term> to MatchData as well, to collect all terms
from a Spans. I'm not sure I see why you need the Term for highlighting
though - can't you just use offsets?
You may be right about not needing the Term. I should retract my concerns
about this for now, as it pertains to accurate highlights. I need to build a
POC to understand what's really needed. Once I saw the SpanCollector, it
seemed very promising, but I'm having second thoughts now. When I last thought
about this problem, I ended up wanting a Spans.getChildren() of sorts -- just
like Scorers have. I still think that would most likely be more elegant. The
tricky part, I think, would be handling the buffered case of NearSpansOrdered:
if I ask for the child spans, it would need to return cached child spans for
where it matched, not wherever the child spans may since have advanced to.
Alternatively, SpanCollector is somewhat similar, but its MatchData, as
written, doesn't capture each leaf's state separately -- instead it expands
the bounds. This means I currently can't get the offsets of each underlying
SpanTermQuery match, only the aggregate start/end offsets, which could cover
a ton of text -- I don't want to highlight everything in between.
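
To make the concern concrete, here is a rough, self-contained sketch of the
two behaviours. It does not use any Lucene API: LeafMatch, expandedBounds,
perLeafOffsets and highlight are hypothetical names made up for illustration,
and the offsets are hard-coded rather than read from postings. It only shows
why a single expanded range is not enough for highlighting a near-style match
on "quick" and "dog".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types exist in Lucene.
final class LeafOffsetsSketch {

    /** One term-level (leaf) match with character offsets into the text. */
    static final class LeafMatch {
        final String term;
        final int startOffset;
        final int endOffset;

        LeafMatch(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Aggregating strategy: widen one [start, end) range to cover every leaf. */
    static List<int[]> expandedBounds(List<LeafMatch> leaves) {
        int start = Integer.MAX_VALUE;
        int end = Integer.MIN_VALUE;
        for (LeafMatch leaf : leaves) {
            start = Math.min(start, leaf.startOffset);
            end = Math.max(end, leaf.endOffset);
        }
        List<int[]> ranges = new ArrayList<>();
        ranges.add(new int[] {start, end});
        return ranges;
    }

    /** Per-leaf strategy: keep each leaf's offsets as its own range. */
    static List<int[]> perLeafOffsets(List<LeafMatch> leaves) {
        List<int[]> ranges = new ArrayList<>();
        for (LeafMatch leaf : leaves) {
            ranges.add(new int[] {leaf.startOffset, leaf.endOffset});
        }
        return ranges;
    }

    /** Wraps each range in [ ]; assumes ranges are sorted and non-overlapping. */
    static String highlight(String text, List<int[]> ranges) {
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        for (int[] r : ranges) {
            sb.append(text, pos, r[0]).append('[').append(text, r[0], r[1]).append(']');
            pos = r[1];
        }
        sb.append(text, pos, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "quick brown fox jumps over the lazy dog";
        List<LeafMatch> leaves = new ArrayList<>();
        leaves.add(new LeafMatch("quick", 0, 5));
        leaves.add(new LeafMatch("dog", 36, 39));

        // prints "[quick brown fox jumps over the lazy dog]"
        System.out.println(highlight(text, expandedBounds(leaves)));
        // prints "[quick] brown fox jumps over the lazy [dog]"
        System.out.println(highlight(text, perLeafOffsets(leaves)));
    }
}
```

The second output is what I'd want a highlighter to produce; the first is all
that can be recovered once the leaf offsets have been merged into one range.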
> Make PayloadSpanUtil apply to other postings information
> --------------------------------------------------------
>
> Key: LUCENE-6494
> URL: https://issues.apache.org/jira/browse/LUCENE-6494
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Fix For: 5.2
>
> Attachments: LUCENE-6494.patch, LUCENE-6494.patch, LUCENE-6494.patch,
> LUCENE-6494.patch
>
>
> With the addition of SpanCollectors, we can now get arbitrary postings
> information from SpanQueries. PayloadSpanUtil does some rewriting to convert
> non-span queries into SpanQueries so that it can collect payloads. It would
> be good to make this more generic, so that we can collect any postings
> information from any query (without having to make invasive changes to
> already optimized Scorers, etc).