[
https://issues.apache.org/jira/browse/LUCENE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564678#comment-14564678
]
Robert Muir commented on LUCENE-6371:
-------------------------------------
{quote}
I think it's still useful though - I use it all the time!
{quote}
Yeah, but it's slow, with no easy chance of ever being faster. There is no simple
bitset rewrite here like there is for other multiterm queries. Additionally, it
has all the downsides of an enormous boolean query, but with proximity to boot.
This is very real: even simple stuff like 1-2 KB of RAM consumption per term
due to additional decompression buffers for prox. Maybe in the future you
could optionally index prefix terms, but I can't imagine merging proximity etc.
into a prefix field for fully indexed fields as a default; it seems complicated,
slow, and space-consuming.
{quote}
It would be nice if you could restrict the number of SpanOr clauses it rewrites
to, but that's a separate issue.
{quote}
+1, that is a great idea. We should really both do that and also add warnings
to the javadocs about inefficiency. It has none today!
{quote}
If you really think that moving .getSpans() and .extractTerms() to SpanWeight
doesn't gain anything, then I can back it out. But I think it does simplify the
API and brings it more into line with our other standard queries.
{quote}
I totally agree it has the value of consistency with other queries. But some of
the APIs trying to do this are fairly complicated, yet at the same time still
not really working: see below for more explanation.
{quote}
And I really don't see that exposing the termcontexts map on the SpanWeight
constructor is any worse than exposing it directly in .getSpans(). In fact, I'd
say that it's hiding it better - very few users of lucene are going to be
looking at SpanWeights, as they're an implementation detail, but anyone using
an IDE is going to be shown SpanQuery.getSpans() when they try and autocomplete
on a SpanQuery object, and it's not something that most users need to worry
about.
{quote}
It's actually terrible already: the motivation for this stuff was to try to
speed up the turtle in question, SpanMultiTermQueryWrapper. The reason this stuff
was exposed is that it could bring some relief to such crazy queries, by
visiting each term in the term dictionary fewer than 3 times (once each for
rewrite, weight/idf, and postings). But this was never quite right, for two reasons:
* Leniency: We can't enforce that we are doing the performant thing, because
creation of weight/idf uses extractTerms(). So the SpanTermWeight inside the
exclude portion of a SpanNot suddenly sees an unexpected term it has no
termstate for. Maybe patches here removed this problem but forgot to fix the
leniency in SpanTermWeight, as I see at least the code comment is gone.
* Incomplete: SpanMultiTermQueryWrapper still isn't reusing the termcontext
from rewrite() and somehow passing it down to the rewritten spans. So the whole
ugly thing isn't even totally working: it's just reducing the number of visits
to the term dictionary from 3 down to 2, but it is stupid that it is not 1.
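
To make the visit counting above concrete, here is a toy sketch in plain Java.
It does not use any Lucene classes: CountingTermDictionary, seeksWithoutReuse
and seeksWithReuse are hypothetical names invented for illustration, and the
three phases are only modelled as loops. It just shows how caching the
per-term state found during rewrite would bring the count down to one seek
per term.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene classes. */
public class TermStateReuseSketch {

    /** Hypothetical stand-in for a term dictionary that counts its seeks. */
    static final class CountingTermDictionary {
        private final Map<String, Integer> docFreqs;
        private int seeks = 0;

        CountingTermDictionary(Map<String, Integer> docFreqs) {
            this.docFreqs = docFreqs;
        }

        /** Each call models one visit to the term dictionary. */
        int seek(String term) {
            seeks++;
            return docFreqs.getOrDefault(term, 0);
        }

        int seeks() {
            return seeks;
        }
    }

    /** Rewrite, weight/idf and postings each seek every term again. */
    public static int seeksWithoutReuse(Map<String, Integer> index, List<String> terms) {
        CountingTermDictionary dict = new CountingTermDictionary(index);
        for (int phase = 0; phase < 3; phase++) {
            for (String term : terms) {
                dict.seek(term);
            }
        }
        return dict.seeks();
    }

    /** Rewrite seeks once per term and caches the result for later phases. */
    public static int seeksWithReuse(Map<String, Integer> index, List<String> terms) {
        CountingTermDictionary dict = new CountingTermDictionary(index);
        Map<String, Integer> termStates = new HashMap<>();
        for (String term : terms) {
            termStates.put(term, dict.seek(term)); // rewrite: the only seek
        }
        long totalDocFreq = 0;
        for (String term : terms) {
            totalDocFreq += termStates.get(term);  // weight/idf: cached state
        }
        for (String term : terms) {
            termStates.get(term);                  // postings: cached state
        }
        return dict.seeks();
    }

    public static void main(String[] args) {
        Map<String, Integer> index = new HashMap<>();
        index.put("span", 4);
        index.put("spans", 2);
        index.put("spanner", 1);
        List<String> terms = Arrays.asList("span", "spans", "spanner");

        System.out.println(seeksWithoutReuse(index, terms)); // prints 9
        System.out.println(seeksWithReuse(index, terms));    // prints 3
    }
}
```

With three expanded terms, the first method seeks 9 times (3 per term) and the
second 3 times (1 per term). The state described in the second bullet, where
only one of the two later phases reuses the cached state, would sit in between
at 2 seeks per term.
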
> Improve Spans payload collection
> --------------------------------
>
> Key: LUCENE-6371
> URL: https://issues.apache.org/jira/browse/LUCENE-6371
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Alan Woodward
> Priority: Minor
> Fix For: Trunk, 5.3
>
> Attachments: LUCENE-6371.patch, LUCENE-6371.patch, LUCENE-6371.patch,
> LUCENE-6371.patch
>
>
> Spin off from LUCENE-6308, see the comments there from around 23 March 2015.