[ https://issues.apache.org/jira/browse/SOLR-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Bernstein updated SOLR-6581:
---------------------------------
Description:
*Background*
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent
are optimized to work with a top level FieldCache. Top level FieldCaches have a
very fast docID to top-level ordinal lookup. Fast access to the top-level
ordinals allows for very high performance field collapsing on high cardinality
fields.
LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level
FieldCache is no longer in regular use. Instead all top level caches are
accessed through MultiDocValues.
There are some major advantages of using the MultiDocValues rather than a top
level FieldCache. But there is one disadvantage: the lookup from docId to
top-level ordinals is slower using MultiDocValues.
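To illustrate where the extra cost can come from, here is a toy model in plain Java. It is not Lucene code: the class, method, and array names are all hypothetical, and it assumes a segmented lookup has to locate the segment, read a segment-local ordinal, and then map that to a top-level ordinal, whereas a top-level cache reads a single array.

```java
import java.util.Arrays;

/** Toy model of two docID -> top-level ordinal lookup paths. Not Lucene code. */
public class OrdinalLookupSketch {

    /** Top-level cache: one array indexed directly by the global docID. */
    static int topLevelLookup(int[] topLevelOrds, int globalDoc) {
        return topLevelOrds[globalDoc];
    }

    /**
     * Segmented path: find the segment, read the segment-local ordinal,
     * then translate it to a top-level ordinal (three dependent reads).
     * Assumes docBase is sorted, starts at 0, and has no empty segments.
     */
    static int segmentedLookup(int[] docBase, int[][] segmentOrds,
                               int[][] segToGlobal, int globalDoc) {
        int seg = Arrays.binarySearch(docBase, globalDoc);
        if (seg < 0) {
            seg = -seg - 2; // insertion point minus one is the containing segment
        }
        int localDoc = globalDoc - docBase[seg];
        int localOrd = segmentOrds[seg][localDoc];
        return segToGlobal[seg][localOrd];
    }

    public static void main(String[] args) {
        // Two segments holding 3 and 2 documents.
        int[] docBase = {0, 3};
        int[][] segmentOrds = {{0, 1, 0}, {2, 0}};
        // Segment-local ordinal -> top-level ordinal, per segment.
        int[][] segToGlobal = {{0, 2}, {1, 2, 3}};
        // The same information flattened into a single top-level array.
        int[] topLevelOrds = {0, 2, 0, 3, 1};

        for (int doc = 0; doc < topLevelOrds.length; doc++) {
            System.out.println("doc " + doc
                + " topLevel=" + topLevelLookup(topLevelOrds, doc)
                + " segmented=" + segmentedLookup(docBase, segmentOrds, segToGlobal, doc));
        }
    }
}
```

Both paths return the same ordinal for every document. The segmented path just touches more memory per lookup, which is the kind of overhead that matters when the lookup runs once per collected document.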
My testing has shown that *after optimizing* the CollapsingQParserPlugin code
to use MultiDocValues, the performance drop is around 100%. For some use cases
this performance drop is a blocker.
*What About Faceting?*
String faceting also relies on the top level ordinals. Is faceting performance
affected also? My testing has shown that the faceting performance is affected
much less than collapsing.
One possible reason for this may be that field collapsing is memory bound and
faceting is not. So the additional memory accesses needed for MultiDocValues
affect field collapsing much more than faceting.
*Proposed Solution*
The proposed solution is to have the default Collapse and Expand algorithm use
MultiDocValues, but to provide an option to use a top level FieldCache if the
performance of MultiDocValues is a blocker.
The proposed mechanism for switching to the FieldCache would be a new "hint"
parameter. If the hint parameter is set to "FAST_QUERY" then the top-level
FieldCache would be used for both Collapse and Expand.
Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}
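For clients that assemble the request by hand, a minimal sketch of building this filter in plain Java follows. The class and method names are hypothetical, and the "hint" local param is only what this issue proposes, so the final syntax may differ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that assembles the proposed collapse filter query. */
public class CollapseHintParam {

    /** Builds the local-params string; the hint is omitted when null or empty. */
    static String collapseFilter(String field, String hint) {
        StringBuilder sb = new StringBuilder("{!collapse field=").append(field);
        if (hint != null && !hint.isEmpty()) {
            sb.append(" hint=").append(hint);
        }
        return sb.append('}').toString();
    }

    /** URL-encodes the filter so it can be appended to a request URL as fq=... */
    static String asQueryParam(String filter) {
        return "fq=" + URLEncoder.encode(filter, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String withHint = collapseFilter("x", "FAST_QUERY");
        System.out.println(withHint);
        System.out.println(asQueryParam(withHint));
        // Without a hint the default (MultiDocValues) path would be used.
        System.out.println(collapseFilter("x", null));
    }
}
```

Leaving the hint off keeps the default behavior, so a caller would only opt in to the top-level FieldCache when the default is measurably too slow.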
was:
*Background*
The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent
are optimized to work with a top level FieldCache. Top level FieldCaches have a
very fast docID to top-level ordinal lookup. Fast access to the top-level
ordinals allows for very high performance field collapsing on high cardinality
fields.
LUCENE-5666 unified the DocValues and FieldCache api's so that the top level
FieldCache is no longer in regular use. Instead all top level caches are
accessed through MultiDocValues.
There are some major advantages of using the MultiDocValues rather then a top
level FieldCache. But the lookup from docId to top-level ordinals is slower
using MultiDocValues.
My testing has shown that *after optimizing* the CollapsingQParserPlugin code
to use MultiDocValues, the performance drop is around 100%. For some use cases
this performance drop is a blocker.
*What About Faceting?*
String faceting also relies on the top level ordinals. Is faceting performance
effected also? My testing has shown that the faceting performance is effected
much less then collapsing.
One possible reason for this is that field collapsing is memory bound and
faceting is not. So the additional memory accesses needed for MultiDocValues
effects field collapsing much more the faceting.
*Proposed Solution*
The proposed solution is to have the default Collapse and Expand algorithm us
MultiDocValues, but to provide an option to use a top level FieldCache if the
performance of MultiDocValues is a blocker.
The proposed mechanism for switching to the FieldCache would be a new "hint"
parameter. If the hint parameter is set to "FAST_QUERY" then the top-level
FieldCache would be used for both Collapse and Expand.
Example syntax:
{code}
fq={!collapse field=x hint=FAST_QUERY}
{code}
> Prepare CollapsingQParserPlugin and ExpandComponent for 5.0
> -----------------------------------------------------------
>
> Key: SOLR-6581
> URL: https://issues.apache.org/jira/browse/SOLR-6581
> Project: Solr
> Issue Type: Bug
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Priority: Minor
> Fix For: 5.0
>
> Attachments: SOLR-6581.patch, SOLR-6581.patch
>
>
> *Background*
> The 4x implementation of the CollapsingQParserPlugin and the ExpandComponent
> are optimized to work with a top level FieldCache. Top level FieldCaches have
> a very fast docID to top-level ordinal lookup. Fast access to the top-level
> ordinals allows for very high performance field collapsing on high
> cardinality fields.
> LUCENE-5666 unified the DocValues and FieldCache APIs so that the top level
> FieldCache is no longer in regular use. Instead all top level caches are
> accessed through MultiDocValues.
> There are some major advantages of using the MultiDocValues rather than a top
> level FieldCache. But there is one disadvantage: the lookup from docId to
> top-level ordinals is slower using MultiDocValues.
> My testing has shown that *after optimizing* the CollapsingQParserPlugin code
> to use MultiDocValues, the performance drop is around 100%. For some use
> cases this performance drop is a blocker.
> *What About Faceting?*
> String faceting also relies on the top level ordinals. Is faceting
> performance affected also? My testing has shown that the faceting performance
> is affected much less than collapsing.
> One possible reason for this may be that field collapsing is memory bound and
> faceting is not. So the additional memory accesses needed for MultiDocValues
> affect field collapsing much more than faceting.
> *Proposed Solution*
> The proposed solution is to have the default Collapse and Expand algorithm
> use MultiDocValues, but to provide an option to use a top level FieldCache if
> the performance of MultiDocValues is a blocker.
> The proposed mechanism for switching to the FieldCache would be a new "hint"
> parameter. If the hint parameter is set to "FAST_QUERY" then the top-level
> FieldCache would be used for both Collapse and Expand.
> Example syntax:
> {code}
> fq={!collapse field=x hint=FAST_QUERY}
> {code}
>
>