[
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978002#comment-13978002
]
Elran Dvir commented on SOLR-2894:
----------------------------------
We have encountered a Java heap memory problem using distributed pivots –
perhaps someone can shed some light on it.
The scenario is as follows:
We run Solr 4.4 (with the patch for SOLR-2894) with 20 cores, and the maximum
Java heap size is 1.5 GB.
The following query with distributed facet pivot generates an out of memory
exception:
rows=0&
q=*:*&
facet=true&
facet.pivot=f1,f2,f3,f4,f5&
f.f1.facet.sort=count&
f.f1.facet.limit=10&
f.f1.facet.missing=true&
f.f1.facet.mincount=1&
f.f2.facet.sort=index&
f.f2.facet.limit=-1&
f.f2.facet.missing=true&
f.f2.facet.mincount=1&
f.f3.facet.sort=index&
f.f3.facet.limit=-1&
f.f3.facet.missing=true&
f.f3.facet.mincount=1&
f.f4.facet.sort=index&
f.f4.facet.limit=-1&
f.f4.facet.missing=true&
f.f4.facet.mincount=1&
f.f5.facet.sort=index&
f.f5.facet.limit=-1&
f.f5.facet.missing=true&
f.f5.facet.mincount=1&
shards=127.0.0.1:8983/solr/shard1,127.0.0.1:8983/solr/shard2
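For anyone who wants to reproduce this, the parameter list above follows a
regular pattern (f1 is sorted by count with a limit of 10; f2 to f5 are sorted
by index with no limit). Below is a minimal Python sketch that generates it.
The helper name build_pivot_params is hypothetical and not part of Solr; the
field names and shard addresses are the ones from the query above.

```python
from urllib.parse import urlencode


def build_pivot_params(fields, shards):
    """Return the request parameters as a list of (name, value) pairs.

    Hypothetical helper: the first field is treated as the top pivot level
    (sort=count, limit=10); all deeper levels use sort=index, limit=-1.
    """
    params = [
        ("rows", "0"),
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.pivot", ",".join(fields)),
    ]
    for i, field in enumerate(fields):
        top_level = i == 0
        params += [
            (f"f.{field}.facet.sort", "count" if top_level else "index"),
            (f"f.{field}.facet.limit", "10" if top_level else "-1"),
            (f"f.{field}.facet.missing", "true"),
            (f"f.{field}.facet.mincount", "1"),
        ]
    params.append(("shards", ",".join(shards)))
    return params


params = build_pivot_params(
    ["f1", "f2", "f3", "f4", "f5"],
    ["127.0.0.1:8983/solr/shard1", "127.0.0.1:8983/solr/shard2"],
)
# URL-encoded form, ready to append to a select handler URL.
query_string = urlencode(params)
```

Note that urlencode percent-encodes the commas and colons, which Solr decodes
back to the values shown above.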
Number of docs in each shard:
shard1: 16,234
shard2: 169,089
This is the distribution of terms for each field:
f1: shard1 - 16,046, shard2 - 38
f2: all shards - 232
f3: all shards - 53
f4: all shards - 6
f5: all shards - 10
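To give a rough sense of scale, here is a small Python sketch that multiplies
the term counts per pivot level. It assumes the numbers above are distinct
term counts, and it gives only a theoretical ceiling: the real number of pivot
paths is also limited by the documents that actually exist, and the extra
"missing" bucket per level is ignored. The function name pivot_leaf_ceiling is
hypothetical.

```python
from math import prod


def pivot_leaf_ceiling(term_counts):
    """Upper bound on leaf pivot paths: the product of term counts per level.

    Assumption: every combination of terms could occur, which real data
    will normally not reach.
    """
    return prod(term_counts)


# f1 capped at 10 values by f.f1.facet.limit=10, then f2..f5 unlimited.
limited_f1 = pivot_leaf_ceiling([10, 232, 53, 6, 10])

# f1 not capped, using the shard1 term count for f1.
unlimited_f1_shard1 = pivot_leaf_ceiling([16046, 232, 53, 6, 10])
```

If the shard requests effectively lose the f1 limit during refinement, the
second figure is the relevant ceiling, which is several orders of magnitude
larger than the first.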
When we use a maximum Java heap size of 8 GB, the query finishes. It seems
that about 6 GB is used for pivoting.
It doesn't seem reasonable that facet.pivot on 2 cores with fewer than 200,000
docs in total requires that much memory.
We tried looking into the code a little and it seems the sharded queries run
with facet.pivot.mincount=-1 as part of the refinement process.
We also noticed that in this scenario, the parameter skipRefinementAtThisLevel
in the method queuePivotRefinementRequests in the class PivotFacetField is
false.
We think all of this is the cause of the memory consumption – but we couldn't
pinpoint the underlying issue.
Is there a way to alter the algorithm to consume less memory?
If anyone can explain offline the way refinement works here, we would be happy
to try and help resolve this.
Thank you very much.
> Implement distributed pivot faceting
> ------------------------------------
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
> Issue Type: Improvement
> Reporter: Erik Hatcher
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.
--
This message was sent by Atlassian JIRA
(v6.2#6252)