[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting

Elran Dvir (JIRA) Wed, 23 Apr 2014 02:05:08 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978002#comment-13978002 ]


Elran Dvir commented on SOLR-2894:
----------------------------------

We have encountered a Java heap memory problem using distributed pivots – 
perhaps someone can shed some light on it.

The scenario is as follows:
We run Solr 4.4 (with the SOLR-2894 patch applied) with 20 cores and a maximum 
Java heap size of 1.5 GB.
The following distributed facet.pivot query throws an out-of-memory 
exception:
rows=0&
q=*:*&
facet=true&
facet.pivot=f1,f2,f3,f4,f5&
f.f1.facet.sort=count&
f.f1.facet.limit=10&
f.f1.facet.missing=true&
f.f1.facet.mincount=1&
f.f2.facet.sort=index&
f.f2.facet.limit=-1&
f.f2.facet.missing=true&
f.f2.facet.mincount=1&
f.f3.facet.sort=index&
f.f3.facet.limit=-1&
f.f3.facet.missing=true&
f.f3.facet.mincount=1&
f.f4.facet.sort=index&
f.f4.facet.limit=-1&
f.f4.facet.missing=true&
f.f4.facet.mincount=1&
f.f5.facet.sort=index&
f.f5.facet.limit=-1&
f.f5.facet.missing=true&
f.f5.facet.mincount=1&
shards=127.0.0.1:8983/solr/shard1,127.0.0.1:8983/solr/shard2
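
For anyone trying to reproduce this, the parameter set above can be generated 
programmatically. The following is only a minimal Python sketch: the helper 
name build_pivot_params is hypothetical (not part of Solr or any client 
library), and it assumes, as in the query above, that the first pivot field is 
sorted by count with a limit of 10 and all others by index with no limit.

```python
from urllib.parse import urlencode


def build_pivot_params(fields, shards):
    """Return the request parameters as an ordered list of (name, value) pairs.

    Hypothetical helper for illustration only; it mirrors the query above.
    """
    params = [
        ("rows", "0"),
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.pivot", ",".join(fields)),
    ]
    for i, field in enumerate(fields):
        first = i == 0
        params += [
            # First pivot level: top 10 by count; deeper levels: all terms by index.
            (f"f.{field}.facet.sort", "count" if first else "index"),
            (f"f.{field}.facet.limit", "10" if first else "-1"),
            (f"f.{field}.facet.missing", "true"),
            (f"f.{field}.facet.mincount", "1"),
        ]
    params.append(("shards", ",".join(shards)))
    return params


params = build_pivot_params(
    ["f1", "f2", "f3", "f4", "f5"],
    ["127.0.0.1:8983/solr/shard1", "127.0.0.1:8983/solr/shard2"],
)
# Keep '*', ':', ',' and '/' unescaped so the output reads like the query above.
query_string = urlencode(params, safe="*:,/")
print(query_string)
```

The resulting string can be appended to the select handler URL of whichever 
core coordinates the distributed request.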

Number of docs in each shard:
shard1: 16,234
shard2: 169,089

The term distribution of the fields is as follows:
f1: shard1 - 16,046, shard2 - 38
f2: all shards - 232
f3: all shards - 53
f4: all shards - 6
f5: all shards - 10
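
As a rough back-of-the-envelope check (not a measurement), the figures above 
give an upper bound on the number of distinct pivot paths. The Python sketch 
below assumes that the counts listed are distinct terms per field, that 
facet.missing=true adds one extra bucket per level, and that f1 is capped at 
its facet.limit of 10. Whether the shard requests actually enumerate this many 
combinations is an open question, not something established here.

```python
from math import prod

# Distinct terms per field, from the figures above (f2..f5 are per "all shards").
terms = {"f2": 232, "f3": 53, "f4": 6, "f5": 10}
total_docs = 16234 + 169089  # shard1 + shard2

# Assumption: facet.missing=true contributes one extra "missing" bucket per level.
buckets = {field: count + 1 for field, count in terms.items()}

# Assumption: f1 contributes at most facet.limit=10 values plus the missing bucket.
buckets["f1"] = 10 + 1

# Upper bound on value combinations across all five pivot levels.
combos_limited = prod(buckets.values())

print(f"total docs:               {total_docs:,}")
print(f"max pivot paths (capped): {combos_limited:,}")
```

If every field is single-valued (an assumption), each document falls into 
exactly one leaf path, so the number of non-empty leaf paths cannot exceed the 
number of documents. The combinatorial bound is far larger than that, so it 
would only matter if zero-count combinations are materialized somewhere.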

When we use a maximum Java heap size of 8 GB, the query finishes. It seems 
that about 6 GB is used for pivoting.
It doesn’t seem reasonable for a facet.pivot on 2 cores with roughly 185,000 
docs in total to require that much memory.

We tried looking into the code a little and it seems the sharded queries run 
with facet.pivot.mincount=-1 as part of the refinement process.
We also noticed that in this scenario, the parameter skipRefinementAtThisLevel 
in the method queuePivotRefinementRequests in the class PivotFacetField is 
false.
We think all of this is the cause of the memory consumption – but we couldn't 
pinpoint the underlying issue.

Is there a way to alter the algorithm to consume less memory?
If anyone can explain offline the way refinement works here, we would be happy 
to try and help resolve this.

Thank you very much.


> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports 
> undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)