[
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979898#comment-13979898
]
Brett Lucey commented on SOLR-2894:
-----------------------------------
Andrew actually raised that question to me yesterday as well and I spent a
little bit of time looking into it. For the initial request to a shard, we
only lower the mincount if the facet limit is set to something other than -1.
In your case, this would be 10 for the top-level pivot. We know we will (at
most) get back 15 terms from each shard in this case. Because we are only
faceting on a limited number of terms, having a mincount of 0 here provides us
the benefit of potentially avoiding refinement. In refinement requests, we
still need to know when a shard has responded to us with its count for a term,
so the mincount is -1 in that case because we are interested in the term even
if the count is zero. It allows us to mark the shard as having responded and
continue on. It's possible that we might be able to change this, but at the
point of refinement, it's a rather targeted request so I don't expect there to
be a significant benefit to doing so. In your case, with the facet limit being
-1 on f2-f5, no refinement would be performed anyway.
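
The mincount choice described above can be sketched roughly as follows. This only illustrates the logic in this comment and is not code from the patch: `shard_mincount` and its parameters are hypothetical names, and the fall-through case (leaving the requested mincount untouched when the facet limit is -1) is an assumption based on "we only lower the mincount if the facet limit is set to something other than -1".

```python
def shard_mincount(is_refinement: bool, facet_limit: int, requested_mincount: int) -> int:
    """Hypothetical helper: pick the facet mincount to send to a shard."""
    if is_refinement:
        # Refinement request: the term is wanted even if its count is zero,
        # so the shard can be marked as having responded for that term.
        return -1
    if facet_limit != -1:
        # Initial request with a bounded limit: lower the mincount to 0,
        # which may avoid a refinement round trip later.
        return 0
    # Initial request with an unbounded limit (assumption): leave the
    # requested mincount as it is.
    return requested_mincount


if __name__ == "__main__":
    print(shard_mincount(False, 10, 1))   # bounded top-level pivot
    print(shard_mincount(False, -1, 1))   # unbounded limit, e.g. f2-f5
    print(shard_mincount(True, 10, 1))    # refinement request
```

Keeping the decision in one small function makes it easy to see that only the bounded initial request and the refinement request alter the mincount.
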
When we designed this implementation, the most important factor for us was
speed, and we were willing to get it at the cost of memory. By making these
changes, we reduced queries which previously took around 70 seconds for us down
to around 600 milliseconds. I suspect that the biggest factor in the poor
memory utilization is the wide open nature of using a facet.limit of -1,
especially on a pivot so deep. Keep in mind that for each level of depth you
add to a pivot, memory and time required will grow exponentially.
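
To put rough numbers on that growth, here is a back-of-the-envelope sketch. It assumes a worst case in which every combination of field values occurs in the index and no facet limit trims any level, which is what a facet.limit of -1 permits. `worst_case_pivot_nodes` is a hypothetical helper, and the cardinalities are made-up illustrative values, not measurements from this issue.

```python
def worst_case_pivot_nodes(cardinalities):
    """Upper bound on pivot nodes when every value combination exists.

    cardinalities: distinct-value count per pivot field, outermost first.
    """
    total = 0
    nodes_at_level = 1
    for distinct_values in cardinalities:
        # Each node at the previous level can fan out into every value
        # of the next field.
        nodes_at_level *= distinct_values
        total += nodes_at_level
    return total


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, worst_case_pivot_nodes([10] * depth))
```

With ten distinct values per field, going from three levels to five takes the worst case from 1,110 nodes to 111,110, which is why an unbounded limit on a deep pivot is so costly.
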
Don't forget that if you are querying a node and all of the shards are located
within the same Java VM, you are incurring the memory cost of both shards plus
the node responding to the user query all within the same heap.
I took a quick look at the code today while waiting for some other processes to
finish, and I don't see any obvious low-hanging fruit that would free up even a
small amount of memory.
> Implement distributed pivot faceting
> ------------------------------------
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
> Issue Type: Improvement
> Reporter: Erik Hatcher
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.