[
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-2894:
---------------------------
Attachment: SOLR-2894.patch
I'm going to spend some time this week reviewing the state of things.
First up, some minor tweaks to the latest patch...
* fixed a typo in TestDistributedSearch ("facet.fiedl" -> "facet.field")
** this is my biggest anoyance about most of our existing distributed serach
tests -- they just assert that queries return the same thing as single node
tests, but don't assert anything about the response, so mistakes in the input,
or mistakes in indexing hte docs, resulting in a useless test aren't caught)
** this also relates to marks comment about removing "ndate" since that field
no longer exists in the test configs - using tdate_a & tdate_b here should be
fine
* removed the "nocommit" mark mentioned regarding DateField - that method moved
to TrieDateField so his fix is correct.
Some comments/questions based on what I've reviewed so far (note: many of these
comments/questions come from a place of genuine ignorance since i've only
reviewed about 30% of the patch so far)...
* even at a glance, it's obvious the SimpleFacets changes are a simple
refactoring and totally fine.
* In FacetComponent - Setting asside the core pivot facet changes...
** Most of the other changes in seem like straight forward (and much needed!)
variable renaming (+1) to help eliminate ambiguity between the existing field
faceting refinement and the new pivot faceting refinement
** the new "fieldsToOverRequestOn" Map confuses me in a few ways...
*** As is, i don't understand why this is a Map and not a Set.
*** Some odd conditional logic is used when iterating over this "Set" to
determine the overrequest limit - i'm still trying to wrap my head arround this
but in particular the comment {{// dff has the info, the params have been
scrubbed}} confuses me -- where are these params "scrubbed" ?
*** I like these new explict overrequest count/ratio params, and i get that the
end-game here is that they can be used to affect the amount of overrequest done
for both fact.field and facet.pivot -- but i'm not understanding the value of
building up this "fieldsToOverRequestOn" set of names (for every shard request)
and then iterating over it and consulting *either* the DFF or the params to
decide which limit value to use on the shard requests, and then
(conditionally?) removing the limit/offset/overrequest params from the shard
requests. Wouldn't it be simplier to have modifyRequest *always* remove the
limit/offset/overrequest params from the shard params, and then have the
individual code paths (for both facet.field & facet.pivot) take responsibility
for adding back the new limit params based on the overrequest calculations
using the *original* request params (ie: {{rb.req.getParams()}}).
*** My chief concern here being that (at first glance) this change seems like
it adds a small amount of overhead to the overrequest limit calculations, and
makes this bit of code more confusing, w/o any obvious (to me) advantage.
* I don't yet understand the need for the new "PURPOSE_REFINE_PIVOT_FACETS"
stage of shard requests? ... can someone clarify why pivot facets can't just
be refined during the existing "PURPOSE_REFINE_FACETS" stage?
* I notice that the new DistributedFacetPivotTest directly extends
BaseDistributedSearchTestCase and uses a fixed shard count, and indexes some
docs directly to certain clients
** is there something about the functionality (or about the test) that requires
certain data locality (ie: certain docs on same shard) to work?
** if not: is there any other reason we can't switch this over to a Cloud based
test with a variable numbers of shards and compairons against the control
collection?
> Implement distributed pivot faceting
> ------------------------------------
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
> Issue Type: Improvement
> Reporter: Erik Hatcher
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]