[
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-2894:
---------------------------
Attachment: SOLR-2894.patch
After working through the fix to the refinement logic in
PivotFacetField.queuePivotRefinementRequests the previously failing seed for
TestCloudPivotFacet started to pass, but some sort=index tests still weren't
working, which led me to realize 3 things:
* some of my tests were absurd -- i've gotten used to using overrequest=0 as a
way to force refinement, but with facet.sort=index combined with limit (and
offset) and mincount it meant that it was impossible for the sort=index facet
logic to ever find the results we're looking for. We *have* to allow some
overrequest when mincount>1 or the initial shard requests won't find the values
(that will ultimately have a cumulative count high enough to meet mincount) in
order to even try refining them.
* offset wasn't being added to the limit in the per-shard requests, so w/o
overrequest enabled you would never get the values you needed even in ideal
situations
* the shard query logic in FacetComponent was ignoring overrequest when
sort=index ... this seems broken to me, but from what i can tell, it comes
straight from the existing facet.field logic as well.
I'll open a bug to track the existing broken overrequest logic in
facet.field -- even though i hope that once we're done with this issue, it may
be fixed via refactoring and shared code with pivots (i'm not 100% certain: the
FacetComponent diff is the bulk of what i still need to review more closely on
this issue)
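To make the limit+offset and overrequest points above concrete, here is a minimal sketch of how a per-shard limit could be computed. It is not the code from the patch: the class name, the method name, and the two overrequest knobs (a multiplicative ratio and an additive count) are assumptions for illustration, as is the idea that each shard is asked for its top terms starting from offset 0.

```java
/** Hypothetical helper for illustration only; not Solr's FacetComponent code. */
final class ShardFacetLimitSketch {
    private ShardFacetLimitSketch() {}

    /**
     * Computes how many terms to ask each shard for.
     *
     * @param limit            user-requested facet limit (negative means unlimited)
     * @param offset           user-requested facet offset
     * @param overrequestRatio assumed multiplicative padding (0 disables it)
     * @param overrequestCount assumed additive padding (0 disables it)
     */
    static int shardLimit(int limit, int offset, double overrequestRatio, int overrequestCount) {
        if (limit < 0) {
            return -1; // unlimited: nothing to pad
        }
        // The coordinator skips `offset` merged values and then returns `limit`,
        // so every shard must contribute at least offset + limit candidates.
        long needed = (long) offset + limit;
        // Pad the request so values that rank low on one shard can still be seen.
        long padded = (long) (needed * overrequestRatio) + overrequestCount;
        // Never ask for fewer than what is strictly needed, even with padding disabled.
        long result = Math.max(needed, padded);
        return (int) Math.min(Integer.MAX_VALUE, result);
    }
}
```

With limit=20 and offset=40, disabling both knobs still yields a per-shard limit of 60 rather than 20, which is the "ideal situation" case described above; a ratio of 1.5 and a count of 10 would yield 100.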
There's still a failure in DistributedFacetPivotLargeTest (mismatch compared to
control) when i tried using mincount=0 that i'm not certain if/how we can
solve...
{code}
// :nocommit: broken honda?
rsp = query( params( "q", "*:*",
"rows", "0",
"facet","true",
"facet.sort","index",
"f.place_s.facet.limit", "20",
"f.place_s.facet.offset", "40",
FacetParams.FACET_PIVOT_MINCOUNT,"0",
"facet.pivot", "place_s,company_t") );
{code}
From what I can tell, the gist of the issue is that when dealing with
sub-fields of the pivot, the coordination code doesn't know about some of the
"0" values if no shard which has the value for the parent field even knows
about the existence of the term.
The simplest example of this discrepancy (compared to single node pivots) is to
consider an index with only 2 docs...
{noformat}
[{"id":1,"top_s":"foo","sub_s":"bar"},
{"id":2,"top_s":"xxx","sub_s":"yyy"}]
{noformat}
If those two docs exist in a single node index, and you pivot on
{{top_s,sub_s}} using mincount=0 you get a response like this...
{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true'
{
"response":{"numFound":2,"start":0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_pivot":{
"top_s,sub_s":[{
"field":"top_s",
"value":"foo",
"count":1,
"pivot":[{
"field":"sub_s",
"value":"bar",
"count":1},
{
"field":"sub_s",
"value":"yyy",
"count":0}]},
{
"field":"top_s",
"value":"xxx",
"count":1,
"pivot":[{
"field":"sub_s",
"value":"yyy",
"count":1},
{
"field":"sub_s",
"value":"bar",
"count":0}]}]}}}
{noformat}
If however you index each of those docs on a separate shard, the response comes
back like this...
{noformat}
$ curl -sS 'http://localhost:8881/solr/select?q=*:*&rows=0&facet=true&facet.pivot.mincount=0&facet.pivot=top_s,sub_s&omitHeader=true&wt=json&indent=true&shards=localhost:8881/solr,localhost:8882/solr'
{
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_pivot":{
"top_s,sub_s":[{
"field":"top_s",
"value":"foo",
"count":1,
"pivot":[{
"field":"sub_s",
"value":"bar",
"count":1}]},
{
"field":"top_s",
"value":"xxx",
"count":1,
"pivot":[{
"field":"sub_s",
"value":"yyy",
"count":1}]}]}}}
{noformat}
The only solution i can think of, would be an extra (special to mincount=0)
stage of logic, after each PivotFacetField is refined, that would:
* iterate over all the values of the current pivot
* build up a Set of all the known values for the child-pivots of those
values
* iterate over all the values again, merging in a "0"-count child value for
every value in the set
...ie: "At least one shard knows about value 'v_x' in field 'sub_field', so add
a count of '0' for 'v_x' in every 'sub_field' collection nested under the
'top_field' in our 'top_field,sub_field' pivot"
I haven't thought this idea through enough to be confident it would work, or
that it's worth doing ... i'm certainly not convinced that mincount=0 makes
enough sense in a facet.pivot usecase to think getting this test working should
hold up getting this committed -- probably something that should just be
committed as is, with an open Jira that it's a known bug.
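The extra mincount=0 stage proposed above could look roughly like the following. This is only a sketch of the idea: it models one pivot level as a plain nested map (parent value -> child value -> merged count) instead of the real PivotFacetField structures, and the class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the proposed mincount=0 backfill; not code from the patch. */
final class ZeroCountBackfillSketch {
    private ZeroCountBackfillSketch() {}

    /**
     * @param pivot parent value -> (child value -> merged count); modified in place
     */
    static void backfillZeroCounts(Map<String, Map<String, Integer>> pivot) {
        // Pass 1: collect every child value that at least one shard reported
        // under any parent value.
        Set<String> knownChildren = new LinkedHashSet<>();
        for (Map<String, Integer> children : pivot.values()) {
            knownChildren.addAll(children.keySet());
        }
        // Pass 2: give every parent a "0" entry for each known child it lacks.
        // Existing counts are left untouched.
        for (Map<String, Integer> children : pivot.values()) {
            for (String child : knownChildren) {
                children.putIfAbsent(child, 0);
            }
        }
    }
}
```

Applied to the two-doc example, {foo={bar=1}, xxx={yyy=1}} becomes {foo={bar=1, yyy=0}, xxx={yyy=1, bar=0}}, which matches the single node response. The sketch does not address limit, offset, or sort handling for the backfilled values, nor deeper pivot levels.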
{panel:title=Summary of Changes in this patch}
* PivotFacet
** add a new REFINE_PARAM constant for "fpt"
* PivotFacetProcessor
** javadocs
** use REFINE_PARAM constant
* PivotFacetField
** processDefiniteCandidateElement
*** javadocs
*** numberOfValuesContributedByShardWasLimitedByFacetFieldLimit can only be
trusted when sort=count
** processPossibleCandidateElement
*** method only useful when sort=count
*** added assert & javadocs making this clear
** queuePivotRefinementRequests
*** call processDefiniteCandidateElement on all elements when using sort=index
* FacetComponent
** applyToShardRequests - removed this method
*** a bunch of it was dead code (if limit > 0, no need to check limit>=0)
*** most of what wasn't dead code was also being done by the callers (ie:
redundant overrequest logic)
*** this was also where the original mincount=0 bug lived (mincount was being
forced to 1 when called from pivot code)
** modifyRequestForIndividualPivotFacets & modifyRequestForFieldFacets
*** made sure they were directly doing the stuff they used to depend on
applyToShardRequests for
*** fixed up limit+offset & overrequest logic
** use REFINE_PARAM constant
* DistributedFacetPivotLargeTest
** fixed tests to be less overzealous about overrequest=0
** added more mincount=0 testing (currently fails)
{panel}
> Implement distributed pivot faceting
> ------------------------------------
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
> Issue Type: Improvement
> Reporter: Erik Hatcher
> Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch,
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894_cloud_test.patch,
> dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.