[
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-2894:
---------------------------
Attachment: SOLR-2894.patch
I've been focusing on more tests using facet.offset...
bq. I haven't looked into this closely, but i noticed the refinement code seems
to only refine things starting at the "facetFieldOffset" of the current
collection -- don't we need to refine all the values, starting from the beginning
of the list?
There was in fact a bug with refinement when using facet.offset -- but i was
looking in the wrong place. The code i was referring to before was involved in
deciding which values to drill down into when recursively refining the
sub-pivots. That logic was already (mostly) correct because by that point
we've already refined the _current_ level completely, so we can skip past the
offset when doing the recursion (the only glitch was a boundary check causing
an IOOBE, see details below). Earlier on in the code however, there was a
mistake where only the limit (not the limit+offset) was being used to decide
the threshold value for refinement.
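To make the threshold mistake concrete, here is a minimal sketch, not the code in the patch. The class and method names are hypothetical, and it assumes the values are already sorted by descending count and that limit is non-negative.

```java
import java.util.Arrays;
import java.util.List;

public class RefinementThresholdSketch {

  /** Index of the last value inside the requested window (offset + limit), or -1 if there are no values. */
  public static int indexOfCountThreshold(int numValues, int offset, int limit) {
    return Math.min(numValues, offset + limit) - 1;
  }

  /** The mistaken variant: ignores offset, so the threshold is picked too early in the list. */
  public static int limitOnlyIndexOfCountThreshold(int numValues, int limit) {
    return Math.min(numValues, limit) - 1;
  }

  public static void main(String[] args) {
    // Hypothetical merged counts for one pivot level, sorted descending.
    List<Integer> counts = Arrays.asList(50, 40, 30, 20, 10, 5);
    int offset = 2;
    int limit = 2;

    int limitOnly = limitOnlyIndexOfCountThreshold(counts.size(), limit);
    int limitPlusOffset = indexOfCountThreshold(counts.size(), offset, limit);

    System.out.println("limit only:     index " + limitOnly + ", count " + counts.get(limitOnly));
    System.out.println("limit + offset: index " + limitPlusOffset + ", count " + counts.get(limitPlusOffset));
  }
}
```

With these made-up numbers the client actually sees positions 2 and 3 (counts 30 and 20), so the threshold has to come from index 3. The limit-only variant picks index 1 (count 40), which is too high and would skip refinement for values that can still land in the returned window.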
----
New improvements in this patch...
* TestCloudPivotFacet
** increase the odds of overrequest==0
** randomly include a facet.offset param to sanity check refinement in that case
* PivotFacetField
** fix refineNextLevelOfFacets not to ask for a sublist with a start offset
bigger than the size of the collection
*** this was causing an IndexOutOfBoundsException pretty quickly when offset
was mixed into the random test
** fix queuePivotRefinementRequests to respect offset when picking the
"indexOfCountThreshold"
*** before it was only looking at limit; with offset in the randomized test,
this was causing failures even when pivots only had one field in them!
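The boundary fix in refineNextLevelOfFacets amounts to clamping the window before asking for a sublist. A minimal sketch of that guard follows; {{clampedSubList}} is a hypothetical helper, not the method in the patch, and it assumes offset and limit are both non-negative.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SubListSketch {

  /**
   * Returns the values in [offset, offset + limit), clamped to the list size.
   * An offset at or past the end of the list yields an empty list instead of an exception.
   */
  public static <T> List<T> clampedSubList(List<T> values, int offset, int limit) {
    if (offset >= values.size()) {
      return Collections.emptyList();
    }
    int end = Math.min(values.size(), offset + limit);
    return values.subList(offset, end);
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", "b", "c");
    System.out.println(clampedSubList(values, 0, 2)); // window fully inside the list
    System.out.println(clampedSubList(values, 1, 5)); // window runs past the end
    System.out.println(clampedSubList(values, 5, 2)); // offset bigger than the list
  }
}
```

Without the first check, a start offset beyond the collection size reaches {{List.subList}} directly, which is the kind of call that fails once random offsets get mixed into the test.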
----
A few more things to consider in the future...
* PivotFacetFieldValueCollection.refinableSubList is only used to deal with
offset+limit sublisting from PivotFacetField.refineNextLevelOfFacets -- but
PivotFacetFieldValueCollection already knows the offset&limit so maybe it
should be a smarter special purpose method with 0 args:
{{getNextLevelValuesToRefine()}}
* trim earlier?
** the way refinement currently works in PivotFacetField, after we've refined
our values, we mark that we no longer need refinement, and then on the next
call we recursively refine the subpivots of each value -- and in both cases we
do the offset+limit calculations and hang on to all of the values (both below
offset and above limit) as we keep iterating down the pivots -- they don't get
thrown away until the final trim() call just before building up the final
result.
** i previously suggested folding the trim() logic into the NamedList response
logic -- but now i'm wondering if the trim() logic should instead be folded
into refinement? so once we're sure a level is fully refined, we go ahead and
trim that level before drilling down and refining its kids?
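Both ideas above can be sketched together. This is only an illustration of the shape being proposed, not existing Solr code: {{Level}} is a hypothetical stand-in for one level of pivot values that owns its offset and limit, exposes a zero-arg {{getNextLevelValuesToRefine()}}, and can trim itself once it is fully refined. It assumes the values are already refined and sorted, and that limit is non-negative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrimEarlierSketch {

  /** Hypothetical stand-in for one level of pivot values; not Solr's class. */
  public static class Level {
    public final List<String> values; // assumed already refined and sorted
    private int offset;
    private final int limit;

    public Level(List<String> values, int offset, int limit) {
      this.values = new ArrayList<>(values);
      this.offset = offset;
      this.limit = limit;
    }

    /** Zero-arg variant: the collection applies the offset and limit it already knows. */
    public List<String> getNextLevelValuesToRefine() {
      int start = Math.min(offset, values.size());
      int end = Math.min(values.size(), start + limit);
      return values.subList(start, end);
    }

    /** Once this level is fully refined, drop everything below offset and above limit. */
    public void trim() {
      List<String> keep = new ArrayList<>(getNextLevelValuesToRefine());
      values.clear();
      values.addAll(keep);
      offset = 0; // the skipped values are gone, so the window now starts at 0
    }
  }

  public static void main(String[] args) {
    Level level = new Level(Arrays.asList("a", "b", "c", "d", "e"), 1, 2);
    level.trim(); // trim this level before drilling down into its children
    System.out.println(level.getNextLevelValuesToRefine());
  }
}
```

The trade-off in this sketch is that trimming early means the values outside the window are no longer available to later passes, so it only works if a level really is fully refined before {{trim()}} is called.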
----
Unfortunately, with this new patch, i did uncover a new random failure i can't
easily explain (doesn't seem related to the offset changes since facet.offset
isn't even used in these random params -- but it's possible i broke something
while fixing that) ...
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestCloudPivotFacet
-Dtests.method=testDistribSearch -Dtests.seed=775F7BCA685BBC22
-Dtests.nightly=true -Dtests.slow=true -Dtests.locale=da_DK
-Dtests.timezone=America/Montserrat -Dtests.file.encoding=UTF-8
[junit4] FAILURE 65.9s | TestCloudPivotFacet.testDistribSearch <<<
[junit4] > Throwable #1: java.lang.AssertionError:
{main(facet=true&facet.pivot=pivot_tl%2Cpivot_tl%2Cpivot_y_s&facet.pivot=bogus_not_in_any_doc_s%2Cpivot_l1%2Cpivot_td&facet.limit=13&facet.missing=true&facet.sort=count&facet.overrequest.count=2),extra(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count)}
==> bogus_not_in_any_doc_s,pivot_l1,pivot_td:
{params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})}
expected:<7> but was:<9>
[junit4] > at
__randomizedtesting.SeedInfo.seed([775F7BCA685BBC22:F6B9F5D21F04DC1E]:0)
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:239)
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.doTest(TestCloudPivotFacet.java:187)
[junit4] > at
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:865)
[junit4] > at java.lang.Thread.run(Thread.java:744)
[junit4] > Caused by: java.lang.AssertionError:
bogus_not_in_any_doc_s,pivot_l1,pivot_td:
{params(rows=0),defaults({main({main(rows=0&q=*%3A*&fq=id%3A%5B*+TO+383%5D&_test_miss=true&_test_sort=count),extra(fq=-bogus_not_in_any_doc_s%3A%5B*+TO+*%5D)}),extra(fq=%7B%21term+f%3Dpivot_l1%7D5098)})}
expected:<7> but was:<9>
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.assertNumFound(TestCloudPivotFacet.java:507)
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:257)
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:268)
[junit4] > at
org.apache.solr.cloud.TestCloudPivotFacet.assertPivotCountsAreCorrect(TestCloudPivotFacet.java:229)
{noformat}
...i need to dig into this a bit more tomorrow.
> Implement distributed pivot faceting
> ------------------------------------
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
> Issue Type: Improvement
> Reporter: Erik Hatcher
> Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch,
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch,
> SOLR-2894_cloud_test.patch, dateToObject.patch, pivot_mincount_problem.sh
>
>
> Following up on SOLR-792, pivot faceting currently only supports
> undistributed mode. Distributed pivot faceting needs to be implemented.