[
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-6349:
---------------------------
Attachment: make-data-and-queries.pl
Here's an updated version of make-data-and-queries.pl that:
* introduces some missing values into each field
* only hits port 8983 so single node & cloud can be tested
* tests more different types of individual stats and pairs of stats
My results are below. These numbers make me feel pretty good about the state of
the code (although i'm perplexed as to why "min" is consistently slower than
other stats that are more numerically complicated, like mean & sum).
{noformat}
## single node (seconds)
       trunk   patch...
Run    all     all     mean    mean_stdv  min     sum     sum_miss
1      168.22  165.99  112.89  114.39     137.12  107.83  108.01
2      167.63  166.31  113.28  114.29     136.58  107.72  108.63
3      168.83  167.45  112.73  114.50     132.84  109.06  108.91
total  504.68  499.75  338.90  343.18     406.54  324.61  325.55

## two node cloud (seconds)
       trunk   patch...
Run    all     all     mean    mean_stdv  min     sum     sum_miss
1      115.66  115.32  70.27   73.57      90.72   68.06   68.97
2      111.68  111.16  70.27   71.20      89.62   68.82   67.53
3      112.16  112.85  70.38   70.47      91.48   68.39   70.00
total  339.50  339.33  210.92  215.24     271.82  205.27  206.50
{noformat}
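To put a number on the "min" oddity, here is a small sketch that compares the patch totals from the tables above. The totals are copied from the results; the helper name `percent_slower` is just something made up for illustration, not part of the script or the patch.

```python
# Totals (seconds) copied from the "total" rows above, patch columns only.
PATCH_TOTALS = {
    "single node": {
        "all": 499.75, "mean": 338.90, "mean_stdv": 343.18,
        "min": 406.54, "sum": 324.61, "sum_miss": 325.55,
    },
    "two node cloud": {
        "all": 339.33, "mean": 210.92, "mean_stdv": 215.24,
        "min": 271.82, "sum": 205.27, "sum_miss": 206.50,
    },
}


def percent_slower(totals, stat, baseline):
    """Return how much slower `stat` is than `baseline`, in percent."""
    return (totals[stat] / totals[baseline] - 1.0) * 100.0


if __name__ == "__main__":
    for setup, totals in PATCH_TOTALS.items():
        for baseline in ("mean", "sum"):
            pct = percent_slower(totals, "min", baseline)
            print(f"{setup}: min is {pct:.1f}% slower than {baseline}")
```

This only restates the measurements as ratios; it says nothing about why "min" is slower.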
...my next steps:
* another pass of code review (i kind of glossed over it when doing the
refactoring)
* more tests of other field types
** current patch has detailed tests of individual stats on numeric fields, but i
remember thinking the string, enum, and date code looked brittle and i suspect
they have bugs and NPEs when computing individual stats
* javadocs
* poke around "min" (and "max") and see if i can figure out why they are slow
(probably tangential to this issue, may punt)
> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
> Key: SOLR-6349
> URL: https://issues.apache.org/jira/browse/SOLR-6349
> Project: Solr
> Issue Type: Sub-task
> Reporter: Hoss Man
> Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch,
> SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch,
> SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch,
> SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch, SOLR-6349.patch,
> SOLR-6349___bad_idea_broken.patch, make-data-and-queries.pl,
> make-data-and-queries.pl, make-data-and-queries.pl
>
>
> Stats component currently computes all stats (except for one) every time
> because they are relatively cheap, and in some cases dependent on each other
> for distrib computation -- but if we start layering stats on other things it
> becomes unnecessarily expensive to compute all the stats when they just want
> the "sum" (and it will definitely become excessively verbose in the
> responses).
> The plan here is to use local params to make this configurable. All of the
> existing stat options could be modeled as a simple boolean param, but future
> params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}
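
For anyone who wants to try the syntax from the example above, here is a rough sketch of how a client might assemble such a request. It assumes the local params syntax exactly as proposed in the issue description; the helper names, the `/solr/collection1/select` path and the `q`/`rows` params are illustrative assumptions, and the quoting is naive (values containing a single quote are not handled).

```python
from urllib.parse import urlencode


def stats_field(field, **local_params):
    """Render one stats.field value, e.g. {!min=true max=true}price.

    Booleans become true/false; any other value is wrapped in single quotes.
    """
    if not local_params:
        return field
    parts = []
    for name, value in local_params.items():
        if isinstance(value, bool):
            parts.append(f"{name}={'true' if value else 'false'}")
        else:
            parts.append(f"{name}='{value}'")
    return "{!" + " ".join(parts) + "}" + field


def build_stats_url(base_url, fields):
    """Build a URL-encoded query string with one stats.field per entry."""
    params = [("q", "*:*"), ("rows", "0"), ("stats", "true")]
    params += [("stats.field", f) for f in fields]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    fields = [
        stats_field("price", min=True, max=True, percentiles="99,99.999"),
        stats_field("weight", mean=True),
    ]
    # Hypothetical host and collection; port 8983 matches the test script.
    print(build_stats_url("http://localhost:8983/solr/collection1/select", fields))
```

Passing the params as a list of pairs keeps repeated `stats.field` entries intact and lets `urlencode` escape the braces, quotes and spaces.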