[ 
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6349:
---------------------------
    Attachment: SOLR-6349___bad_idea_broken.patch


I've been trying to work on this off and on over the past week and i keep 
running into problems.

the attached "SOLR-6349___bad_idea_broken.patch" shows the (lack of) "progress" 
made and should stand as a warning sign of the problems of going down this 
particular route.

Ultimately i kept running into 2 problems with trying to modify the existing 
code in AbstractStatsValues & its subclasses...

* parsing the localparams to know when the (legacy) default set of stats should 
be computed, vs specific individual stats
** this got very hairy very fast because of the superclass/subclass 
relationship -- each class needs properties to track what it's supposed to be 
computing (ie: "boolean needCount"), and the default initialization of those 
properties needs to be based on whether any specific properties are requested 
(ie: "count" in localparams), but to do that properly it means the superclass 
can't init its properties until giving the subclass a chance to check if any 
of its specific properties have been requested.
* stats depending on other stats
** subclasses need to be able to override the init logic for the superclass's 
properties based on the subclass specific stats (ie: numeric "mean" and 
"stddev" stats both depend on the generic "count" stat) 
** there's a difference between needing to _compute_ a stat locally (ie: 
count->mean) and whether we should return that stat to the caller.
*** for single node "mean" computation, we depend on the local computation of 
"sum" and "count" but we don't want to return either "sum" or "count"
*** for distributed "mean": each shard needs to compute & return "count" and 
"sum" but we don't need a per-shard "mean"; the coordinator needs to collect & 
combine all the (per-shard) "count" and "sum" stats to produce a "mean" that 
will be returned to the client, but it shouldn't return the combined "count" 
and "sum"
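
To make the distributed "mean" case above concrete, here is a minimal Java sketch. Every class and method name in it (DistribMeanSketch, ShardStats, computeOnShard, mergeMean) is made up for illustration and is not taken from the Solr codebase or the attached patch. It only shows the shape of the data flow: shards return "count" and "sum", and the coordinator combines them and exposes nothing but the "mean".

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Solr. */
final class DistribMeanSketch {

    /** What one shard returns for a "mean" request: count and sum, but no mean. */
    static final class ShardStats {
        final long count;
        final double sum;

        ShardStats(long count, double sum) {
            this.count = count;
            this.sum = sum;
        }
    }

    /** Shard side: compute the two intermediate stats over the local values. */
    static ShardStats computeOnShard(double[] localValues) {
        double sum = 0;
        for (double v : localValues) {
            sum += v;
        }
        return new ShardStats(localValues.length, sum);
    }

    /** Coordinator side: combine per-shard counts and sums, return only the mean. */
    static double mergeMean(List<ShardStats> shards) {
        long count = 0;
        double sum = 0;
        for (ShardStats s : shards) {
            count += s.count;
            sum += s.sum;
        }
        // Assumption: an empty result set reports NaN rather than throwing.
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        ShardStats a = computeOnShard(new double[] {1, 2, 3});
        ShardStats b = computeOnShard(new double[] {10});
        System.out.println(mergeMean(Arrays.asList(a, b))); // prints 4.0
    }
}
```

Averaging the per-shard means in this example would give (2 + 10) / 2 = 6 instead of 4, which is why a per-shard "mean" is of no use to the coordinator.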

...which is why i ultimately abandoned this current patch.

----

I have a rough idea forming in the back of my head about a better way to solve 
this problem via a bit of an overhaul to the internals of AbstractStatsValues 
... trying to outline what i'm thinking...

* keep the basic contract of "StatsValues" intact
* keep the AbstractStatsValues and subclasses
** these should focus only on the differences in the data type of the _source_ 
data (ie: Number vs Date vs String vs Enum)
* refactor the meat of how each stat is computed into smaller "Stats" classes 
(ie: "StatsMean", "StatsSum", "StatsCount").
** these should be constructed by looping over the local params and looking up 
the keys in some map (SPI?)
** each "Stat" can ask the StatsValues holding it to construct other dependent 
Stats as needed ("StatsMean" would ask for a StatsSum and StatsCount)
*** if those already exist (because they were explicitly requested or because 
some other Stat also needed them) it would be given a ref to the existing 
instances, else StatsValues would create a new instance
** each "Stat" will have some boolean state indicating if it should write data 
in the response
*** StatsValues would set that state to true on a Stats instance only if it was 
explicitly requested via local param
*** Stats can forcibly set that state to true on other stats, notably when the 
request "isShard" (ie: StatsMean can choose not to write anything to the 
response on an "isShard" request, but can tell the StatsSum and StatsCount 
objects that they must)
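
A rough Java sketch of how the outline above might fit together. This is one possible reading of the idea, not a design that exists anywhere: StatsRegistrySketch, Stat, DEPS, request and responseKeys are all hypothetical names, and the hard-coded dependency table stands in for whatever map/SPI lookup would really be used. The actual computation of each stat is left out; only the dependency sharing and the "write to response" flag are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the proposed design; these are not existing Solr classes. */
final class StatsRegistrySketch {

    /** One small unit of computation plus a flag saying whether it is written to the response. */
    static final class Stat {
        final String name;
        boolean writeToResponse; // false until someone asks for it to be written

        Stat(String name) {
            this.name = name;
        }
    }

    // Assumed dependency table; a real version might come from an SPI-style lookup.
    static final Map<String, List<String>> DEPS = new LinkedHashMap<>();
    static {
        DEPS.put("mean", Arrays.asList("sum", "count"));
    }

    final Map<String, Stat> stats = new LinkedHashMap<>();
    final boolean isShard;

    StatsRegistrySketch(boolean isShard, String... requestedViaLocalParams) {
        this.isShard = isShard;
        for (String name : requestedViaLocalParams) {
            request(name);
        }
    }

    static List<String> depsOf(String name) {
        return DEPS.getOrDefault(name, Collections.<String>emptyList());
    }

    /** Returns the existing instance if there is one, else creates it and its dependencies. */
    Stat getOrCreate(String name) {
        Stat existing = stats.get(name);
        if (existing != null) {
            return existing;
        }
        Stat created = new Stat(name);
        stats.put(name, created);
        for (String dep : depsOf(name)) {
            getOrCreate(dep); // needed for computation only, not written by default
        }
        return created;
    }

    /** Handles a stat that was explicitly requested via local param. */
    void request(String name) {
        Stat stat = getOrCreate(name);
        List<String> deps = depsOf(name);
        if (isShard && !deps.isEmpty()) {
            // On a shard request a derived stat stays silent and forces its inputs out.
            for (String dep : deps) {
                stats.get(dep).writeToResponse = true;
            }
        } else {
            stat.writeToResponse = true;
        }
    }

    List<String> responseKeys() {
        List<String> keys = new ArrayList<>();
        for (Stat s : stats.values()) {
            if (s.writeToResponse) {
                keys.add(s.name);
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(new StatsRegistrySketch(false, "mean").responseKeys()); // [mean]
        System.out.println(new StatsRegistrySketch(true, "mean").responseKeys());  // [sum, count]
    }
}
```

In this sketch a request for both "mean" and "sum" still ends up with a single shared "sum" instance, which is the "given a ref to the existing instances" behavior described above.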


I'm going to sit on this for a few days, focus on other things, and then come 
back and revisit it with fresh eyes and see if i can think of anything better 
(or if anyone else comes along with a better suggestion)


> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
>                 Key: SOLR-6349
>                 URL: https://issues.apache.org/jira/browse/SOLR-6349
>             Project: Solr
>          Issue Type: Sub-task
>            Reporter: Hoss Man
>         Attachments: SOLR-6349___bad_idea_broken.patch
>
>
> Stats component currently computes all stats (except for one) every time 
> because they are relatively cheap, and in some cases dependent on each other 
> for distrib computation -- but if we start layering stats on other things it 
> becomes unnecessarily expensive to compute all the stats when they just want 
> the "sum" (and it will definitely become excessively verbose in the 
> responses).  
> The plan here is to use local params to make this configurable.  All of the 
> existing stat options could be modeled as a simple boolean param, but future 
> params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

