[
https://issues.apache.org/jira/browse/SOLR-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-6349:
---------------------------
Attachment: make-data-and-queries.pl
SOLR-6349
I did a bit of crude benchmarking this morning with the following two use
cases in mind:
* user currently asks for stats on fields, cares about all 8 of the stats
* user currently asks for stats on fields, only cares about 4 of the 8
The attached script shows my methodology -- it generates a CSV file with 10
million docs + 2 bash files that use curl to hit Solr with 300 *:* query urls
using randomly selected stats.field. The sequence of stats.field requests is
identical between the 2 bash files, but in one the URLs include localparams to
only compute min/max/mean/stddev for the field.
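For anyone reading without the attachment, the two request shapes can be sketched roughly as below. This is not the attached Perl script: the Solr URL, the core name, the field names, the seed, and the exact local param keys for mean and stddev are all assumptions made for illustration.

```python
import random
from urllib.parse import urlencode

# Assumed values for illustration only; none come from the attached script.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"
FIELDS = ["field_a", "field_b", "field_c"]  # hypothetical numeric fields
FOUR_STATS = "{!min=true max=true mean=true stddev=true}"  # assumed param keys


def build_urls(num_queries, seed, local_params=""):
    """Return stats URLs for a seeded, repeatable sequence of fields."""
    rng = random.Random(seed)  # same seed => same sequence of fields
    urls = []
    for _ in range(num_queries):
        field = rng.choice(FIELDS)
        params = {
            "q": "*:*",
            "stats": "true",
            "stats.field": local_params + field,
        }
        urls.append(SOLR_SELECT + "?" + urlencode(params))
    return urls


old_urls = build_urls(300, seed=42)                           # all stats
new_urls = build_urls(300, seed=42, local_params=FOUR_STATS)  # four stats
```

Seeding both runs identically is what keeps the field sequence the same across the two files, so the only difference between paired requests is the local params prefix.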
Here are the results...
{noformat}
NOW BASELINE: 126.008 seconds (ie: all stats ... queries-old.sh)
PATCH ALL STATS: 133.571 seconds (6% slower ... queries-old.sh)
PATCH FOUR STATS: 130.515 seconds (3% slower ... queries-new.sh)
{noformat}
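As a quick check on the percentages, here is a small sketch (not part of the attached script) that recomputes the relative slowdown from the three wall-clock totals. It comes out at roughly 6.0% and 3.6%, so the "3%" in the table appears to be the second figure rounded down.

```python
# Wall-clock totals in seconds, copied from the results above.
BASELINE = 126.008
PATCH_ALL_STATS = 133.571
PATCH_FOUR_STATS = 130.515


def percent_slower(baseline, measured):
    """Relative slowdown of `measured` versus `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


all_stats_pct = percent_slower(BASELINE, PATCH_ALL_STATS)
four_stats_pct = percent_slower(BASELINE, PATCH_FOUR_STATS)
print(f"all stats:  {all_stats_pct:.1f}% slower")
print(f"four stats: {four_stats_pct:.1f}% slower")
```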
So not only has asking for all stats on a field gotten slower with this patch,
but even asking for only 4 of the 8 possible numeric stats on a field is still
slower than the existing code when all of them are returned.
A key thing to note here is that this is the total wall clock time from the
perspective of the client, including reading the response from Solr. Not only
are we (in theory) doing only 1/2 as much math per request in the
"FOUR STATS" situation, the XML response size of each query is only ~3/4ths the
size of the original queries. This should mean a lot less time both in
processing the results and in writing/reading the data over the wire ... and
yet instead of seeing some perf improvements, we see performance suffer.
I suspect a key factor here goes back to one of the concerns i mentioned
earlier...
{quote}
{code}
if (statsField.calculateStat(X)) {
  X = calculateX();
}
{code}
...pattern you mentioned in so much code - that's one of the reasons i
abandoned my last patch (and before i abandoned it, i was focusing on trying to
ensure that it was at least always a comparison with a final boolean in the
hopes that the JVM could optimize the if away)
{quote}
...the cumulative overhead of those method calls for every possible stat is
probably counteracting any gains made by reducing the stats that are computed.
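To make the two shapes concrete, here is a minimal sketch with hypothetical class and method names (none of them come from the patch). The first accumulator asks a method whether each stat is wanted on every value; the second resolves that question once, up front, into plain booleans. It is Python, so it says nothing about what the JVM would actually optimize; it only illustrates the structural difference being discussed.

```python
import math


class PerCallCheckStats:
    """Asks a method whether each stat is wanted, once per value."""

    def __init__(self, wanted):
        self.wanted = set(wanted)
        self.min = math.inf
        self.max = -math.inf
        self.sum = 0.0

    def calculate_stat(self, name):
        # Hypothetical stand-in for the calculateStat(X) check.
        return name in self.wanted

    def accumulate(self, value):
        if self.calculate_stat("min"):
            self.min = min(self.min, value)
        if self.calculate_stat("max"):
            self.max = max(self.max, value)
        if self.calculate_stat("sum"):
            self.sum += value


class ResolvedOnceStats(PerCallCheckStats):
    """Resolves the wanted stats to plain booleans once, at construction."""

    def __init__(self, wanted):
        super().__init__(wanted)
        self.do_min = "min" in self.wanted
        self.do_max = "max" in self.wanted
        self.do_sum = "sum" in self.wanted

    def accumulate(self, value):
        # No per-value method call; only a check of a fixed flag.
        if self.do_min:
            self.min = min(self.min, value)
        if self.do_max:
            self.max = max(self.max, value)
        if self.do_sum:
            self.sum += value


def run(stats, values):
    """Feed every value to the accumulator and return it."""
    for v in values:
        stats.accumulate(v)
    return stats
```

Both variants produce the same numbers for the same input; the only difference is how often the "is this stat wanted?" question gets asked.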
----
My next step is to focus on fixing the current patch code so the few remaining
nocommit assertions in the test start passing (see earlier comments re
"min='false'") -- but once the behavior is locked down and solid i think we
really need to re-assess and re-factor the code to see some perf gains before
there's any point in moving towards adding this feature.
(NOTE: if anyone spots any flaws in my little mini-benchmark, please speak up
-- i would be very happy to be wrong)
> LocalParams for enabling/disabling individual stats
> ---------------------------------------------------
>
> Key: SOLR-6349
> URL: https://issues.apache.org/jira/browse/SOLR-6349
> Project: Solr
> Issue Type: Sub-task
> Reporter: Hoss Man
> Attachments: SOLR-6349-tflobbe.patch, SOLR-6349-tflobbe.patch,
> SOLR-6349-tflobbe.patch, SOLR-6349-xu.patch, SOLR-6349-xu.patch,
> SOLR-6349-xu.patch, SOLR-6349-xu.patch, SOLR-6349.patch, SOLR-6349.patch,
> SOLR-6349.patch, SOLR-6349.patch, SOLR-6349___bad_idea_broken.patch,
> make-data-and-queries.pl
>
>
> Stats component currently computes all stats (except for one) every time
> because they are relatively cheap, and in some cases dependent on each other
> for distrib computation -- but if we start layering stats on other things it
> becomes unnecessarily expensive to compute all the stats when they just want
> the "sum" (and it will definitely become excessively verbose in the
> responses).
> The plan here is to use local params to make this configurable. All of the
> existing stat options could be modeled as a simple boolean param, but future
> params (like percentiles) might take in a more complex param value...
> Example:
> {noformat}
> stats.field={!min=true max=true percentiles='99,99.999'}price
> stats.field={!mean=true}weight
> {noformat}