Re: [math] speeding up percentile based statistics

Mikkel Meyer Andersen Sun, 26 Sep 2010 10:28:22 -0700

2010/9/26 Gilles Sadowski <gil...@harfang.homelinux.org>:
>> [...]
>>
>>  1) do nothing to check the array is the same between calls and blindly
>>     assumes it IS the same. Users would really need to call clearCache
>>     when they provide a new array
>>     pros: very simple
>>     cons: error-prone for the user as it relies only on reading and
>>     understanding a documentation that would change with new version
>>
>>  2) check only array reference equality (with ==) to check the array
>>     is the same as the previous call or not. Users would need to call
>>     clearCache only if they use the same array but have changed its
>>     content.
>>     pros: trade-off between efficiency and reliability,
>>           handles most cases properly
>>     cons: may be wrong in corner cases
>>
>>  3) check array content using an hash code. Users would generally don't
>>     need to call clearCache at all so it would not be provided
>>     pros: works in all cases
>>     cons: add linear cost for array checking
>>
>>  4) remove the double[] values parameter from the API and use a separate
>>     addValues method, hence the class will reset its cache only when
>>     addValues is called
>>     pros: works in all cases
>>     cons: Percentile could not implement UnivariateStatistic anymore
>>
>> My preference is choice 2.
>>
>> What do you think ?
>
> IIUC, the interface method ("evaluate") was designed around the assumption
> that successive calls are independent: the array argument is not
> encapsulated for further processing (e.g. caching).
> IMO, the equality check and the "clearCache" method look like workarounds,
> not a clean solution.
> As an example, what if a user calls "evaluate" several times alternating on
> two different arrays:
>
> ---CUT---
>    double[] a = new double[] {1, 2, 3};
>    double[] b = new double[] {1, 2, 3, 4, 5};
>    Percentile p = new Percentile();
>
>    dourble r;
>    r = p.evaluate(a, 50);
>    r = p.evaluate(b, 50);
>    r = p.evaluate(a, 50);
>    r = p.evaluate(b, 50);
>
>    // etc.
> ---CUT---
>
> Doesn't this kind of use-case nullify the expected optimization?
>
> If the array is going to be reused, the user call should reflect that fact;
> e.g. by passing the array in a constructor:
>
> ---CUT---
>    double[] a = new double[] {1, 2, 3};
>    double[] b = new double[] {1, 2, 3, 4, 5};
>    Percentile pA = new Percentile(a);
>    Percentile pB = new Percentile(b);
>
>    double r;
>    r = pA.evaluate(50);
>    r = pB.evaluate(50);
>    r = pA.evaluate(50);
>    r = pB.evaluate(50);
> ---CUT---
>
> That way, later calls can benefit from whatever preprocessing was done in
> previous calls.
> The instance will always control all the information needed (e.g. after a
> call to an "addValues" method) for the processing without the need to rely
> on the user for calling "clearCache" whenever necessary.
>
>
> Gilles


+1
I think that is a really good idea and I agree on the points made.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] speeding up percentile based statistics

Reply via email to