(math] Percentile performance improvement

Luc Maisonobe Sun, 10 Oct 2010 08:15:02 -0700

Hi all,

The new implementation of Percentile improving performances has been
checked in. It should solve issue
<https://issues.apache.org/jira/browse/MATH-417>.


As discussed here, the implementation is based on a selection algorithm
instead of a complete sort. A setData method has also been added to the
AbstractUnivariateStatistic base class to allow caching some work
between calls when several percentiles are requested. Both the
partitioned data array and the first levels of pivots are cached. For
now, I have set the limit to 10 levels of pivots, but this can be
changed (the memory cost is a (2^n)-1 integer array for n levels).

>From the few performances tests I have done, the improvements are really
huge and depend on both the array size and the number of percentiles
desired. Compared with a single cached sort followed by direct array
accesses, the selection-based implementation is still about twice faster
up to a few hundreds percentiles for array sizes of a few millions
elements. For a single percentile, I saw up to a few hundreds times
faster computation.

Leanne, could you check this suits your needs ?

Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

(math] Percentile performance improvement

Reply via email to