[math] speeding up percentile based statistics

Luc Maisonobe Fri, 17 Sep 2010 10:35:36 -0700

Hi all,

During a recent face to face discussion with some commons-math users
from a large project, it appeared one implementation choice for
Percentile statistics was a huge performance bottleneck.


When the Percentile.evaluate method is called, the sample array is
copied and sorted. In one use case, 100 calls were made for each sample,
so the same array was sorted 100 times. In another use case, only one
call is made (typically to compute the median) but the user wondered why
the array should be completely sorted. In fact, using a selection
algorithm instead of a sorting algorithm would be sufficient.

I would like to have you opinion about providing a different evaluate
method that would process an array provided previously and use only
selections to provide the percentile. Consider for example the following
input array:

  [ -6, 8, 4, -5, 0, 2, 7 ]

If we first as for the median, a selection algorithm may reorganize the
array as:

  [ -6, 0, -5, 2, 8, 4, 7 ]

were the left part is smaller than the central element, the right part
is larger and hence the central element 2 is known to be the median.

Then we ask for another value, the 25% percentile. Since the object
already knows the smaller elements are in the left part, it can use a
select algorithm on the 3 elements left part and extract the value -5
without even trying to sort the upper part of the array.

What do you think ?

Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

[math] speeding up percentile based statistics

Reply via email to