Hi Venkat,

Le 23/06/2014 21:08, venkatesha murthy a écrit :
> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> wrote:
>> Hi all,
>>
>> While looking further in Percentile class for MATH-1120, I have found
>> another problem in the current implementation. NaNStrategy.FIXED should
>> leave the NaNs in place, but at the end of the KthSelector.select
>> method, a call to Arrays.sort moves the NaNs to the end of the small
>> sub-array. What is really implemented here is therefore closer to
>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>> test cases because they use very short arrays, and we directly switch to
>> this part of the select method.
> Are NaNs considered higher than +Inf ?
> If MAXIMAL represent replacing for +inf ; you need something to
> indicate beyond this for NaN.

Well, we can just keep the NaN themselves and handled them
appropriately, hoping not to trash performances too much.

> What is the test input you see an issue and what is the actual error
> you are seeing. Please share the test case.

Just look at PercentileTest.testReplaceNanInRange(). The first check in
the test corresponds to a Percentile configuration at 50% percentile,
and NaNStrategy.FIXED. The array has an odd number of entries, so the
50% percentile is exactly one of the entries: the one at index 5 in the
final array.

The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
for the NaNStrategy.FIXED setting, it should not be modified at all in
the selection algorithm and the result for 50% should be the first NaN
of the array, at index 5. In fact, due to the Arrays.sort, we *do*
reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
the result is 5.

If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
result Double.POSITIVE_INFINITY instead of Double.NaN.

>>
>> When I implemented the method, I explicitly avoided calling Arrays.sort
>> because it is a general purpose method that is know to be efficient only
>> for arrays of at least medium size. In most cases, when the array is
>> small one falls back to a non-recursive method like a very simple
>> insertion sort, which is faster for smaller arrays.
> 
> Please help me understand here; even java primitive Arrays.sort does
> an insertion sort for less than 7 elements
> (Refer sort1(double x[], int off, int len))
> So what is it that the custom insertion sort that does differently or
> is advantageous. Is it the value 15 elements?

I don't see a reference to 7 elements, neither in the Java6 nor in the
Java 7 doc (and in any case the doc explicitly states the algorithms
explained are only implementation notes and are not part of the
specification).

However, the most important part for now is the fact that we control it
and may be able to implement different NaN strategies. What we have
currently fails.

best regards,
Luc

> 
>> In the select
>> operation, we know we have small sub-arrays at the call point. Going
>> back to the former insertionSort would recover the good behavior for
>> small arrays, but would in fact not be sufficient to really implement a
>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>> NaNStrategy.MAXIMAL but I did not try yet.
>>
>> My point here is rather: can we really and should we really implement
>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>> store the original position of the NaNs. It is quite cumbersome.
>>
>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>
>> Going further and looking at the use cases for other NaNStrategy values,
>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>> ranking as we only rely on the indices, but we use the values themselves
>> in Percentile. So sometimes, we end up with computing interpolations
>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>> NaN we should have retrieved without replacement. So here again, I'm not
>> sure we are doing the right thing.
>>
>> What do you think?
>>
>> best regards,
>> Luc
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Reply via email to