On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> wrote: > Hi all, > > While looking further in Percentile class for MATH-1120, I have found > another problem in the current implementation. NaNStrategy.FIXED should > leave the NaNs in place, but at the end of the KthSelector.select > method, a call to Arrays.sort moves the NaNs to the end of the small > sub-array. What is really implemented here is therefore closer to > NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the > test cases because they use very short arrays, and we directly switch to > this part of the select method. Are NaNs considered higher than +Inf ? If MAXIMAL represent replacing for +inf ; you need something to indicate beyond this for NaN. What is the test input you see an issue and what is the actual error you are seeing. Please share the test case. > > When I implemented the method, I explicitly avoided calling Arrays.sort > because it is a general purpose method that is know to be efficient only > for arrays of at least medium size. In most cases, when the array is > small one falls back to a non-recursive method like a very simple > insertion sort, which is faster for smaller arrays.
Please help me understand here; even java primitive Arrays.sort does an insertion sort for less than 7 elements (Refer sort1(double x[], int off, int len)) So what is it that the custom insertion sort that does differently or is advantageous. Is it the value 15 elements? > In the select > operation, we know we have small sub-arrays at the call point. Going > back to the former insertionSort would recover the good behavior for > small arrays, but would in fact not be sufficient to really implement a > NaNStrategy.FIXED. I guess it would be simple to make it behave like > NaNStrategy.MAXIMAL but I did not try yet. > > My point here is rather: can we really and should we really implement > NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to > store the original position of the NaNs. It is quite cumbersome. > > I wonder what is the use case for NaNStrategy.FIXED at all. > > Going further and looking at the use cases for other NaNStrategy values, > the NaNs are replaced by +/- infinity before sorting, which is OK for > ranking as we only rely on the indices, but we use the values themselves > in Percentile. So sometimes, we end up with computing interpolations > like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and > x[k+1] has been changed to +infinity, we get +infinity, instead of the > NaN we should have retrieved without replacement. So here again, I'm not > sure we are doing the right thing. > > What do you think? > > best regards, > Luc > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org