On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy < venkateshamurth...@gmail.com> wrote: Can i put a patch for this change?
> > On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> > wrote: > >> Hi Venkat, >> >> Le 23/06/2014 21:08, venkatesha murthy a écrit : >> > On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> >> wrote: >> >> Hi all, >> >> >> >> While looking further in Percentile class for MATH-1120, I have found >> >> another problem in the current implementation. NaNStrategy.FIXED should >> >> leave the NaNs in place, but at the end of the KthSelector.select >> >> method, a call to Arrays.sort moves the NaNs to the end of the small >> >> sub-array. What is really implemented here is therefore closer to >> >> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the >> >> test cases because they use very short arrays, and we directly switch >> to >> >> this part of the select method. >> > Are NaNs considered higher than +Inf ? >> > If MAXIMAL represent replacing for +inf ; you need something to >> > indicate beyond this for NaN. >> >> Well, we can just keep the NaN themselves and handled them >> appropriately, hoping not to trash performances too much. >> > Agreed. > >> >> > What is the test input you see an issue and what is the actual error >> > you are seeing. Please share the test case. >> >> Just look at PercentileTest.testReplaceNanInRange(). The first check in >> the test corresponds to a Percentile configuration at 50% percentile, >> and NaNStrategy.FIXED. The array has an odd number of entries, so the >> 50% percentile is exactly one of the entries: the one at index 5 in the >> final array. >> >> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So >> for the NaNStrategy.FIXED setting, it should not be modified at all in >> the selection algorithm and the result for 50% should be the first NaN >> of the array, at index 5. In fact, due to the Arrays.sort, we *do* >> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so >> the result is 5. >> >> Agreed. just verified by putting back the earlier insertionSort function. > > >> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a >> result Double.POSITIVE_INFINITY instead of Double.NaN. >> >> If we agree to leave FIXED as unchanged behaviour with your insertionSort > code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of > NaN to +/-Inf > >> >> >> >> When I implemented the method, I explicitly avoided calling Arrays.sort >> >> because it is a general purpose method that is know to be efficient >> only >> >> for arrays of at least medium size. In most cases, when the array is >> >> small one falls back to a non-recursive method like a very simple >> >> insertion sort, which is faster for smaller arrays. >> > >> > Please help me understand here; even java primitive Arrays.sort does >> > an insertion sort for less than 7 elements >> > (Refer sort1(double x[], int off, int len)) >> > So what is it that the custom insertion sort that does differently or >> > is advantageous. Is it the value 15 elements? >> >> I don't see a reference to 7 elements, neither in the Java6 nor in the >> Java 7 doc > > Please take a look at the sort1 method where there is a first block in the > code which clearly mentions len < 7 > /** > * Sorts the specified sub-array of doubles into ascending order. > */ > private static void sort1(double x[], int off, int len) { > // Insertion sort on smallest arrays > if (len < 7) { > for (int i=off; i<len+off; i++) > for (int j=i; j>off && x[j-1]>x[j]; j--) > swap(x, j, j-1); > return; > } > : > : > : code continues for the else part > > Also the grepcode url > <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29> > indicates the same > > (and in any case the doc explicitly states the algorithms >> explained are only implementation notes and are not part of the >> specification). >> > Yes its a part of comments anyways. > >> >> However, the most important part for now is the fact that we control it >> and may be able to implement different NaN strategies. What we have >> currently fails. >> >> I agree on this and hence here is my take: > Leave FIXED as-is and use the earlier insertionSort code (just change the > name to sort rather than hardcoding it as insertionsort) to handle the case > you were mentioning > Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we > have covered both Inf and nan cases. > Use REMOVED as default for all Percentile Estimation Types. (mostly > influenced by R here perhaps) > > best regards, >> Luc >> >> > >> >> In the select >> >> operation, we know we have small sub-arrays at the call point. Going >> >> back to the former insertionSort would recover the good behavior for >> >> small arrays, but would in fact not be sufficient to really implement a >> >> NaNStrategy.FIXED. I guess it would be simple to make it behave like >> >> NaNStrategy.MAXIMAL but I did not try yet. >> >> >> >> My point here is rather: can we really and should we really implement >> >> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to >> >> store the original position of the NaNs. It is quite cumbersome. >> >> >> >> I wonder what is the use case for NaNStrategy.FIXED at all. >> >> >> >> Going further and looking at the use cases for other NaNStrategy >> values, >> >> the NaNs are replaced by +/- infinity before sorting, which is OK for >> >> ranking as we only rely on the indices, but we use the values >> themselves >> >> in Percentile. So sometimes, we end up with computing interpolations >> >> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and >> >> x[k+1] has been changed to +infinity, we get +infinity, instead of the >> >> NaN we should have retrieved without replacement. So here again, I'm >> not >> >> sure we are doing the right thing. >> >> >> >> What do you think? >> >> >> >> best regards, >> >> Luc >> >> >> >> --------------------------------------------------------------------- >> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> > For additional commands, e-mail: dev-h...@commons.apache.org >> > >> > >> > >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> >