On Wed, Jun 25, 2014 at 12:45 PM, Luc Maisonobe <l...@spaceroots.org> wrote:
> Le 25/06/2014 06:10, venkatesha murthy a écrit : > > On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy < > > venkateshamurth...@gmail.com> wrote: > > Can i put a patch for this change? > > I think we should still discuss about FIXED. > > If you want to set up u patch for the documentation of NaNStrategy > explaining the replacement done in MAXIMAL/MINIMAL, please go ahead. > You should put it in a new JIRA issue for clarity. > While documenting this handling of NaN serves right; i was wondering if we could *also* add a transform method in NaNStrategy that transforms an array with included NaNs to somthing that Maximal/Minimal/Removable wanted. Please let know > > best regards, > Luc > > > > >> > >> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> > >> wrote: > >> > >>> Hi Venkat, > >>> > >>> Le 23/06/2014 21:08, venkatesha murthy a écrit : > >>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> > >>> wrote: > >>>>> Hi all, > >>>>> > >>>>> While looking further in Percentile class for MATH-1120, I have found > >>>>> another problem in the current implementation. NaNStrategy.FIXED > should > >>>>> leave the NaNs in place, but at the end of the KthSelector.select > >>>>> method, a call to Arrays.sort moves the NaNs to the end of the small > >>>>> sub-array. What is really implemented here is therefore closer to > >>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the > >>>>> test cases because they use very short arrays, and we directly switch > >>> to > >>>>> this part of the select method. > >>>> Are NaNs considered higher than +Inf ? > >>>> If MAXIMAL represent replacing for +inf ; you need something to > >>>> indicate beyond this for NaN. > >>> > >>> Well, we can just keep the NaN themselves and handled them > >>> appropriately, hoping not to trash performances too much. > >>> > >> Agreed. > >> > >>> > >>>> What is the test input you see an issue and what is the actual error > >>>> you are seeing. Please share the test case. > >>> > >>> Just look at PercentileTest.testReplaceNanInRange(). The first check in > >>> the test corresponds to a Percentile configuration at 50% percentile, > >>> and NaNStrategy.FIXED. The array has an odd number of entries, so the > >>> 50% percentile is exactly one of the entries: the one at index 5 in the > >>> final array. > >>> > >>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So > >>> for the NaNStrategy.FIXED setting, it should not be modified at all in > >>> the selection algorithm and the result for 50% should be the first NaN > >>> of the array, at index 5. In fact, due to the Arrays.sort, we *do* > >>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so > >>> the result is 5. > >>> > >>> Agreed. just verified by putting back the earlier insertionSort > function. > >> > >> > >>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a > >>> result Double.POSITIVE_INFINITY instead of Double.NaN. > >>> > >>> If we agree to leave FIXED as unchanged behaviour with your > insertionSort > >> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation > of > >> NaN to +/-Inf > >> > >>> >> > >>>>> When I implemented the method, I explicitly avoided calling > Arrays.sort > >>>>> because it is a general purpose method that is know to be efficient > >>> only > >>>>> for arrays of at least medium size. In most cases, when the array is > >>>>> small one falls back to a non-recursive method like a very simple > >>>>> insertion sort, which is faster for smaller arrays. > >>>> > >>>> Please help me understand here; even java primitive Arrays.sort does > >>>> an insertion sort for less than 7 elements > >>>> (Refer sort1(double x[], int off, int len)) > >>>> So what is it that the custom insertion sort that does differently or > >>>> is advantageous. Is it the value 15 elements? > >>> > >>> I don't see a reference to 7 elements, neither in the Java6 nor in the > >>> Java 7 doc > >> > >> Please take a look at the sort1 method where there is a first block in > the > >> code which clearly mentions len < 7 > >> /** > >> * Sorts the specified sub-array of doubles into ascending order. > >> */ > >> private static void sort1(double x[], int off, int len) { > >> // Insertion sort on smallest arrays > >> if (len < 7) { > >> for (int i=off; i<len+off; i++) > >> for (int j=i; j>off && x[j-1]>x[j]; j--) > >> swap(x, j, j-1); > >> return; > >> } > >> : > >> : > >> : code continues for the else part > >> > >> Also the grepcode url > >> < > http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29 > > > >> indicates the same > >> > >> (and in any case the doc explicitly states the algorithms > >>> explained are only implementation notes and are not part of the > >>> specification). > >>> > >> Yes its a part of comments anyways. > >> > >>> > >>> However, the most important part for now is the fact that we control it > >>> and may be able to implement different NaN strategies. What we have > >>> currently fails. > >>> > >>> I agree on this and hence here is my take: > >> Leave FIXED as-is and use the earlier insertionSort code (just change > the > >> name to sort rather than hardcoding it as insertionsort) to handle the > case > >> you were mentioning > >> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way > we > >> have covered both Inf and nan cases. > >> Use REMOVED as default for all Percentile Estimation Types. (mostly > >> influenced by R here perhaps) > >> > >> best regards, > >>> Luc > >>> > >>>> > >>>>> In the select > >>>>> operation, we know we have small sub-arrays at the call point. Going > >>>>> back to the former insertionSort would recover the good behavior for > >>>>> small arrays, but would in fact not be sufficient to really > implement a > >>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like > >>>>> NaNStrategy.MAXIMAL but I did not try yet. > >>>>> > >>>>> My point here is rather: can we really and should we really implement > >>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to > >>>>> store the original position of the NaNs. It is quite cumbersome. > >>>>> > >>>>> I wonder what is the use case for NaNStrategy.FIXED at all. > >>>>> > >>>>> Going further and looking at the use cases for other NaNStrategy > >>> values, > >>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for > >>>>> ranking as we only rely on the indices, but we use the values > >>> themselves > >>>>> in Percentile. So sometimes, we end up with computing interpolations > >>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and > >>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of > the > >>>>> NaN we should have retrieved without replacement. So here again, I'm > >>> not > >>>>> sure we are doing the right thing. > >>>>> > >>>>> What do you think? > >>>>> > >>>>> best regards, > >>>>> Luc > >>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>>> > >>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>>> For additional commands, e-mail: dev-h...@commons.apache.org > >>>> > >>>> > >>>> > >>> > >>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > >>> For additional commands, e-mail: dev-h...@commons.apache.org > >>> > >>> > >> > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > >