On 6/29/14, 9:48 AM, venkatesha murthy wrote: > On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.ste...@gmail.com> wrote: > >> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: >>> Le 25/06/2014 06:05, venkatesha murthy a écrit : >>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> >> wrote: >>>>> Hi Venkat, >>>>> >>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit : >>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> >>>>> wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> While looking further in Percentile class for MATH-1120, I have found >>>>>>> another problem in the current implementation. NaNStrategy.FIXED >> should >>>>>>> leave the NaNs in place, but at the end of the KthSelector.select >>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the small >>>>>>> sub-array. What is really implemented here is therefore closer to >>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the >>>>>>> test cases because they use very short arrays, and we directly >> switch to >>>>>>> this part of the select method. >>>>>> Are NaNs considered higher than +Inf ? >>>>>> If MAXIMAL represent replacing for +inf ; you need something to >>>>>> indicate beyond this for NaN. >>>>> Well, we can just keep the NaN themselves and handled them >>>>> appropriately, hoping not to trash performances too much. >>>>> >>>> Agreed. >>>> >>>>>> What is the test input you see an issue and what is the actual error >>>>>> you are seeing. Please share the test case. >>>>> Just look at PercentileTest.testReplaceNanInRange(). The first check in >>>>> the test corresponds to a Percentile configuration at 50% percentile, >>>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the >>>>> 50% percentile is exactly one of the entries: the one at index 5 in the >>>>> final array. >>>>> >>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So >>>>> for the NaNStrategy.FIXED setting, it should not be modified at all in >>>>> the selection algorithm and the result for 50% should be the first NaN >>>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do* >>>>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so >>>>> the result is 5. >>>>> >>>>> Agreed. just verified by putting back the earlier insertionSort >> function. >>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a >>>>> result Double.POSITIVE_INFINITY instead of Double.NaN. >>>>> >>>>> If we agree to leave FIXED as unchanged behaviour with your >> insertionSort >>>> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation >> of >>>> NaN to +/-Inf >>> I'm fine with it, as long as we clearly state it in the NaNStrategy >>> documentation, saying somethig like: >>> >>> When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by >>> +/- infinity, hence the results will be computed as if infinities were >>> there in the first place. >> -1 - this is mixing concerns. NaNStrategy exists for one specific >> purpose - to turn extend the ordering on doubles a total ordering. >> Strategies prescribe only how NaNs are to be treated in the >> ordering. Side effects on the input array don't make sense in this >> context. The use case for which this class was created was rank >> transformations. Returning infinities in rank transformations would >> blow up computations in these classes. If a floating point >> computations need to transform NaNs, the specific stats / other >> classes that do this transformation should perform and document the >> behavior. >> >> Phil >> > OK. Agreed realized it later. Hence i have not touched NaNStrategy in my > patch(MATH-1132) . Now i have added a separate transformer to transform NaNs > but it uses NanStrategy. Please let me know on this as this trasnformation > itself > can be used in different places.
Honestly, I don't see the value of this. I would be more favorable to an addition to MathArrays supporting NaN (or infinity) filtering / transformation. Several years back we made the decision to just let NaNs propagate in computations, which basically means that if you feed NaNs to [math] you are going to get NaNs out in results. If we do get into the business of filtering NaNs (or infinities for that matter), I think it is probably best to do that in MathArrays or somesuch - i.e., don't decorate APIs with NaN or infinity-handling handling descriptions. That would be a huge task and would likely end up bleeding implementation detail into the APIs. As I said above, NaNStrategy has to exist for rank transformations to be well-defined. NaN-handling in floating point computations is defined and as long as we fully document what our algorithms do, I think it is fair enough to "let the NaNs fall where they may" - which basically means users need to do the filtering / transformation themselves. Phil > >>>>>>> When I implemented the method, I explicitly avoided calling >> Arrays.sort >>>>>>> because it is a general purpose method that is know to be efficient >> only >>>>>>> for arrays of at least medium size. In most cases, when the array is >>>>>>> small one falls back to a non-recursive method like a very simple >>>>>>> insertion sort, which is faster for smaller arrays. >>>>>> Please help me understand here; even java primitive Arrays.sort does >>>>>> an insertion sort for less than 7 elements >>>>>> (Refer sort1(double x[], int off, int len)) >>>>>> So what is it that the custom insertion sort that does differently or >>>>>> is advantageous. Is it the value 15 elements? >>>>> I don't see a reference to 7 elements, neither in the Java6 nor in the >>>>> Java 7 doc >>>> Please take a look at the sort1 method where there is a first block in >> the >>>> code which clearly mentions len < 7 >>>> /** >>>> * Sorts the specified sub-array of doubles into ascending order. >>>> */ >>>> private static void sort1(double x[], int off, int len) { >>>> // Insertion sort on smallest arrays >>>> if (len < 7) { >>>> for (int i=off; i<len+off; i++) >>>> for (int j=i; j>off && x[j-1]>x[j]; j--) >>>> swap(x, j, j-1); >>>> return; >>>> } >>>> : >>>> : >>>> : code continues for the else part >>>> >>>> Also the grepcode url >>>> < >> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29 >>>> indicates the same >>> OK, you were refering to a specific implementation. I was thinking in >>> the general case. >>> >>>> (and in any case the doc explicitly states the algorithms >>>>> explained are only implementation notes and are not part of the >>>>> specification). >>>>> >>>> Yes its a part of comments anyways. >>>> >>>>> However, the most important part for now is the fact that we control it >>>>> and may be able to implement different NaN strategies. What we have >>>>> currently fails. >>>>> >>>>> I agree on this and hence here is my take: >>>> Leave FIXED as-is and use the earlier insertionSort code (just change >> the >>>> name to sort rather than hardcoding it as insertionsort) to handle the >> case >>>> you were mentioning >>> If we leave FIXED it as it is done know, even with insertionSort we do >>> not really control what happens. It is deterministic as the pivoting and >>> if/else structure is well defined (both in the selection part and in the >>> final sort for sub-arrays), but it is untrackable so we can't document >> it. >>>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way >> we >>>> have covered both Inf and nan cases. >>> OK. >>> >>>> Use REMOVED as default for all Percentile Estimation Types. (mostly >>>> influenced by R here perhaps) >>> This would be fine with me. I have concerns with FIXED only, the other >>> strategies all seem quite realistic. >>> >>> Does anybody else has an advice for FIXED? What are the use case? >>> >>> best regards, >>> Luc >>> >>>> best regards, >>>>> Luc >>>>> >>>>>>> In the select >>>>>>> operation, we know we have small sub-arrays at the call point. Going >>>>>>> back to the former insertionSort would recover the good behavior for >>>>>>> small arrays, but would in fact not be sufficient to really >> implement a >>>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like >>>>>>> NaNStrategy.MAXIMAL but I did not try yet. >>>>>>> >>>>>>> My point here is rather: can we really and should we really implement >>>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to >>>>>>> store the original position of the NaNs. It is quite cumbersome. >>>>>>> >>>>>>> I wonder what is the use case for NaNStrategy.FIXED at all. >>>>>>> >>>>>>> Going further and looking at the use cases for other NaNStrategy >> values, >>>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for >>>>>>> ranking as we only rely on the indices, but we use the values >> themselves >>>>>>> in Percentile. So sometimes, we end up with computing interpolations >>>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and >>>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of >> the >>>>>>> NaN we should have retrieved without replacement. So here again, I'm >> not >>>>>>> sure we are doing the right thing. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> best regards, >>>>>>> Luc >>>>>>> >>>>>>> --------------------------------------------------------------------- >>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>>> >>>>>> >>>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org