Hi all, Le 01/07/2014 08:52, Luc Maisonobe a écrit : > Le 30/06/2014 22:00, Phil Steitz a écrit : >> On 6/30/14, 12:33 PM, Luc Maisonobe wrote: >>> Le 30/06/2014 19:15, Phil Steitz a écrit : >>>> >>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gil...@harfang.homelinux.org> >>>>> wrote: >>>>> >>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote: >>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles >>>>>>> <gil...@harfang.homelinux.org> wrote: >>>>>>> >>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote: >>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote: >>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: >>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun, >>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz >>>>>>>>>>> <phil.ste...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le >>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit : >>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe >>>>>>>>>>>>>> <l...@spaceroots.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> Hi Venkat, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit >>>>>>>>>>>>>>> : >>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc >>>>>>>>>>>>>>>> Maisonobe <l...@spaceroots.org> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> While looking further in Percentile class >>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem >>>>>>>>>>>>>>>>> in the current implementation. >>>>>>>>>>>>>>>>> NaNStrategy.FIXED >>>>>>>>>>>> should >>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of >>>>>>>>>>>>>>>>> the KthSelector.select method, a call to >>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of >>>>>>>>>>>>>>>>> the small sub-array. What is really >>>>>>>>>>>>>>>>> implemented here is therefore closer to >>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. >>>>>>>>>>>>>>>>> This always occur in the test cases because >>>>>>>>>>>>>>>>> they use very short arrays, and we >>>>>>>>>>>>>>>>> directly >>>>>>>>>>>> switch to >>>>>>>>>>>>>>>>> this part of the select method. >>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If >>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you >>>>>>>>>>>>>>>> need something to indicate beyond this for >>>>>>>>>>>>>>>> NaN. >>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and >>>>>>>>>>>>>>> handled them appropriately, hoping not to trash >>>>>>>>>>>>>>> performances too much. >>>>>>>>>>>>>> Agreed. >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What is the test input you see an issue and >>>>>>>>>>>>>>>> what is the actual error you are seeing. >>>>>>>>>>>>>>>> Please share the test case. >>>>>>>>>>>>>>> Just look at >>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The >>>>>>>>>>>>>>> first check in the test corresponds to a >>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and >>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number >>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly >>>>>>>>>>>>>>> one of the entries: the one at index 5 in the >>>>>>>>>>>>>>> final array. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, >>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the >>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be >>>>>>>>>>>>>>> modified at all in the selection algorithm and >>>>>>>>>>>>>>> the result for 50% should be the first NaN of >>>>>>>>>>>>>>> the array, at index 5. In fact, due to the >>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into { >>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the >>>>>>>>>>>>>>> result is 5. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Agreed. just verified by putting back the >>>>>>>>>>>>>>> earlier insertionSort >>>>>>>>>>>> function. >>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile >>>>>>>>>>>>>>> above 67%, we get as a result >>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of >>>>>>>>>>>>>>> Double.NaN. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged >>>>>>>>>>>>>>> behaviour with your >>>>>>>>>>>> insertionSort >>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be >>>>>>>>>>>>>> allowed for transformation >>>>>>>>>>>> of >>>>>>>>>>>>>> NaN to +/-Inf >>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in >>>>>>>>>>>>> the NaNStrategy documentation, saying somethig >>>>>>>>>>>>> like: >>>>>>>>>>>>> >>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are >>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the >>>>>>>>>>>>> results will be computed as if infinities were >>>>>>>>>>>>> there in the first place. >>>>>>>>>>>> -1 - this is mixing concerns. NaNStrategy exists for >>>>>>>>>>>> one specific purpose - to turn extend the ordering on >>>>>>>>>>>> doubles a total ordering. Strategies prescribe only >>>>>>>>>>>> how NaNs are to be treated in the ordering. Side >>>>>>>>>>>> effects on the input array don't make sense in this >>>>>>>>>>>> context. The use case for which this class was >>>>>>>>>>>> created was rank transformations. Returning >>>>>>>>>>>> infinities in rank transformations would blow up >>>>>>>>>>>> computations in these classes. If a floating point >>>>>>>>>>>> computations need to transform NaNs, the specific >>>>>>>>>>>> stats / other classes that do this transformation >>>>>>>>>>>> should perform and document the behavior. >>>>>>>>>>>> >>>>>>>>>>>> Phil >>>>>>>>>>> OK. Agreed realized it later. Hence i have not >>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have >>>>>>>>>>> added a separate transformer to transform NaNs but it >>>>>>>>>>> uses NanStrategy. Please let me know on this as this >>>>>>>>>>> trasnformation itself can be used in different places. >>>>>>>>>> Honestly, I don't see the value of this. I would be more >>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or >>>>>>>>>> infinity) filtering / transformation. Several years back >>>>>>>>>> we made the decision to just let NaNs propagate in >>>>>>>>>> computations, which basically means that if you feed NaNs >>>>>>>>>> to [math] you are going to get NaNs out in results. If we >>>>>>>>>> do get into the business of filtering NaNs (or infinities >>>>>>>>>> for that matter), I think it is probably best to do that >>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs >>>>>>>>>> with NaN or infinity-handling handling descriptions. >>>>>>>>>> That would be a huge task and would likely end up >>>>>>>>>> bleeding implementation detail into the APIs. As I said >>>>>>>>>> above, NaNStrategy has to exist for rank transformations >>>>>>>>>> to be well-defined. NaN-handling in floating point >>>>>>>>>> computations is defined and as long as we fully document >>>>>>>>>> what our algorithms do, I think it is fair enough to "let >>>>>>>>>> the NaNs fall where they may" - which basically means >>>>>>>>>> users need to do the filtering / transformation >>>>>>>>>> themselves. >>>>>>>>> So, do I understand correctly that it's OK to add the >>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in >>>>>>>>> "MathArrays"? Or does this thread conclude that we don't >>>>>>>>> have a use for this within Commons Math? >>>>>>>> You raise a good point here. Personally, I don't see an >>>>>>>> internal use and if we are sticking with MathArrays including >>>>>>>> only things we have internal use for, I would say no, don't >>>>>>>> include them. >>>>>>> A third way to look at it would be that since some algorithms >>>>>>> require being passed "clean" data (e.g. without NaNs), we could >>>>>>> provide some simple means to do so. A user could then write, >>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 }; >>>>>>> >>>>>>> Percentile p = new Percentile(); double q = >>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN, >>>>>>> Double.POSITIVE_INFINITY), 0.1); --- >>>>>>> >>>>>>> Then, if we provide that utility, is it still necessary to >>>>>>> complicate the Percentile code with a NaNStrategy parameter? >>>>>> I may be missing something, but it seems to me that percentile >>>>>> does need a NaNStrategy in the presence if NaNs like the other >>>>>> order statistics do (basically to define their position In the >>>>>> ordering). It doesn't logically need more than that. I would >>>>>> be ok defaulting the strategy or even returning NaN in the >>>>>> presence of NaNs (or whenever interpolation involves a NaN). All >>>>>> of this can be documented. >>>>> I am _surely_ missing something because I don't figure out how NaNs >>>>> can be useful when computing those statistics... :-} >>>> Good point. Basically the different strategies allow you to >>>> effectively ignore or work around them so stats can be well-defined. >>>> Without NaNStrategy, order statistics including NaNs are undefined. >>> The problem with Percentile is that the elements are both reordered >>> *and* used when interpolating between two values. Tor the rank methods, >>> only reordering was needed so the various NaNStrategies could be >>> implemented by replacing at will with +/- infinity and a classical sort >>> would do the ordering for us. >>> >>> Here, we do interpolate on the values, so if we first replace NaN with >>> +infinity to put them at the end, we may interpolate between a finite >>> value and +inf and have +inf as a result, whereas if the NaN "value" was >>> preserved and some magic used to sort the NaN, we would end up >>> interpolating with a NaN and get a NaN as a result. Setting up the magic >>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already >>> have a specific sort (in fact not a sort but a selection algorithm, >>> which is the core of the class). Setting up the magic for FIXED seems >>> more difficult. >>> >>> So I guess implementing NaNStrategies may be done in different ways >>> depending on how they will be used by the calling algorithm. I don't yet >>> know if we can set up a general method to be used for all statistics. >>> Looking only at Percentile, replacement with +/- infinity already seems >>> to not be an appropriate solution. >> >> I don't think we should try to redefine algorithms so they can >> handle NaNs. When you do floating point computations with NaNs, you >> get NaNs. That should be understood to be the contract. For rank >> statistics, such as Percentile, we need to extend the ordering on >> doubles to include them in ordering for the algorithm to be >> well-defined. If we subsequently do floating point computations >> with the NaNs, the result should be NaN. I think we should either >> just simplify things and say we *always* return NaN whenever NaNs >> are present for Percentile, or do the following: >> >> 0) use the prescribed NaN strategy to determine the ordering - so >> NaNs end up either at the bottom, at the top or in their original >> position (FIXED). >> 1) if interpolation involves a NaN element, the result returned is NaN. > > This is exactly what I say. If NaNs are in, NaNs should be out. As for > now we do replace them, we get infinities instead of NaNs. I would > prefer getting NaNs there as it is more consistent with the purpose of > NaNs in floating point encoding. > > So I think we agree: NaNs should not be changed, but they should be > reordered. Perhaps we could have something simple like NaNStrategy > having a method: > > boolean shouldSwap(int i1, double d1, int i2, double i2) > > that would use both the index and values of the element to check if they > should be swapped or not. This method would be used both for traditional > sorting for ranking and in selection algorithms for percentile. I first > thought we should simply have NaN implement Comparator<Double> but this > would not work for FIXED, whereas the above method would work (simply > returning false as soon as there is a NaN, regardless of it being at a > lower or upper index). > > I will take a look at this option.
This does not work as is. The problem is lack of transitivity. Suppose that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN. Then if during the sort/selection we compare elements at indices i and k first, we will swap them, which is correct behaviour regardless of the NaN strategy used. If we compare elemntes at indices i and j first and then elements at indices j and k and we use FIXED NaNStrategy, then we won't swap the two elements and the middle NaN would induced a non-globally sorted array. I think FIXED should act as a river flowing over rocks : the water (non-NaN elements) would move freely around and be sorted, and the rocks (NaN elements) would stay were they are but do not prevent water flow. I have found another way to handle this, using an indirection array that could "jump" over NaNs for FIXED strategy or push them to either end for MINIMAL or MAXIMAL strategy. I will try this in the next few days, as time permit. best regards, Luc > > best regards, > Luc > >> >> Phil >>> >>> best regards, >>> Luc >>> >>>> >>>> Phil >>>> >>>>>> I still don't see a within-math use for recoding methods; though >>>>>> I do see the value from a user perspective. The problem is if we >>>>>> open this door, we risk MathArrays becoming a mini commons lang. >>>>> If Commons Lang already provides a feature, then we indeed don't >>>>> need to offer the same within CM (if there is no internal use). Can >>>>> someone confirm? >>>>> >>>>> >>>>> Otherwise we could define a new inner interface in MathArrays, >>>>> similar to "MathArrays.Function": --- public class MathArrays { >>>>> >>>>> // ... >>>>> >>>>> public interface Transformer { /** Transform the whole array */ >>>>> double[] transform(double[] input); /** Transform the array in the >>>>> interval [from, to) and return the whole array */ double[] >>>>> transform(double[] input, int from, int to); /** Transform the >>>>> array in the interval [from, to) and return the transformed >>>>> sub-array */ double[] transformAndSplice(double[] input, int from, >>>>> int to); } } --- >>>>> >>>>> Then we can provide "replace" and "remove" transformers through >>>>> factory methods (example/untested code): --- public class >>>>> MathArrays { >>>>> >>>>> // ... >>>>> >>>>> public static Transformer createReplaceTransformer(final double >>>>> old, final double replace) { return new Transformer() { public >>>>> double[] transform(double[] input) { final int len = input.length; >>>>> final double[] output = new double[len]; for (int i = 0; i < len; >>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? >>>>> replace : input[i]; } return output; } >>>>> >>>>> // etc. }; } } --- >>>>> >>>>> >>>>> Gilles >>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> >>>>> >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>>> >>>> --------------------------------------------------------------------- >>>> >>>> >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>>> For additional commands, e-mail: dev-h...@commons.apache.org >>>> >>>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >>> For additional commands, e-mail: dev-h...@commons.apache.org >>> >>> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org >> For additional commands, e-mail: dev-h...@commons.apache.org >> >> >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org