> On Jun 30, 2014, at 9:59 AM, Gilles <gil...@harfang.homelinux.org> wrote: > > On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote: >>> On Jun 29, 2014, at 3:41 PM, Gilles <gil...@harfang.homelinux.org> wrote: >>> >>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote: >>>>>> On 6/29/14, 2:30 PM, Gilles wrote: >>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: >>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: >>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz >>>>>>> <phil.ste...@gmail.com> wrote: >>>>>>> >>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: >>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit : >>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe >>>>>>>>>> <l...@spaceroots.org> >>>>>>>> wrote: >>>>>>>>>>> Hi Venkat, >>>>>>>>>>> >>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit : >>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe >>>>>>>>>>>> <l...@spaceroots.org> >>>>>>>>>>> wrote: >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I >>>>>>>>>>>>> have found >>>>>>>>>>>>> another problem in the current implementation. >>>>>>>>>>>>> NaNStrategy.FIXED >>>>>>>> should >>>>>>>>>>>>> leave the NaNs in place, but at the end of the >>>>>>>>>>>>> KthSelector.select >>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of >>>>>>>>>>>>> the small >>>>>>>>>>>>> sub-array. What is really implemented here is therefore >>>>>>>>>>>>> closer to >>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always >>>>>>>>>>>>> occur in the >>>>>>>>>>>>> test cases because they use very short arrays, and we >>>>>>>>>>>>> directly >>>>>>>> switch to >>>>>>>>>>>>> this part of the select method. >>>>>>>>>>>> Are NaNs considered higher than +Inf ? >>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need >>>>>>>>>>>> something to >>>>>>>>>>>> indicate beyond this for NaN. >>>>>>>>>>> Well, we can just keep the NaN themselves and handled them >>>>>>>>>>> appropriately, hoping not to trash performances too much. >>>>>>>>>> Agreed. >>>>>>>>>> >>>>>>>>>>>> What is the test input you see an issue and what is the >>>>>>>>>>>> actual error >>>>>>>>>>>> you are seeing. Please share the test case. >>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The >>>>>>>>>>> first check in >>>>>>>>>>> the test corresponds to a Percentile configuration at 50% >>>>>>>>>>> percentile, >>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of >>>>>>>>>>> entries, so the >>>>>>>>>>> 50% percentile is exactly one of the entries: the one at >>>>>>>>>>> index 5 in the >>>>>>>>>>> final array. >>>>>>>>>>> >>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, >>>>>>>>>>> NaN, 8 }. So >>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified >>>>>>>>>>> at all in >>>>>>>>>>> the selection algorithm and the result for 50% should be the >>>>>>>>>>> first NaN >>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort, >>>>>>>>>>> we *do* >>>>>>>>>>> reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, >>>>>>>>>>> NaN }, so >>>>>>>>>>> the result is 5. >>>>>>>>>>> >>>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort >>>>>>>> function. >>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we >>>>>>>>>>> get as a >>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN. >>>>>>>>>>> >>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your >>>>>>>> insertionSort >>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for >>>>>>>>>> transformation >>>>>>>> of >>>>>>>>>> NaN to +/-Inf >>>>>>>>> I'm fine with it, as long as we clearly state it in the >>>>>>>>> NaNStrategy >>>>>>>>> documentation, saying somethig like: >>>>>>>>> >>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively >>>>>>>>> *replaced* by >>>>>>>>> +/- infinity, hence the results will be computed as if >>>>>>>>> infinities were >>>>>>>>> there in the first place. >>>>>>>> -1 - this is mixing concerns. NaNStrategy exists for one specific >>>>>>>> purpose - to turn extend the ordering on doubles a total ordering. >>>>>>>> Strategies prescribe only how NaNs are to be treated in the >>>>>>>> ordering. Side effects on the input array don't make sense in >>>>>>>> this >>>>>>>> context. The use case for which this class was created was rank >>>>>>>> transformations. Returning infinities in rank transformations >>>>>>>> would >>>>>>>> blow up computations in these classes. If a floating point >>>>>>>> computations need to transform NaNs, the specific stats / other >>>>>>>> classes that do this transformation should perform and document >>>>>>>> the >>>>>>>> behavior. >>>>>>>> >>>>>>>> Phil >>>>>>> OK. Agreed realized it later. Hence i have not touched >>>>>>> NaNStrategy in my >>>>>>> patch(MATH-1132) . Now i have added a separate transformer to >>>>>>> transform NaNs >>>>>>> but it uses NanStrategy. Please let me know on this as this >>>>>>> trasnformation >>>>>>> itself >>>>>>> can be used in different places. >>>>>> >>>>>> Honestly, I don't see the value of this. I would be more favorable >>>>>> to an addition to MathArrays supporting NaN (or infinity) filtering >>>>>> / transformation. Several years back we made the decision to just >>>>>> let NaNs propagate in computations, which basically means that if >>>>>> you feed NaNs to [math] you are going to get NaNs out in results. >>>>>> If we do get into the business of filtering NaNs (or infinities for >>>>>> that matter), I think it is probably best to do that in MathArrays >>>>>> or somesuch - i.e., don't decorate APIs with NaN or >>>>>> infinity-handling handling descriptions. That would be a huge task >>>>>> and would likely end up bleeding implementation detail into the >>>>>> APIs. As I said above, NaNStrategy has to exist for rank >>>>>> transformations to be well-defined. NaN-handling in floating point >>>>>> computations is defined and as long as we fully document what our >>>>>> algorithms do, I think it is fair enough to "let the NaNs fall where >>>>>> they may" - which basically means users need to do the filtering / >>>>>> transformation themselves. >>>>> >>>>> So, do I understand correctly that it's OK to add the "remove" and >>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"? >>>>> Or does this thread conclude that we don't have a use for this within >>>>> Commons Math? >>>> >>>> You raise a good point here. Personally, I don't see an internal >>>> use and if we are sticking with MathArrays including only things we >>>> have internal use for, I would say no, don't include them. >>> >>> A third way to look at it would be that since some algorithms require >>> being passed "clean" data (e.g. without NaNs), we could provide some >>> simple means to do so. A user could then write, e.g. >>> --- >>> double[] data = { 1, Double.NaN, 3, 6, 5 }; >>> >>> Percentile p = new Percentile(); >>> double q = p.evaluate(MathArrays.replace(data, Double.NaN, >>> Double.POSITIVE_INFINITY), 0.1); >>> --- >>> >>> Then, if we provide that utility, is it still necessary to complicate the >>> Percentile >>> code with a NaNStrategy parameter? >> >> I may be missing something, but it seems to me that percentile does >> need a NaNStrategy in the presence if NaNs like the other order >> statistics do (basically to define their position In >> the ordering). It doesn't logically need more than that. I would be >> ok defaulting the strategy or even returning NaN in the presence of >> NaNs (or whenever interpolation involves a NaN). All of this can be >> documented. > > I am _surely_ missing something because I don't figure out how NaNs can > be useful when computing those statistics... :-}
Good point. Basically the different strategies allow you to effectively ignore or work around them so stats can be well-defined. Without NaNStrategy, order statistics including NaNs are undefined. Phil > >> I still don't see a within-math use for recoding methods; though I do >> see the value from a user perspective. The problem is if we open this >> door, we risk MathArrays becoming a mini commons lang. > > If Commons Lang already provides a feature, then we indeed don't need > to offer the same within CM (if there is no internal use). > Can someone confirm? > > > Otherwise we could define a new inner interface in MathArrays, similar > to "MathArrays.Function": > --- > public class MathArrays { > > // ... > > public interface Transformer { > /** Transform the whole array */ > double[] transform(double[] input); > /** Transform the array in the interval [from, to) and return the > whole array */ > double[] transform(double[] input, int from, int to); > /** Transform the array in the interval [from, to) and return the > transformed sub-array */ > double[] transformAndSplice(double[] input, int from, int to); > } > } > --- > > Then we can provide "replace" and "remove" transformers through factory > methods > (example/untested code): > --- > public class MathArrays { > > // ... > > public static Transformer createReplaceTransformer(final double old, > final double replace) { > return new Transformer() { > public double[] transform(double[] input) { > final int len = input.length; > final double[] output = new double[len]; > for (int i = 0; i < len; i++) { > output[i] = Precision.equalsIncludingNaN(input[i], old) ? > replace : input[i]; > } > return output; > } > > // etc. > }; > } > } > --- > > > Gilles > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org