Re: [math] use case for NaNStrategy.FIXED?

Phil Steitz Mon, 30 Jun 2014 10:16:38 -0700


> On Jun 30, 2014, at 9:59 AM, Gilles <gil...@harfang.homelinux.org> wrote:
> 
> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>> On Jun 29, 2014, at 3:41 PM, Gilles <gil...@harfang.homelinux.org> wrote:
>>> 
>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>>> <phil.ste...@gmail.com> wrote:
>>>>>>> 
>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>>> <l...@spaceroots.org>
>>>>>>>> wrote:
>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>> 
>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>>>> <l...@spaceroots.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>>>> have found
>>>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>> should
>>>>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>>>> KthSelector.select
>>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>>>> the small
>>>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>>>> closer to
>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>>>> occur in the
>>>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>>>> directly
>>>>>>>> switch to
>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>>>> something to
>>>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>>> Agreed.
>>>>>>>>>> 
>>>>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>>>>> actual error
>>>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>>>> first check in
>>>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>>>> percentile,
>>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>>>> entries, so the
>>>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>>>> index 5 in the
>>>>>>>>>>> final array.
>>>>>>>>>>> 
>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>>>> NaN, 8 }. So
>>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>>>> at all in
>>>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>>>> first NaN
>>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>>>> we *do*
>>>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>>>> NaN }, so
>>>>>>>>>>> the result is 5.
>>>>>>>>>>> 
>>>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>>>> function.
>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>>>> get as a
>>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>>> 
>>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>>>> insertionSort
>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>>>> transformation
>>>>>>>> of
>>>>>>>>>> NaN to +/-Inf
>>>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>>>> NaNStrategy
>>>>>>>>> documentation, saying somethig like:
>>>>>>>>> 
>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>>>> *replaced* by
>>>>>>>>> +/- infinity, hence the results will be computed as if
>>>>>>>>> infinities were
>>>>>>>>> there in the first place.
>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>>>> this
>>>>>>>> context.  The use case for which this class was created was rank
>>>>>>>> transformations.  Returning infinities in rank transformations
>>>>>>>> would
>>>>>>>> blow up computations in these classes.  If a floating point
>>>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>>>> classes that do this transformation should perform and document
>>>>>>>> the
>>>>>>>> behavior.
>>>>>>>> 
>>>>>>>> Phil
>>>>>>> OK. Agreed realized it later.  Hence i have not touched
>>>>>>> NaNStrategy in my
>>>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>>>> transform NaNs
>>>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>>>> trasnformation
>>>>>>> itself
>>>>>>> can be used in different places.
>>>>>> 
>>>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>>>> / transformation.  Several years back we made the decision to just
>>>>>> let NaNs propagate in computations, which basically means that if
>>>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>>>> If we do get into the business of filtering NaNs (or infinities for
>>>>>> that matter), I think it is probably best to do that in MathArrays
>>>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>>>> infinity-handling handling descriptions.  That would be a huge task
>>>>>> and would likely end up bleeding implementation detail into the
>>>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>>>> transformations to be well-defined.  NaN-handling in floating point
>>>>>> computations is defined and as long as we fully document what our
>>>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>>>> they may" - which basically means users need to do the filtering /
>>>>>> transformation themselves.
>>>>> 
>>>>> So, do I understand correctly that it's OK to add the "remove" and
>>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>>>> Or does this thread conclude that we don't have a use for this within
>>>>> Commons Math?
>>>> 
>>>> You raise a good point here.  Personally, I don't see an internal
>>>> use and if we are sticking with MathArrays including only things we
>>>> have internal use for, I would say no, don't include them.
>>> 
>>> A third way to look at it would be that since some algorithms require
>>> being passed "clean" data (e.g. without NaNs), we could provide some
>>> simple means to do so. A user could then write, e.g.
>>> ---
>>> double[] data = { 1, Double.NaN, 3, 6, 5 };
>>> 
>>> Percentile p = new Percentile();
>>> double q = p.evaluate(MathArrays.replace(data, Double.NaN, 
>>> Double.POSITIVE_INFINITY), 0.1);
>>> ---
>>> 
>>> Then, if we provide that utility, is it still necessary to complicate the 
>>> Percentile
>>> code with a NaNStrategy parameter?
>> 
>> I may be missing something, but it seems to me that percentile does
>> need a NaNStrategy in the presence if NaNs like the other order
>> statistics do (basically to define their position In
>> the ordering).  It doesn't logically need more than that.  I would be
>> ok defaulting the strategy or even returning NaN in the presence of
>> NaNs (or whenever interpolation involves a NaN). All of this can be
>> documented.
> 
> I am _surely_ missing something because I don't figure out how NaNs can
> be useful when computing those statistics... :-}


Good point.  Basically the different strategies allow you to effectively ignore 
or work around them so stats can be well-defined.  Without NaNStrategy, order 
statistics including NaNs are undefined.  

Phil

> 
>> I still don't see a within-math use for recoding methods; though I do
>> see the value from a user perspective.  The problem is if we open this
>> door, we risk MathArrays becoming a mini commons lang.
> 
> If Commons Lang already provides a feature, then we indeed don't need
> to offer the same within CM (if there is no internal use).
> Can someone confirm?
> 
> 
> Otherwise we could define a new inner interface in MathArrays, similar
> to "MathArrays.Function":
> ---
> public class MathArrays {
> 
>    // ...
> 
>    public interface Transformer {
>        /** Transform the whole array */
>        double[] transform(double[] input);
>        /** Transform the array in the interval [from, to) and return the 
> whole array */
>        double[] transform(double[] input, int from, int to);
>        /** Transform the array in the interval [from, to) and return the 
> transformed sub-array */
>        double[] transformAndSplice(double[] input, int from, int to);
>    }
> }
> ---
> 
> Then we can provide "replace" and "remove" transformers through factory 
> methods
> (example/untested code):
> ---
> public class MathArrays {
> 
>    // ...
> 
>    public static Transformer createReplaceTransformer(final double old,
>                                                       final double replace) {
>        return new Transformer() {
>            public double[] transform(double[] input) {
>                final int len = input.length;
>                final double[] output = new double[len];
>                for (int i = 0; i < len; i++) {
>                    output[i] = Precision.equalsIncludingNaN(input[i], old) ?
>                        replace : input[i];
>                }
>                return output;
>            }
> 
>            // etc.
>        };
>    }
> }
> ---
> 
> 
> Gilles
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] use case for NaNStrategy.FIXED?

Reply via email to