On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
On Jun 29, 2014, at 3:41 PM, Gilles <gil...@harfang.homelinux.org> wrote:

On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
On 6/29/14, 2:30 PM, Gilles wrote:
On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
On 6/29/14, 9:48 AM, venkatesha murthy wrote:
On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
On 25/06/2014 06:05, venkatesha murthy wrote:
On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> wrote:
Hi Venkat,

On 23/06/2014 21:08, venkatesha murthy wrote:
On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> wrote:
Hi all,

While looking further into the Percentile class for MATH-1120, I have
found another problem in the current implementation. NaNStrategy.FIXED
should leave the NaNs in place, but at the end of the KthSelector.select
method, a call to Arrays.sort moves the NaNs to the end of the small
sub-array. What is really implemented here is therefore closer to
NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occurs in the
test cases because they use very short arrays, so we switch directly to
this part of the select method.
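
This is just the JDK sort contract at work: Arrays.sort and Double.compare
treat NaN as greater than every other value, including +Inf. A minimal,
[math]-independent illustration:
---
double[] a = { 1.0, Double.NaN, Double.POSITIVE_INFINITY, 0.0 };
java.util.Arrays.sort(a);
// a is now { 0.0, 1.0, Infinity, NaN }: the sort pushes every NaN past +Inf,
// which is why the NaNs that FIXED should leave alone end up at the end.
System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // prints 1
---
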
Are NaNs considered higher than +Inf?
If MAXIMAL means replacing with +Inf, you need something beyond that to
indicate NaN.
Well, we can just keep the NaNs themselves and handle them
appropriately, hoping not to trash performance too much.
Agreed.

For what test input do you see an issue, and what is the actual error
you are seeing? Please share the test case.
Just look at PercentileTest.testReplaceNanInRange(). The first check in
the test corresponds to a Percentile configuration at the 50% percentile,
with NaNStrategy.FIXED. The array has an odd number of entries, so the
50% percentile is exactly one of the entries: the one at index 5 in the
final array.

The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
with the NaNStrategy.FIXED setting, it should not be modified at all by
the selection algorithm, and the result for 50% should be the first NaN
of the array, at index 5. In fact, due to the Arrays.sort, we *do*
reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
the result is 5.

Agreed. Just verified by putting back the earlier insertionSort
function.
If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get
Double.POSITIVE_INFINITY as a result instead of Double.NaN.

If we agree to leave the FIXED behaviour unchanged with your
insertionSort code, then at least MAXIMAL/MINIMAL should be allowed to
transform NaN to +/-Inf.
I'm fine with it, as long as we clearly state it in the NaNStrategy
documentation, saying something like:

When MAXIMAL/MINIMAL are used, the NaNs are effectively *replaced* by
+/- infinity, hence the results will be computed as if infinities were
there in the first place.
-1 - this is mixing concerns. NaNStrategy exists for one specific
purpose - to extend the ordering on doubles to a total ordering.
Strategies prescribe only how NaNs are to be treated in the ordering.
Side effects on the input array don't make sense in this context. The
use case for which this class was created was rank transformations.
Returning infinities in rank transformations would blow up computations
in these classes. If floating point computations need to transform NaNs,
the specific stats / other classes that do this transformation should
perform and document the behavior.
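
For concreteness, this is the kind of use NaNStrategy was designed for -
a sketch with the existing ranking classes, nothing from the patch:
---
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

double[] data = { 20, Double.NaN, 5, 7 };
// MAXIMAL only says "treat NaN as larger than everything else in the ordering";
// the input values themselves are never replaced by infinities.
NaturalRanking ranking = new NaturalRanking(NaNStrategy.MAXIMAL);
double[] ranks = ranking.rank(data); // { 3.0, 4.0, 1.0, 2.0 }
---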

Phil
OK, agreed - I realized it later. Hence I have not touched NaNStrategy
in my patch (MATH-1132). Now I have added a separate transformer to
transform NaNs, but it uses NaNStrategy. Please let me know on this, as
this transformation itself can be used in different places.

Honestly, I don't see the value of this. I would be more favorable to
an addition to MathArrays supporting NaN (or infinity) filtering /
transformation. Several years back we made the decision to just let
NaNs propagate in computations, which basically means that if you feed
NaNs to [math] you are going to get NaNs out in results. If we do get
into the business of filtering NaNs (or infinities, for that matter), I
think it is probably best to do that in MathArrays or somesuch - i.e.,
don't decorate APIs with NaN- or infinity-handling descriptions. That
would be a huge task and would likely end up bleeding implementation
detail into the APIs. As I said above, NaNStrategy has to exist for
rank transformations to be well-defined. NaN-handling in floating point
computations is defined, and as long as we fully document what our
algorithms do, I think it is fair enough to "let the NaNs fall where
they may" - which basically means users need to do the filtering /
transformation themselves.
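
As an illustration of what doing it themselves could look like for a
user (plain JDK code, e.g. Java 8 streams - nothing [math] would need
to provide):
---
double[] raw = { 1, Double.NaN, 3, 6, 5 };

// Drop the NaNs entirely...
double[] filtered = java.util.Arrays.stream(raw)
                                    .filter(d -> !Double.isNaN(d))
                                    .toArray();

// ...or recode them before handing the data to [math].
double[] recoded = java.util.Arrays.stream(raw)
                                   .map(d -> Double.isNaN(d) ? Double.POSITIVE_INFINITY : d)
                                   .toArray();
---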

So, do I understand correctly that it's OK to add the "remove" and
"replace" utility methods (cf. MATH-1130) in "MathArrays"?
Or does this thread conclude that we don't have a use for this within
Commons Math?

You raise a good point here.  Personally, I don't see an internal
use and if we are sticking with MathArrays including only things we
have internal use for, I would say no, don't include them.

A third way to look at it would be that since some algorithms require
being passed "clean" data (e.g. without NaNs), we could provide some
simple means to do so. A user could then write, e.g.
---
double[] data = { 1, Double.NaN, 3, 6, 5 };

Percentile p = new Percentile();
double q = p.evaluate(MathArrays.replace(data, Double.NaN, Double.POSITIVE_INFINITY), 0.1);
---

Then, if we provide that utility, is it still necessary to complicate the Percentile
code with a NaNStrategy parameter?

I may be missing something, but it seems to me that Percentile does
need a NaNStrategy in the presence of NaNs, like the other order
statistics do (basically to define their position in the ordering). It
doesn't logically need more than that. I would be OK defaulting the
strategy or even returning NaN in the presence of NaNs (or whenever
interpolation involves a NaN). All of this can be documented.

I am _surely_ missing something, because I can't figure out how NaNs
can be useful when computing those statistics... :-}

I still don't see a within-math use for recoding methods, though I do
see the value from a user perspective. The problem is that if we open
this door, we risk MathArrays becoming a mini commons lang.

If Commons Lang already provides a feature, then we indeed don't need
to offer the same within CM (if there is no internal use).
Can someone confirm?


Otherwise we could define a new inner interface in MathArrays, similar
to "MathArrays.Function":
---
public class MathArrays {

    // ...

    public interface Transformer {
        /** Transform the whole array. */
        double[] transform(double[] input);
        /** Transform the array in the interval [from, to) and return the whole array. */
        double[] transform(double[] input, int from, int to);
        /** Transform the array in the interval [from, to) and return the transformed sub-array. */
        double[] transformAndSplice(double[] input, int from, int to);
    }
}
---

Then we can provide "replace" and "remove" transformers through factory methods
(example/untested code):
---
public class MathArrays {

    // ...

    public static Transformer createReplaceTransformer(final double old,
                                                       final double replace) {
        return new Transformer() {
            public double[] transform(double[] input) {
                final int len = input.length;
                final double[] output = new double[len];
                for (int i = 0; i < len; i++) {
                    output[i] = Precision.equalsIncludingNaN(input[i], old) ?
                        replace : input[i];
                }
                return output;
            }

            // etc.
        };
    }
}
---
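
A user would then write something like this (still against the
hypothetical factory above, so all names are up for discussion):
---
double[] data = { 1, Double.NaN, 3, 6, 5 };

MathArrays.Transformer nanToInf =
    MathArrays.createReplaceTransformer(Double.NaN, Double.POSITIVE_INFINITY);

Percentile p = new Percentile();
double q = p.evaluate(nanToInf.transform(data), 50);
---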


Gilles

