On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
On 6/29/14, 2:30 PM, Gilles wrote:
On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
On 6/29/14, 9:48 AM, venkatesha murthy wrote:
On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
Le 25/06/2014 06:05, venkatesha murthy a écrit :
On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <l...@spaceroots.org> wrote:
Hi Venkat,

Le 23/06/2014 21:08, venkatesha murthy a écrit :
On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <l...@spaceroots.org> wrote:
Hi all,

While looking further into the Percentile class for MATH-1120, I have found another problem in the current implementation. NaNStrategy.FIXED should leave the NaNs in place, but at the end of the KthSelector.select method, a call to Arrays.sort moves the NaNs to the end of the small sub-array. What is really implemented here is therefore closer to NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occurs in the test cases because they use very short arrays, so we go directly to this part of the select method.
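For reference, this is simply how the JDK sort behaves: java.util.Arrays.sort(double[]) uses the total ordering in which NaN compares greater than +Infinity, so any NaNs end up at the tail of the sorted range. A minimal, self-contained demonstration (not the [math] code itself):
---
import java.util.Arrays;

public class NaNSortDemo {
    public static void main(String[] args) {
        double[] a = { 3.0, Double.NaN, 1.0, Double.POSITIVE_INFINITY };
        // Arrays.sort(double[]) uses the Double total ordering,
        // in which NaN is greater than POSITIVE_INFINITY.
        Arrays.sort(a);
        // Prints [1.0, 3.0, Infinity, NaN]
        System.out.println(Arrays.toString(a));
    }
}
---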
Are NaNs considered higher than +Inf? If MAXIMAL means replacing NaNs with +Inf, you need something beyond that to indicate a NaN.
Well, we can just keep the NaNs themselves and handle them appropriately, hoping not to trash performance too much.

Agreed.

What is the test input for which you see an issue, and what is the actual error you are seeing? Please share the test case.
Just look at PercentileTest.testReplaceNanInRange(). The first check in the test corresponds to a Percentile configured for the 50% percentile, with NaNStrategy.FIXED. The array has an odd number of entries, so the 50% percentile is exactly one of the entries: the one at index 5 in the final array.

The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So with NaNStrategy.FIXED, the array should not be modified at all by the selection algorithm, and the result for 50% should be the first NaN of the array, at index 5. In fact, due to the Arrays.sort, we *do* reorder the array into { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the result is 5.
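A plain-Java sketch of the two behaviours described above, using only the index-5 selection on the example array (not the actual Percentile/KthSelector code):
---
import java.util.Arrays;

public class FixedVsSortedDemo {
    public static void main(String[] args) {
        double[] data = { 0, 1, 2, 3, 4, Double.NaN, Double.NaN, 5, 7, Double.NaN, 8 };

        // What NaNStrategy.FIXED promises: the data is left in place,
        // so the 50% entry at index 5 is the first NaN.
        System.out.println(data[5]);      // NaN

        // What actually happens after the Arrays.sort call: NaNs are
        // pushed to the end and index 5 now holds 5.0.
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        System.out.println(sorted[5]);    // 5.0
    }
}
---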

Agreed. Just verified by putting back the earlier insertionSort function.
If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get Double.POSITIVE_INFINITY as a result instead of Double.NaN.

If we agree to leave FIXED with the unchanged behaviour of your insertionSort code, then at least MAXIMAL/MINIMAL should be allowed to transform NaN to +/-Inf.
I'm fine with it, as long as we clearly state it in the NaNStrategy documentation, saying something like:

  When MAXIMAL/MINIMAL are used, the NaNs are effectively *replaced* by +/- infinity, hence the results will be computed as if infinities were there in the first place.
-1 - this is mixing concerns. NaNStrategy exists for one specific purpose: to extend the ordering on doubles to a total ordering.
Strategies prescribe only how NaNs are to be treated in the ordering. Side effects on the input array don't make sense in this context. The use case for which this class was created was rank transformations. Returning infinities in rank transformations would blow up computations in these classes. If a floating point computation needs to transform NaNs, the specific stats / other classes that do this transformation should perform and document the behavior.

Phil
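For context, a short sketch of that rank-transformation use case (assuming Commons Math 3's NaturalRanking; the ranks in the comment are the expected MAXIMAL result):
---
import org.apache.commons.math3.stat.ranking.NaNStrategy;
import org.apache.commons.math3.stat.ranking.NaturalRanking;

public class RankingUseCase {
    public static void main(String[] args) {
        double[] data = { 20.0, Double.NaN, 40.0 };

        // The strategy only decides where NaN sits in the ordering used
        // for ranking; the input array itself is never rewritten.
        NaturalRanking ranking = new NaturalRanking(NaNStrategy.MAXIMAL);
        double[] ranks = ranking.rank(data);

        // Expected: [1.0, 3.0, 2.0] -- the NaN is ranked as the largest value.
        System.out.println(java.util.Arrays.toString(ranks));
    }
}
---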

OK. Agreed; I realized it later. Hence I have not touched NaNStrategy in my patch (MATH-1132). Now I have added a separate transformer to transform NaNs, but it uses NaNStrategy. Please let me know your thoughts on this, as this transformation itself can be used in different places.

Honestly, I don't see the value of this.  I would be more favorable to an addition to MathArrays supporting NaN (or infinity) filtering / transformation.  Several years back we made the decision to just let NaNs propagate in computations, which basically means that if you feed NaNs to [math] you are going to get NaNs out in results.  If we do get into the business of filtering NaNs (or infinities, for that matter), I think it is probably best to do that in MathArrays or somesuch - i.e., don't decorate APIs with NaN- or infinity-handling descriptions.  That would be a huge task and would likely end up bleeding implementation detail into the APIs.  As I said above, NaNStrategy has to exist for rank transformations to be well-defined.  NaN handling in floating point computations is defined, and as long as we fully document what our algorithms do, I think it is fair enough to "let the NaNs fall where they may" - which basically means users need to do the filtering / transformation themselves.
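Concretely, "doing the filtering themselves" could be as simple as the following user-side helper (plain Java, not a proposed [math] API):
---
import java.util.Arrays;

public class CleanData {
    /** Returns a copy of data with all NaN entries removed. */
    static double[] removeNaNs(double[] data) {
        double[] tmp = new double[data.length];
        int n = 0;
        for (double d : data) {
            if (!Double.isNaN(d)) {
                tmp[n++] = d;
            }
        }
        return Arrays.copyOf(tmp, n);
    }

    public static void main(String[] args) {
        double[] data = { 1, Double.NaN, 3, 6, 5 };
        // Prints [1.0, 3.0, 6.0, 5.0]
        System.out.println(Arrays.toString(removeNaNs(data)));
    }
}
---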

So, do I understand correctly that it's OK to add the "remove" and
"replace" utility methods (cf. MATH-1130) in "MathArrays"?
Or does this thread conclude that we don't have a use for this within
Commons Math?

You raise a good point here.  Personally, I don't see an internal use, and if we are sticking with MathArrays including only things we have internal use for, I would say no, don't include them.

A third way to look at it would be that, since some algorithms require being passed "clean" data (e.g. without NaNs), we could provide some simple means to produce it. A user could then write, e.g.
---
double[] data = { 1, Double.NaN, 3, 6, 5 };

Percentile p = new Percentile();
double q = p.evaluate(MathArrays.replace(data, Double.NaN, Double.POSITIVE_INFINITY), 0.1);
---
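where MathArrays.replace would be the proposed MATH-1130 utility; it does not exist yet, but it could look something like this (just a sketch, the exact signature is up for discussion):
---
/**
 * Sketch of the proposed utility (cf. MATH-1130), not an existing API:
 * returns a copy of "values" with each occurrence of "original" replaced
 * by "replacement". NaN is matched via Double.isNaN, since NaN != NaN.
 */
public static double[] replace(double[] values, double original, double replacement) {
    final boolean replaceNaN = Double.isNaN(original);
    final double[] out = new double[values.length];
    for (int i = 0; i < values.length; i++) {
        final boolean match = replaceNaN ? Double.isNaN(values[i]) : values[i] == original;
        out[i] = match ? replacement : values[i];
    }
    return out;
}
---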

Then, if we provide that utility, is it still necessary to complicate the Percentile
code with a NaNStrategy parameter?


Gilles

