Re: Treatment of NANs in the statistics module

Ben Finney Fri, 16 Mar 2018 22:45:08 -0700

Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes:

> I would like to ask people how they would prefer to handle [the
> computation of median when the data set contains NaN]:
>
> (1) Put the responsibility on the caller to strip NANs from their
> data. If there is a NAN in your data, the result of calling median()
> is implementation-defined.


This is the least Pythonic option; there is no good reason IMO for
leaving the behaviour up to the implementation.

> (2) Return a NAN.

I think this makes the most sense.

A result of ‘NaN’ communicates effectively that there is no meaningful
number which is the median of the data set.

It is then up to the caller to decide, based on that unambiguous result,
what to do about it.
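
As a rough sketch of what I mean (not a proposal for the actual
implementation): ‘median_or_nan’ is a hypothetical name, and it only
recognises float NaN, not Decimal NaN. The caller checks the result with
‘math.isnan’ and decides for themselves.

```python
import math
import statistics


def median_or_nan(data):
    """Hypothetical helper: return NaN if any float in data is NaN."""
    values = list(data)
    # Assumption: only float NaN is detected; Decimal NaN is ignored here.
    if any(isinstance(x, float) and math.isnan(x) for x in values):
        return math.nan
    return statistics.median(values)


result = median_or_nan([1.0, float("nan"), 3.0])
if math.isnan(result):
    # The caller decides what a NaN result means for their program.
    print("no meaningful median for this data set")
else:
    print(result)
```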

> (3) Raise an exception.

To raise an exception might be justifiable if there were no better
option; but it creates the problem that a program which has not been
tested for data containing NaN will catastrophically fail, instead of
continuing.

To raise an exception also forecloses the decision of whether “NaN in
the data set” is a problem. That is, IMO, a decision for the caller to
make; it could well be that the caller is happy to use ‘NaN’ as the
result.

> (4) median() should strip out NANs.

Too much magic.

This needlessly conflates different inputs as though they are the same.

> (5) All of the above, selected by the caller. (In which case, which
> would you prefer as the default?)

Please, no. The function should have one well-defined behaviour, and all
the options (raise an exception, strip the NaN from the input, etc.) can
be done in a wrapper function by those who want it done.
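
For illustration, such wrappers might look something like this. The
names ‘raising’, ‘stripping’ and ‘_is_nan’ are hypothetical, not
anything in the statistics module, and again only float NaN is detected.

```python
import math
import statistics


def _is_nan(x):
    # Assumption: only float NaN is recognised.
    return isinstance(x, float) and math.isnan(x)


def raising(median_func):
    """Wrap median_func so that NaN in the data raises ValueError."""
    def wrapper(data):
        values = list(data)
        if any(_is_nan(x) for x in values):
            raise ValueError("data contains NaN")
        return median_func(values)
    return wrapper


def stripping(median_func):
    """Wrap median_func so that NaN values are dropped before the call."""
    def wrapper(data):
        return median_func([x for x in data if not _is_nan(x)])
    return wrapper


strict_median = raising(statistics.median)
lenient_median = stripping(statistics.median)
```

Note that if stripping leaves no data at all, ‘statistics.median’ itself
raises StatisticsError, which seems the right outcome to me.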


My vote, among those options, goes for “when the input contains ‘NaN’,
return ‘NaN’”.

-- 
 \       “But Marge, what if we chose the wrong religion? Each week we |
  `\          just make God madder and madder.” —Homer, _The Simpsons_ |
_o__)                                                                  |
Ben Finney

