Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> writes:

> I would like to ask people how they would prefer to handle [the
> computation of median when the data set contains NaN]:
>
> (1) Put the responsibility on the caller to strip NANs from their
> data. If there is a NAN in your data, the result of calling median()
> is implementation-defined.

This is the least Pythonic; there is no good reason IMO for leaving the
behaviour up to the implementation.

> (2) Return a NAN.

I think this makes the most sense. A result of ‘NaN’ communicates
effectively that there is no meaningful number which is the median of
the data set. It is then up to the caller to decide, based on that
unambiguous result, what to do about it.

> (3) Raise an exception.

To raise an exception might be justifiable if there were no better
option; but it creates the problem that a program which has not been
tested for data containing NaN will fail catastrophically, instead of
continuing.

To raise an exception also forecloses the decision of whether “NaN in
the data set” is a problem. That is, IMO, a decision for the caller to
make; it could well be that the caller is happy to use ‘NaN’ as the
result.

> (4) median() should strip out NANs.

Too much magic. This needlessly conflates different inputs as though
they are the same.

> (5) All of the above, selected by the caller. (In which case, which
> would you prefer as the default?)

Please, no. The function should have one well-defined behaviour, and all
the options (raise an exception, strip the NaN from the input, etc.) can
be done in a wrapper function by those who want it done.

My vote, among those options, goes for “when the input contains ‘NaN’,
return ‘NaN’”.

-- 
 \       “But Marge, what if we chose the wrong religion? Each week we |
  `\          just make God madder and madder.” —Homer, _The Simpsons_ |
_o__)                                                                  |
Ben Finney
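
For illustration, the wrapper approach described above might be sketched
as follows. This is only a sketch under stated assumptions: all three
function names are hypothetical, and median_propagating_nan() stands in
for a median() that returns NaN whenever the input contains NaN. It does
not describe the behaviour of any existing standard-library function.

```python
import math
import statistics


def median_propagating_nan(data):
    """Hypothetical base function: NaN in the input gives NaN out."""
    values = list(data)
    if any(math.isnan(x) for x in values):
        return float("nan")
    return statistics.median(values)


def median_or_raise(data):
    """Hypothetical wrapper for callers who consider NaN an error."""
    result = median_propagating_nan(data)
    if math.isnan(result):
        raise ValueError("data contains NaN")
    return result


def median_skipping_nan(data):
    """Hypothetical wrapper for callers who want NaN values stripped."""
    return median_propagating_nan(x for x in data if not math.isnan(x))
```

With a single well-defined base behaviour, each caller picks a policy by
choosing which function to call, and the base function needs no mode
flag.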