[Python-ideas] Re: Magnitude and ProtoMagnitude ABCs — primarily for argument validation

Richard Damon Thu, 05 Mar 2020 19:56:01 -0800

On 3/5/20 9:10 AM, Steven D'Aprano wrote:

On Thu, Mar 05, 2020 at 08:23:22AM -0500, Richard Damon wrote:

Yes, that is the idea of AlmostTotalOrder, to have algorithms that
really require a total order (like sorting)

Sorting doesn't require a total order. Sorting only requires a weak
order where the only operator required is the "comes before" operator,
or less than. That's precisely how sorting in Python is implemented.
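
To illustrate the point about "less than" being the only operator sort needs, here is a minimal sketch. The Version class is a made-up example type, not anything from the thread; it defines __lt__ and nothing else, and sorted() still accepts it.

```python
class Version:
    """Hypothetical example type that defines only __lt__ (no __gt__, __le__, ...)."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __lt__(self, other):
        # The only rich comparison this class provides.
        return (self.major, self.minor) < (other.major, other.minor)


versions = [Version(2, 1), Version(1, 9), Version(2, 0)]

# sorted() only ever asks "does a come before b?", so __lt__ is enough.
ordered = sorted(versions)
ordered_pairs = [(v.major, v.minor) for v in ordered]
print(ordered_pairs)  # [(1, 9), (2, 0), (2, 1)]
```

Note that this shows which operator is used, not that the result is meaningful for every ordering: if __lt__ is inconsistent (as with NaN), sort will still run but the output order is not guaranteed to be useful.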

Here is an interesting discussion of a practical use-case of sorting
data with a partial order:

https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/

Reading that, yes, there are applications of sorting that don't need a
total order, but as the article points out, many of the general-purpose
sorting algorithms do (like the one that Python uses in sort).

but we really need to use a
type that has these exceptional values. Imagine that sort/median was
defined to type check its parameter,

No need to imagine it, sort already type-checks its arguments:

     py> sorted([1, 3, 5, "Hello", 2])
     TypeError: '<' not supported between instances of 'str' and 'int'

If you consider that proper type checking, then you must consider that
the proper answer for the median of a list of numbers that contains a NaN
is any of the numbers in the list. If sort had an easy/cheap way to
confirm that the values passed to it met its assumptions, then it could
make a reasonable response.
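
As a rough sketch of what such a check could look like for floats: the helper has_nan below is hypothetical (it is not something list.sort does), and it only recognises float NaNs, not Decimal or other types. It costs one extra pass over the data.

```python
import math


def has_nan(values):
    """Hypothetical pre-check: True if any float in values is a NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


nan = float("nan")

# NaN answers False to every ordering comparison, including against itself,
# which is what breaks the assumptions a comparison sort relies on.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)

clean = [3.0, 1.0, 2.0]
dirty = [3.0, nan, 1.0, 2.0]

print(has_nan(clean), has_nan(dirty))
```

A caller could run this before sorting and decide what to do, rather than getting back an arbitrary ordering.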

and that meant that you couldn't
take the median of a list of floats (because float has the NaN value
that breaks TotalOrder).

Dealing with NANs depends on what you want to do with the data. If you
are sorting for presentation purposes, what you probably want is to sort
with a custom key that pushes all the NANs to the front (or rear) of the
list. If you are sorting for the purposes of calculating the median, it
depends. There are at least three reasonable strategies for median:

- ignore the NANs;
- return a NAN;
- raise an exception.
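
The three strategies can be sketched as a single function with a policy argument. The name median_with_policy and the nan_policy parameter are assumptions made for this example; statistics.median itself has no such option.

```python
import math
import statistics


def _is_nan(value):
    # Only float NaNs are recognised in this sketch.
    return isinstance(value, float) and math.isnan(value)


def median_with_policy(values, nan_policy="raise"):
    """Hypothetical wrapper: median with an explicit NaN strategy.

    nan_policy is one of "ignore", "propagate" (return a NaN), or "raise".
    """
    if nan_policy not in ("ignore", "propagate", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    data = list(values)
    nans_present = any(_is_nan(v) for v in data)

    if nans_present and nan_policy == "raise":
        raise ValueError("data contains NaN")
    if nans_present and nan_policy == "propagate":
        return math.nan
    if nan_policy == "ignore":
        data = [v for v in data if not _is_nan(v)]

    # statistics.median raises StatisticsError if nothing is left.
    return statistics.median(data)


sample = [1.0, float("nan"), 3.0, 2.0]
ignored = median_with_policy(sample, "ignore")
propagated = median_with_policy(sample, "propagate")
```

Making the policy an explicit argument keeps the choice visible at the call site, whichever default one prefers.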

Personally, I think that the first is by far the most practical: if you
have NANs in your statistical data, that's probably because they've
come from some other library or application that is using them to
represent missing values, and if that's the case, the right thing to do
is to ignore them.

There was a discussion not that long ago about that very topic. All
those options can be reasonable, but ignoring seems to me to be one of
the worse options for a simple package (but reasonable for one where the
whole package uses that convention). The danger of it is that if you get
a NaN as a result of a computation generating your data, that error gets
hidden by having the data just be ignored. I would say that in Python, it
would make a lot more sense to use None as the missing data code, and
leave NaN for invalid data/computations. That way you keep things
explicit. The use of NaN here goes back to the use of strictly statically
typed languages for doing this, where NaN was a convenient special value
to mark it (prior to the invention of NaN, you just used an impossible
value for these).
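
A minimal sketch of that convention, assuming None marks missing data and NaN marks an invalid computation. The function name median_skipping_missing is made up for this example.

```python
import math
import statistics


def median_skipping_missing(values):
    """Hypothetical helper: None means "missing", NaN means "something went wrong"."""
    # Missing entries are dropped explicitly.
    present = [v for v in values if v is not None]

    # A NaN is treated as an upstream error and is not silently ignored.
    if any(isinstance(v, float) and math.isnan(v) for v in present):
        raise ValueError("NaN in data: an upstream computation produced an invalid value")

    return statistics.median(present)


with_missing = median_skipping_missing([1.0, None, 3.0, None])
print(with_missing)  # 2.0
```

With this split, a NaN that leaks in from a bad computation is reported instead of being quietly discarded along with the genuinely missing values.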


--
Richard Damon
