On 3/5/20 9:10 AM, Steven D'Aprano wrote:
On Thu, Mar 05, 2020 at 08:23:22AM -0500, Richard Damon wrote:

Yes, that is the idea of AlmostTotalOrder, to have algorithms that
really require a total order (like sorting)
Sorting doesn't require a total order. Sorting only requires a weak
order where the only operator required is the "comes before" operator,
or less than. That's precisely how sorting in Python is implemented.
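
As a minimal sketch of that point (the Version class below is hypothetical, invented only for illustration): a type that defines nothing but __lt__ can still be sorted, because sorted() and list.sort() are documented to use only "<" comparisons between items.

```python
class Version:
    """Hypothetical example type that defines only the "comes before" operator."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __lt__(self, other):
        # The only rich comparison defined; no __le__, __gt__, or __ge__.
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return f"Version({self.major}, {self.minor})"


versions = [Version(3, 8), Version(2, 7), Version(3, 6)]
ordered = sorted(versions)  # works even though only "<" is available
```

Asking for Version(1, 0) <= Version(2, 0) on the same class raises TypeError, which shows that sorting never needed that operator.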

Here is an interesting discussion of a practical use-case of sorting
data with a partial order:

https://blog.thecybershadow.net/2018/11/18/d-compilation-is-too-slow-and-i-am-forking-the-compiler/
Reading that, yes, there are applications of sorting that don't need a total order, but as the article points out, many general-purpose sorting algorithms do, including the one that Python uses in sort.
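
For data that is only partially ordered, one alternative to a comparison sort is a topological sort. The following is only a sketch, assuming Python 3.9 or later for graphlib, and using the proper-subset relation on frozensets as a stand-in partial order; the data is made up for illustration.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9

items = [
    frozenset({1, 2, 3}),
    frozenset({2}),
    frozenset({1, 2}),
    frozenset({4}),
    frozenset(),
]

# Map each item to the items that must come before it (its proper subsets).
predecessors = {a: {b for b in items if b < a} for a in items}

# static_order() yields every item after all of its predecessors.
# Incomparable items, such as {2} and {4}, may appear in either order.
ordered = list(TopologicalSorter(predecessors).static_order())
```

The result respects every "comes before" relation that exists and makes no promise about pairs that are not comparable.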


but we really need to use a
type that has these exceptional values. Imagine that sort/median was
defined to type check its parameter,
No need to imagine it, sort already type-checks its arguments:

     py> sorted([1, 3, 5, "Hello", 2])
     TypeError: '<' not supported between instances of 'str' and 'int'

If you consider that proper type checking, then you must consider that the proper answer for the median of a list of numbers that contains a NaN is any of the numbers in the list. If sort had an easy, cheap way to confirm that the values passed to it met its assumptions, then it could give a reasonable response.
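
To make the underlying problem concrete: every ordering comparison involving a NaN is false, so a comparison sort has no consistent place to put it. A small sketch using only the standard library (the sample data is made up):

```python
import math

nan = float("nan")

# Every ordering comparison with NaN is False, and NaN is not equal to itself.
comparisons = [nan < 1.0, nan > 1.0, 1.0 < nan, 1.0 > nan, nan == nan]

# sorted() raises no error here, but the outcome depends on where the NaN
# sits in the input, so the other values are not guaranteed to end up in order.
data = [3.0, nan, 1.0, 2.0]
result = sorted(data)
non_nan = [x for x in result if not math.isnan(x)]
```
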
and that meant that you couldn't
take the median of a list of floats (because float has the NaN value
that breaks TotalOrder).
Dealing with NANs depends on what you want to do with the data. If you
are sorting for presentation purposes, what you probably want is to sort
with a custom key that pushes all the NANs to the front (or rear) of the
list. If you are sorting for the purposes of calculating the median, it
depends. There are at least three reasonable strategies for median:

- ignore the NANs;
- return a NAN;
- raise an exception.
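
A rough sketch of both ideas, the sort key that pushes NaNs to the rear and the three median strategies. The names nan_last_key, median_with_policy, and nan_policy are hypothetical, chosen for illustration; none of them is part of the statistics module.

```python
import math
import statistics


def nan_last_key(x):
    # Tuples compare element by element, so every NaN (True) sorts after
    # every non-NaN (False); a NaN itself is never used in a comparison.
    is_nan = math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)


def median_with_policy(data, nan_policy="ignore"):
    """Hypothetical wrapper around statistics.median with a NaN policy."""
    if nan_policy not in ("ignore", "return", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    has_nan = any(math.isnan(x) for x in values)
    if has_nan and nan_policy == "raise":
        raise ValueError("data contains a NaN")
    if has_nan and nan_policy == "return":
        return math.nan
    # "ignore": drop the NaNs and take the median of what is left.
    return statistics.median(x for x in values if not math.isnan(x))


data = [3.0, math.nan, 1.0, math.nan, 2.0]
for_display = sorted(data, key=nan_last_key)  # NaNs pushed to the rear
middle = median_with_policy(data)             # NaNs ignored
```

Making the policy an explicit argument keeps the choice visible at the call site rather than baking one strategy into the function.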

Personally, I think that the first is by far the most practical: if you
have NANs in your statistical data, that's probably because they've
come from some other library or application that is using them to
represent missing values, and if that's the case, the right thing to do
is to ignore them.

There was a discussion not that long ago about that very topic. All those options can be reasonable, but ignoring seems to me to be one of the worse options for a simple package (though reasonable for one where the whole package uses that convention). The danger is that if you get a NaN as the result of a computation generating your data, that error gets hidden because the data is just ignored. I would say that in Python, it would make a lot more sense to use None as the missing-data code, and leave NaN for invalid data or computations. That way you keep things explicit. The use of NaN here goes back to doing this in strictly statically typed languages, where NaN was a convenient special value to mark missing data. (Prior to the invention of NaN, you just used an impossible value for this.)
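
A sketch of what that convention could look like, assuming None marks missing data and NaN marks an invalid value; median_explicit is a hypothetical helper, not an existing API.

```python
import math
import statistics


def median_explicit(data):
    """Hypothetical helper: None means "missing", NaN means "invalid"."""
    present = [x for x in data if x is not None]  # missing values are skipped
    if any(math.isnan(x) for x in present):
        # A NaN is treated as a failed computation upstream, not a gap in the data.
        raise ValueError("invalid (NaN) value in data")
    return statistics.median(present)


result = median_explicit([1.0, None, 3.0, 2.0])
```

With this split, a NaN produced by a bad computation surfaces as an error instead of being silently dropped along with the missing values.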

--
Richard Damon
_______________________________________________
Python-ideas mailing list -- [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/S2OY6FFW32JP2ACQFQ4645NGYP4ZZKQT/