On Sat, Jun 2, 2018 at 7:31 PM, Paul Moore <p.f.mo...@gmail.com> wrote: > On 2 June 2018 at 08:32, Peter J. Holzer <hjp-pyt...@hjp.at> wrote: >> Browsing through older messages I came upon a thread discussing the >> treatment of NaNs by median(). Since you have to (partially) sort the >> values to compute the median, I played around with sorted(): >> >> Python 3.5.3 (default, Jan 19 2017, 14:11:04) >> [GCC 6.3.0 20170118] on linux >> Type "help", "copyright", "credits" or "license" for more information. >> >>>>> sorted([3, 5, float('NaN'), 1]) >> [3, 5, nan, 1] >> >> What? Does NaN cause sorted to return the original list? >> >>>>> sorted([3, 5, float('NaN'), 1, 0.5]) >> [3, 5, nan, 0.5, 1] >> >> Nope. Does it partition the list into sublists, which are sorted >> individually? >> >>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5]) >> [-8, 0.5, 1, 3, 5, nan] >>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33]) >> [-8, 0.5, 1, 3, 5, nan, 33] >> >> Also nope. It looks like NaNs just mess up sorting in an unpredictable >> way. Is this the intended behaviour or just an accident of >> implementation? (I think it's the latter: I can see how a sort algorithm >> which doesn't treat NaN specially would produce such results.) > > I'd simply assume it's the result of two factors: > > 1. The behaviour of comparisons involving NaN values is weird (not > undefined, as far as I know NaN behaviour is very well defined, but > violates a number of normally fundamental properties of comparisons) > 2. The precise behaviour of the sort algorithm use by Python is not > mandated by the language. > > A consequence of (1) is that there is no meaningful definition of "a > sorted list of numbers" if that list includes NaN, as the definition > "being sorted" relies on properties like "only one of a < b and b < a > is true" (precisely which properties depend on how you define "sorted" > which no-one normally cares about because under normal assumptions > they are all equivalent). A list including NaNs therefore cannot be > permuted into a sorted order (which is the basic language-mandated > detail of sorted() - there are others, but this is what matters here). > > So call it an accident of implementation of you like. Or "sorting a > list with NaNs in it is meaningless" if you prefer. Or "undefined > behaviour" if you're a fan of the language in the C standard.
Sorting a list of uncomparable objects is meaningless (though not undefined). I believe that a list of floats with some NaNs in it should have all the other values sort correctly, with the NaNs doing whatever they feel like doing (they're like cats - have you ever tried to sort an array of cats?); but that's hard to do with the vanilla sorting operation. If you want some sort of predictable behaviour, it's not too hard; all you need is a comparison function that puts all the NaNs together: >>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x!=x, x)) [-8, 0.5, 1, 3, 5, 33, nan] Or if you'd rather shove them all to the beginning: >>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x==x, x)) [nan, -8, 0.5, 1, 3, 5, 33] With that change, you have a guarantee that all finite and infinite values will be correctly sorted, and the NaNs will be grouped (but in no particular order within that group). ChrisA -- https://mail.python.org/mailman/listinfo/python-list