On Sat, Jun 2, 2018 at 7:31 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
> On 2 June 2018 at 08:32, Peter J. Holzer <hjp-pyt...@hjp.at> wrote:
>> Browsing through older messages I came upon a thread discussing the
>> treatment of NaNs by median(). Since you have to (partially) sort the
>> values to compute the median, I played around with sorted():
>>
>> Python 3.5.3 (default, Jan 19 2017, 14:11:04)
>> [GCC 6.3.0 20170118] on linux
>> Type "help", "copyright", "credits" or "license" for more information.
>>
>>>>> sorted([3, 5, float('NaN'), 1])
>> [3, 5, nan, 1]
>>
>> What? Does NaN cause sorted to return the original list?
>>
>>>>> sorted([3, 5, float('NaN'), 1, 0.5])
>> [3, 5, nan, 0.5, 1]
>>
>> Nope. Does it partition the list into sublists, which are sorted
>> individually?
>>
>>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5])
>> [-8, 0.5, 1, 3, 5, nan]
>>>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33])
>> [-8, 0.5, 1, 3, 5, nan, 33]
>>
>> Also nope. It looks like NaNs just mess up sorting in an unpredictable
>> way. Is this the intended behaviour or just an accident of
>> implementation? (I think it's the latter: I can see how a sort algorithm
>> which doesn't treat NaN specially would produce such results.)
>
> I'd simply assume it's the result of two factors:
>
> 1. The behaviour of comparisons involving NaN values is weird (not
> undefined, as far as I know NaN behaviour is very well defined, but
> violates a number of normally fundamental properties of comparisons)
> 2. The precise behaviour of the sort algorithm use by Python is not
> mandated by the language.
>
> A consequence of (1) is that there is no meaningful definition of "a
> sorted list of numbers" if that list includes NaN, as the definition
> "being sorted" relies on properties like "only one of a < b and b < a
> is true" (precisely which properties depend on how you define "sorted"
> which no-one normally cares about because under normal assumptions
> they are all equivalent). A list including NaNs therefore cannot be
> permuted into a sorted order (which is the basic language-mandated
> detail of sorted() - there are others, but this is what matters here).
>
> So call it an accident of implementation of you like. Or "sorting a
> list with NaNs in it is meaningless" if you prefer. Or "undefined
> behaviour" if you're a fan of the language in the C standard.

Sorting a list of uncomparable objects is meaningless (though not
undefined). I believe that a list of floats with some NaNs in it
should have all the other values sort correctly, with the NaNs doing
whatever they feel like doing (they're like cats - have you ever tried
to sort an array of cats?); but that's hard to do with the vanilla
sorting operation. If you want some sort of predictable behaviour,
it's not too hard; all you need is a comparison function that puts all
the NaNs together:

>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x!=x, x))
[-8, 0.5, 1, 3, 5, 33, nan]

Or if you'd rather shove them all to the beginning:

>>> sorted([3, 5, -8, float('NaN'), 1, 0.5, 33], key=lambda x: (x==x, x))
[nan, -8, 0.5, 1, 3, 5, 33]

With that change, you have a guarantee that all finite and infinite
values will be correctly sorted, and the NaNs will be grouped (but in
no particular order within that group).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list

Reply via email to