[Python-ideas] Re: Fix statistics.median()?

Andrew Barnert via Python-ideas Thu, 26 Dec 2019 16:57:33 -0800

On Dec 26, 2019, at 15:27, Marco Sulla via Python-ideas 
<[email protected]> wrote:
> 
> Andrew Barnert wrote:
>> On Dec 26, 2019, at 12:19, Marco Sulla via Python-ideas 
>> [email protected] wrote:
>> you can get the behavior of your algorithm below:
>> @functools.cmp_to_key
>> def flip_incomparables_key(a, b):
>>    if a < b: return -1
>>    if b < a: return 1
>>    return 1
> 
> ^_____^ we arrived to the same conclusion:
> https://mail.python.org/archives/list/[email protected]/message/LM22TNILN2INCZVIPXIUEOKHZ5RJKS5Q/
> 
> But you're key is better! It's not only for NaNs, and I don't know why you 
> say it does not do exactly what I wanted.


    >>> a = [1, nan, 3, nan, 2, inf, 0, nan]
    >>> sorted(a, key=flip_incomparables_key)
    [1, nan, 3, nan, 0, 2, inf, nan]

What you wanted is [0, 1, 2, 3, inf, nan, nan, nan], and it does not do exactly 
that, or even anything remotely similar.

If you tweak it, you can make something that would be correct if sorted were a 
bubble sort, but still isn’t correct with the actual timsort algorithm, or for 
quicksort or anything else you’d ever want to use.

On the other hand, my shift_nan function—or, even better, Chris’s—does do 
exactly what you want, obviously works with any valid sorting algorithm, and is 
explicit about why it does so: it treats nan as bigger than any other value, so 
of course all the nans end up at the right.

> Wait, I'll push them until they are incomparable. I always check for "<=" for 
> every element, I don't mark them as "incomparable" the first time and stop.
> 
> I think we can simply apply to sorted you hack key, with a little mod:
> 
> ```
> @functools.cmp_to_key
> def iliadSort(a, b):
>    if a <= b: return -1
>    if b <= a: return 0
>    return 1
> 
> ```
> 
> I'll test it.

Using <= Instead of < doesn’t fix anything that couldn’t be fixed by the == 
clause (except maybe for types where a<=b iff a<b or a==b is false, and <= is 
reasonable but < is silly, but that’s not true for float or set or any other 
reasonable type).

This does fix the stability problem. Extend a with 5, 5.0 and the original 
version will end with 5.0, 5, inf but this version will end with 5, 5.0, inf. 
But it doesn’t fix anything else. And you could have fixed the stability 
problem just by changing one character in the original version; there’s no need 
to switch to <=.

The root problem is the whole idea of “push them until they’re comparable”. 
That can’t work unless you’re testing each value against every potentially 
incomparable value to the right, which is inherently quadratic. You have to 
come up with a rule for incomparable values that depends on nothing but the two 
values—a rule like “nan is bigger” (as long as you get the stability right) 
does that; a rule like “left is bigger” does not.

_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/JEOA6VRGKYQW4LLTIHSOV37YTMGL5NDX/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: Fix statistics.median()?

Reply via email to