FWIW, although no one cares, I "withdraw" my proposed implementation.
While it bugs me that I'm not sure what error I made in dealing with
duplicate values in an iterable, on reflection I think the whole idea is
wrong.
That is, I don't like the weird behavior of statistics.median.
But my partitioning approach doesn't guard against incomparability in every
possible pair of items anyway; checking every pair would always take
quadratic time. It just performs whichever comparisons its particular
program flow happens to reach, not all of them. "Incomparability" can be a
property of any pair of objects, in principle.
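To make the point about comparisons concrete, here is a small sketch that counts how many `<` comparisons `sorted()` (which `statistics.median` uses internally) actually performs, compared with the number of distinct pairs. The `CountingLT` wrapper and `comparisons_to_sort` helper are hypothetical, made up for illustration.

```python
import random


class CountingLT:
    """Hypothetical wrapper that counts every `<` comparison made on it.

    list.sort()/sorted() only ever ask "a < b?", so __lt__ is all we need.
    """

    count = 0  # class-level tally shared by all instances

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingLT.count += 1
        return self.value < other.value


def comparisons_to_sort(values):
    """Return how many `<` comparisons sorted() made on `values`."""
    CountingLT.count = 0
    sorted(CountingLT(v) for v in values)
    return CountingLT.count


n = 1_000
values = [random.random() for _ in range(n)]
made = comparisons_to_sort(values)
all_pairs = n * (n - 1) // 2  # 499,500 distinct pairs
print(f"sorted() made {made} comparisons; checking every pair would take {all_pairs}")
```

The sort only visits roughly n log n pairs, so a pair involving an incomparable value such as NaN may or may not ever be compared, depending on where it sits in the input.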
However, I also realize the completely general question is irrelevant.
NaNs really are just special in arising innocuously from relatively normal
numeric operations. If I make some custom class IncomparableToEverything,
it's my problem if I stick it in a list of things I want the median of.
So we could get the Pandas-style behavior simply by calling median like so:
    statistics.median(x for x in it if not math.isnan(x))
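A runnable version of that one-liner, with made-up sample data. It assumes the values are real numbers, since `math.isnan()` raises `TypeError` for things like strings.

```python
import math
import statistics

data = [3.0, float("nan"), 1.0, 4.0, 2.0]

# Filter out NaNs first, then take the median of what is left.
# math.isnan() only accepts real numbers, so this assumes numeric data.
clean = statistics.median(x for x in data if not math.isnan(x))
print(clean)  # 2.5
```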
I still feel it would be worthwhile for median (and friends) to do that
internally, controlled by some optional parameter. But the default value of
that parameter is indeed non-obvious. In the Pandas style of passing options
as strings, we might get `on_nan=["skip"|"poison"|"raise"|"random"]`.
"Random" seems like the only wrong answer, but it is the status quo.
On Thu, Dec 26, 2019 at 4:34 PM David Mertz wrote:
> FWIW, here is a timing:
>
> >>> many_nums = [randint(10, 100) for _ in range(1_000_000)]
> >>> %timeit statistics.median_low(many_nums)
> 87.2 ms ± 654 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> >>> %timeit median(many_nums)
> 282 ms ± 3.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> I think almost all the slowdown is because `sorted()` is a C function. In
> big-O terms, mine should be an improvement: it does part of a Quicksort
> when partitioning elements, but it doesn't bother sorting the partition
> that can't contain the median. It *does* make one extra pass to find the
> min or max, though.
>
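The partitioning approach described in the quoted message (a partial Quicksort that only continues into the side holding the median, plus one extra pass for a min or max) can be sketched as a quickselect. This is not the withdrawn implementation, which isn't included in this message; `quickselect_median` is a hypothetical name, the random pivot is an assumption, and the data is assumed to be totally ordered (no NaNs).

```python
import random
import statistics


def quickselect_median(data):
    """Median via partitioning (quickselect) rather than a full sort.

    Hypothetical sketch: assumes non-empty, totally ordered data (no NaNs).
    """
    items = list(data)
    n = len(items)
    if n == 0:
        raise statistics.StatisticsError("no median for empty data")
    k = n // 2      # rank (0-based) of the upper-middle element
    below = None    # largest discarded value ranked below everything in `items`
    while True:
        pivot = random.choice(items)
        # Three-way split keeps runs of duplicate values together.
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        n_equal = len(items) - len(lows) - len(highs)
        if k < len(lows):
            items = lows                  # target lies left of the pivot
        elif k >= len(lows) + n_equal:
            k -= len(lows) + n_equal      # target lies right of the pivot
            below = pivot                 # everything discarded is <= pivot
            items = highs
        else:
            upper = pivot                 # a pivot copy occupies rank k
            break
    if n % 2 == 1:
        return upper
    # Even n: we also need the element at rank k - 1.
    if k - 1 >= len(lows):
        lower = pivot                     # still inside the run of pivot copies
    elif lows:
        lower = max(lows)                 # one pass over the lower partition
    else:
        lower = below                     # it was discarded in an earlier round
    return (lower + upper) / 2
```

Splitting into `<`, `==`, and `>` groups is one way to cope with duplicate values. Building new lists each round keeps the sketch short, but an in-place partition would allocate less, and either way a pure-Python loop will struggle to beat C's `sorted()` at these sizes.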
--
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons. Intellectual property is
to the 21st century what the slave trade was to the 16th.
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/42GTSIJ6HBGDFTSUMMZDSANFVCHJEIZC/
Code of Conduct: http://python.org/psf/codeofconduct/