> On Dec 30, 2019, at 08:55, David Mertz <[email protected]> wrote:
>> Presumably the end user (unlike the statistics module) knows what data they
>> have.
>
> No, Steven is right here. In Python we might very sensibly mix numeric
> datatypes.
The statistics module explicitly doesn’t support doing so. Which means anyone
who’s doing it anyway is into “experienced user” territory, and ought to know
what they’re doing.
At any rate, I wasn’t arguing that we don’t need a NaN test function in
statistics. My point—lost by snipping off all the context—was nearly the
opposite. The fact that you can NaN-filter things yourself (more easily than
the statistics module can) doesn’t mean the module shouldn’t offer an ignore
option—and therefore, the fact that you can DSU things yourself (less easily
than using a key function) doesn’t mean the module shouldn’t offer a key
parameter.
(There may be other good arguments against a key parameter. The fact that all
three of the alternate orders anyone's asked for or suggested have turned out
to be spurious, and that nobody can think of a good use for a different one, is
a pretty good argument for YAGNI. But that doesn't make the bogus argument from
“theoretically you could do it yourself so we don’t need to offer it no matter
how useful” any less bogus.)
> But this means we need an `is_nan()` function like some discussed in these
> threads, not rely on a method (and not the same behavior as math.isnan()).
Wait, what’s wrong with the behavior of math.isnan for floats? If you want a
NaN test that differs from the one defined by IEEE, I think we’re off into
uncharted waters.
Let's get concrete: say we have a function that tries the method first, falls
back to math.isnan for floats, returns False for any other Number, and finally
raises a TypeError if none of the above applies. (If this were a general thing
rather than a statistics thing, it would try cmath too.)
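Something like this, roughly (the name is_nan and the choice of AttributeError
as the fallback trigger are just my guesses at the obvious spelling):

    import math
    import numbers

    def is_nan(x):
        # Rough sketch only. Prefer a type-supplied test (Decimal has an
        # is_nan() method), then fall back by type.
        try:
            return x.is_nan()
        except AttributeError:
            pass
        if isinstance(x, float):
            return math.isnan(x)
        if isinstance(x, numbers.Number):
            # ints, Fractions, etc. can never be NaN. (A general-purpose
            # version would check complex values with cmath.isnan first.)
            return False
        raise TypeError(f"can't test {type(x).__name__} values for NaN")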
What values of what types does that not serve? People keep trying to come up
with “better” NaN tests than the obvious one, but better for what? If you don’t
have an actual problem to solve, what use is a solution, no matter how clever?
> E.g.:
>
> my_data = {'observation1': 10**400,   # really big amount
>            'observation2': 1,         # ordinary size
>            'observation3': 2.0,       # ordinary size
>            'observation4': math.nan,  # missing data
>            }
>
> median = statistics.median_high(x for x in my_data.values() if not is_nan(x))
>
> The answer '2.0' is plainly right here, and there's no reason we shouldn't
> provide it.
Wait, are you arguing that we should just offer a generic is_nan function (as a
builtin?), instead of adding an on_nan handler parameter to median and friends?
If so, apologies; I guess I was disagreeing with someone else’s very different
position above, not yours.
This helps users who are sophisticated enough to intentionally use NaNs for
missing data, and to know they want to filter them out of a median, and to know
how to do that with a genexpr, and to know when you can and can’t safely ignore
the docs on which inputs are supported by statistics, but not sophisticated
enough to write an isnan test for their mix of two types. But do any such users
exist?
Writing a NaN test that works for your values even though you intentionally
mixed two types isn’t the hard part. It’s knowing what to do with that NaN test.
Which still isn’t all that hard, but it’s something a lot of novices haven’t
learned yet. I think there are a lot more users of the statistics module who
would be helped by raise and ignore options on median than by just giving them
the simple tools to build that behavior themselves and hoping they figure out
that they need to.
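For comparison, here's a rough sketch of what raise and ignore options on
median might look like (the parameter name on_nan and the float-only NaN check
are placeholders for illustration, not a proposal for the actual signature):

    import math
    import statistics

    def median(data, on_nan='raise'):
        # Illustrative wrapper around statistics.median, not a real API:
        # on_nan='raise' rejects NaNs, 'ignore' drops them, and anything
        # else passes the data through unchanged (today's behavior).
        def _is_nan(x):
            return isinstance(x, float) and math.isnan(x)
        values = list(data)
        if any(_is_nan(x) for x in values):
            if on_nan == 'raise':
                raise ValueError('NaN in data')
            if on_nan == 'ignore':
                values = [x for x in values if not _is_nan(x)]
        return statistics.median(values)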
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/SDGMTW6LUQBWGB6JYTPKDVQP4D6IVULX/
Code of Conduct: http://python.org/psf/codeofconduct/