New submission from Oscar Benjamin:

As of issue20481, the statistics module for Python 3.4 will disallow mixing 
numeric types, with one exception: int can mix with any other type (but only 
one other type at a time). My understanding is that this change was not 
necessarily considered to be a permanent policy but rather a quick fix for 
Python 3.4 in order to explicitly prevent certain confusing situations arising 
from mixing Decimal with other stdlib numeric types.

issue20499 has a lot of discussion about different ways to improve accuracy and 
speed for the mean, variance etc. functions in the statistics module. It's 
tricky though to come up with a concrete implementation without having a clear 
specification for how the module should handle different numeric types.

There are several related issues to do with type handling. Should the 
statistics module:
1) Use the same coercion rules as the numeric tower (PEP-3141)?
2) Allow Decimal to mix with any types from the numeric tower?
3) Allow non-stdlib types that don't use the numeric tower?
4) Allow any mixing of types at all?
5) Strive to achieve the maximum possible accuracy for every type that it 
accepts?

I don't personally see much of a use-case for mixing e.g. Decimal and Fraction. 
I don't think it's unreasonable to require users to choose a numeric type and 
stick to it. The common cases will almost certainly be either all int or all 
float so those should be the main targets of any speed optimisation.

If a user is using Fraction/Decimal then they must have gone out of their way 
to do so and they may as well do so consistently for all of their data. When 
choosing to use Fraction you do so because you want perfect accuracy. Mixing 
those Fractions with floating point types such as float and Decimal doesn't 
make any sense. There is a sense in which Decimals are also exact, since the 
Decimal constructor never rounds its argument. However, I don't think there's 
any case where the Decimal constructor can be used but the Fraction constructor 
cannot, so this mixing of types is unnecessary.
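
To illustrate the point about constructors, here is a minimal sketch. It only 
uses the stdlib Decimal and Fraction types; the variable names are mine.

```python
from decimal import Decimal
from fractions import Fraction

# The Decimal constructor is exact: the value is stored as written,
# independently of the current context precision.
d = Decimal("0.1")

# Fraction accepts the same string, and also the Decimal instance itself,
# so the same value can be held exactly without using Decimal at all.
f_from_str = Fraction("0.1")
f_from_dec = Fraction(d)

print(f_from_str, f_from_dec)
```

One caveat: non-finite values such as Decimal("NaN") have no Fraction 
equivalent, so the claim only holds for finite inputs.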

As with Fraction, a user who chooses to use Decimal is going out of their way 
to do so because of the kind of accuracy guarantees that the type provides. It 
doesn't make any sense to mix these with floats that are inherently tainted 
with the wrong kind of rounding error. So mixing Decimal and float doesn't make 
any sense either.

Note that ordinary arithmetic prohibits the mixing of Decimal with 
Fraction/float, so on this point the statistics module is essentially 
consistent with the policy of the Decimal type itself.
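
The arithmetic behaviour of Decimal can be checked directly. This is a small 
sketch; can_add is a helper name I made up for the illustration.

```python
from decimal import Decimal
from fractions import Fraction


def can_add(a, b):
    """Return True if a + b is supported, False if it raises TypeError."""
    try:
        a + b
    except TypeError:
        return False
    return True


print(can_add(Decimal("1.5"), 2))               # Decimal + int
print(can_add(Decimal("1.5"), 2.0))             # Decimal + float
print(can_add(Decimal("1.5"), Fraction(1, 2)))  # Decimal + Fraction
```

Only the Decimal + int case is supported; the other two raise TypeError.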

On the other hand ordinary arithmetic allows all of int, float, Fraction and 
complex and indeed any other type subscribing to the ABCs in the numeric tower 
to be mixed. As of issue20481 the statistics module does not allow any type 
mixing except for int:
http://hg.python.org/cpython/rev/5db74cd953ab
Note also that it uses type identity rather than subclass relationships or ABCs 
so that it is not even possible to mix e.g. float with a float subclass.
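
The difference between the two behaviours can be sketched as follows. The 
functions allowed_by_identity and allowed_by_isinstance are hypothetical 
helpers written for this message; they are not the code from the changeset 
above, only an approximation of the rule as I have described it.

```python
from fractions import Fraction

# Ordinary arithmetic mixes numeric-tower types freely.
total = 1 + 2.5 + Fraction(1, 2) + 1j


class MyFloat(float):
    """A trivial float subclass."""


def allowed_by_identity(data):
    """Hypothetical rule: at most one exact type besides int."""
    return len({type(x) for x in data} - {int}) <= 1


def allowed_by_isinstance(data, base=float):
    """Hypothetical alternative: accept int plus any instance of base."""
    return all(isinstance(x, (int, base)) for x in data)


print(total)
print(allowed_by_identity([1, 2.0]))                # int with float
print(allowed_by_identity([1.0, MyFloat(2.0)]))     # float with a subclass
print(allowed_by_isinstance([1.0, MyFloat(2.0)]))   # same data, isinstance
```

Under the identity rule the float subclass counts as a second type, whereas 
an isinstance-based rule would accept it.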

The most common case of mixing will almost certainly be int and float which 
will work. However I doubt that the current policy would be considered to be in 
keeping with Python's general policy on numeric types and anticipate that there 
will be a desire to change it in the future. The obvious candidate for a policy 
is the numeric tower and ABCs of PEP-3141. In that case the statistics module 
has a partial precedent on which to base its policy. The only tricky part is 
that Decimal is not part of the numeric tower. So there needs to be a special 
rule for Decimal such as "it only mixes with int/Integral".
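
As a rough sketch of what such a policy could look like (check_mixable is a 
hypothetical name, and the exact rules are an assumption for discussion, not 
a proposal for the final implementation):

```python
import numbers
from decimal import Decimal
from fractions import Fraction


def check_mixable(data):
    """Hypothetical policy check.

    Numeric tower types mix freely with each other.
    Decimal mixes only with Integral types.
    Anything else is rejected.
    """
    data = list(data)
    has_decimal = any(isinstance(x, Decimal) for x in data)
    for x in data:
        if isinstance(x, Decimal):
            continue
        if not isinstance(x, numbers.Complex):
            raise TypeError("unsupported type: %s" % type(x).__name__)
        if has_decimal and not isinstance(x, numbers.Integral):
            raise TypeError("Decimal only mixes with Integral types")
    return True


check_mixable([1, 2.5, Fraction(1, 2)])   # tower types mix
check_mixable([Decimal("1.5"), 2])        # Decimal with int
```

With this rule, check_mixable([Decimal("1.5"), 2.0]) raises TypeError.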

Basing the policy on the numeric tower is attractive but it is worth noting 
that the stdlib types int, float, complex and Fraction (plus Decimal, which 
registers only as numbers.Number) are the only types that I know of that 
actually implement and register with these ABCs. So it's not much different 
from saying that those particular types (and their subclasses) are accepted, 
but I think that is better than the current policy.
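
The registration of the stdlib types can be inspected with isinstance. In this 
sketch, narrowest_abc is a helper name of my own.

```python
import numbers
from decimal import Decimal
from fractions import Fraction

# ABCs of the numeric tower, ordered from narrowest to widest.
TOWER = [numbers.Integral, numbers.Rational, numbers.Real,
         numbers.Complex, numbers.Number]


def narrowest_abc(x):
    """Return the name of the narrowest numbers ABC that x is an instance of."""
    for abc in TOWER:
        if isinstance(x, abc):
            return abc.__name__
    return None


for value in [1, 1.0, 1j, Fraction(1, 2), Decimal("1.5")]:
    print(type(value).__name__, narrowest_abc(value))
```

Decimal reports Number only, which is why it needs a special rule in any 
policy based on the tower.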

Third party numeric types don't implement the interfaces described in PEP-3141. 
However one thing that is implemented by every third-party numeric type that I 
know of is __float__. So if there were a desire to support those in the 
statistics module, then the simplest extension of the policy on types is to say 
that any non-numeric-tower types will simply be coerced with float(). This 
still leaves the issue of how type mixing works there but, again, perhaps the 
safest option before the need arises is just to say that no type mixing is 
allowed if any input object is not from the numeric tower.
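
A minimal sketch of that fallback, assuming Decimal is treated like a tower 
type for this purpose. ThirdPartyNumber is a toy stand-in for a third-party 
type and coerce_inputs is a hypothetical name; neither exists in the module.

```python
import numbers
from decimal import Decimal


class ThirdPartyNumber:
    """Toy stand-in for a third-party numeric type that defines __float__."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


def coerce_inputs(data):
    """Hypothetical fallback: coerce non-tower inputs with float().

    Tower types (and Decimal, by assumption) are returned unchanged.
    If any input is outside the tower, no type mixing is allowed.
    """
    data = list(data)
    if all(isinstance(x, (numbers.Complex, Decimal)) for x in data):
        return data
    if len({type(x) for x in data}) > 1:
        raise TypeError("no type mixing allowed with non-numeric-tower inputs")
    return [float(x) for x in data]


print(coerce_inputs([ThirdPartyNumber(1), ThirdPartyNumber(2)]))
```

Here coerce_inputs([ThirdPartyNumber(1), 2.0]) raises TypeError, while inputs 
drawn entirely from the tower pass through untouched.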

What do you think?

----------
components: Library (Lib)
messages: 210762
nosy: ncoghlan, oscarbenjamin, skrah, stevenjd, wolma
priority: normal
severity: normal
status: open
title: Type handling policy for the statistics module
type: enhancement
versions: Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20575>
_______________________________________