On 16 August 2013 17:31, <chris.bar...@noaa.gov> wrote:
>> I am seeking comments on PEP 450, Adding a statistics module to
>> Python's standard library.
>
> The trick here is that numpy really is the "right" way to do this stuff.

Although it doesn't mention this in the PEP, a significant point that is
worth bearing in mind is that numpy is only for CPython, not PyPy,
IronPython, Jython etc. See here for a recent update on the status of
NumPyPy:
http://morepypy.blogspot.co.uk/2013_08_01_archive.html

> I like to say:
> "crunching numbers in python without numpy is like doing text
> processing without using the string object"

It depends what kind of number crunching you're doing. Numpy gives
efficient C-style number crunching, but it doesn't really help you take
advantage of the areas where Python is better than C, such as
arbitrary-precision integers and the decimal and rational arithmetic in
the standard library. You can use dtype=object to put all of these types
into numpy arrays, but in my experience that is typically no faster than
working with plain Python lists and is only really useful when you want
numpy's multi-dimensional, view-based slicing.

Here's an example where Steven's statistics module is more accurate:

>>> numpy.mean([-1e60, 100, 100, 1e60])
0.0
>>> statistics.mean([-1e60, 100, 100, 1e60])
50.0

Okay, so that's a toy example, but it illustrates that Steven is aiming
for ultra-high accuracy whereas numpy is primarily aimed at speed. (See
the P.S. below for exactly what goes wrong in numpy's result here.)

He's also tried to ensure that it works properly with e.g. fractions:

>>> from fractions import Fraction as F
>>> data = [F('1/7'), F('3/7')]
>>> numpy.mean(data)
0.2857142857142857
>>> statistics.mean(data)
Fraction(2, 7)

and decimals:

>>> from decimal import Decimal as D
>>> data = [D('0.1'), D('0.01'), D('0.001')]
>>> numpy.mean(data)
...
TypeError: unsupported operand type(s) for /: 'decimal.Decimal' and 'float'
>>> statistics.mean(data)
Decimal('0.037')

> What this is really an argument for is a numpy-lite in the standard
> library, which could be used to build these sorts of things on. But
> that's been rejected before...

If it's a numpy-lite then it's a numpy-ultra-lite: it really doesn't
provide much of what numpy provides. I would describe it as a Pythonic
implementation of elementary statistical computation rather than a
numpy-lite.

[snip]

> All that being said -- if you do decide to do this, please use a PEP
> 3118 (enhanced buffer) supporting data type (probably array.array) --
> compatibility with numpy and other packages for crunching numbers is
> very nice.
>
> If someone decides to build a stand-alone stats package -- building it
> on an ndarray-lite (PEP 3118 compatible) object would be a nice way to
> go.

Why? Yes, I'd also like an ndarray-lite, or rather an ultra-lite
one-dimensional version, but why would it be useful for the statistics
module over using standard Python containers? (A plain array.array
already works too; see the P.P.S. below.) Note that numpy arrays do work
with the reference implementation of the statistics module (they're just
treated as iterables):

>>> import numpy
>>> import statistics
>>> statistics.mean(numpy.array([1, 2, 3]))
2.0
>>> statistics.mean(numpy.array([[1, 2, 3], [4, 5, 6]]))
array([ 2.5,  3.5,  4.5])

> One other point -- for performance reasons, it would be nice to have
> some compiled code in there -- this adds incentive to put it in the
> stdlib -- external packages that need compiling are what makes numpy
> unacceptable to some folks.

It might be good to have a C accelerator one day, but actually I think
the pure-Python-ness of it is a strong reason to have it, since it
provides accurate statistics functions to all Python implementations
(unlike numpy) at no additional cost.

Oscar
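
P.S. For anyone wondering why numpy.mean() returns exactly 0.0 in the
example above: summing the floats left to right, 100 is far too small to
change -1e60, so the two huge terms cancel to 0.0 before the division
ever happens. A mean can avoid this by doing the summation exactly. This
is not Steven's implementation and the exact_mean() name is mine; it's
just a minimal sketch of the idea using Fraction, which can represent
every float exactly:

    from fractions import Fraction

    def exact_mean(data):
        # Convert each value to an exact rational so that no rounding
        # can happen during the summation; round once, at the end.
        data = list(data)
        total = sum(Fraction(x) for x in data)
        return float(total / len(data))

With this, exact_mean([-1e60, 100, 100, 1e60]) gives 50.0.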
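
P.P.S. On the PEP 3118 point: the reference implementation only needs to
iterate over its input, so an array.array already works with it as-is,
with no special buffer support in the statistics module. A quick sketch
(the values here are arbitrary):

>>> import array, statistics
>>> statistics.mean(array.array('d', [1.0, 2.0, 3.0, 4.0]))
2.5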