On Friday, August 16, 2013 11:51:49 AM UTC-7, Steven D'Aprano wrote:
> > The trick here is that numpy really is the "right" way to do this stuff.
>
> Numpy does not have a monopoly on the correct algorithms for statistics
> functions,

indeed not -- in fact, a number of them are quite lame, either because of chosen speed vs. accuracy trade-offs, or just plain no-one-got-around-to-writing-the-code.

I kind of misspoke. What I meant was: a numpy ndarray-like object is the "right" way to do this, not numpy itself.

> and a big, heavyweight library like numpy is overkill for many
> lightweight statistics tasks. One shouldn't need to turn on a nuclear
> reactor just to put the light on in your fridge.

sure -- but you are talking stdlib here -- where do we draw the line? It's a hard choice every time.

> > "crunching numbers in python without numpy is like doing text processing
> > without using the string object"
>
> Your analogy is backwards. String objects actually aren't optimal for
> heavy duty text processing, because they're immutable. If you're serious
> about crunching vast amounts of numbers, you'll use numpy. If you're
> serious about crunching vast amounts of text, say for a text editor or word
> processor, you *won't* use strings, you'll use some sort of mutable
> buffer, or ropes, or some other data type. But very unlikely to use
> strings.

but you sure as heck won't use arbitrary Python sequences of characters, which is what you are doing with this module.

> > What this is really an argument for is a numpy-lite in the standard
> > library, which could be used to build these sorts of things on. But
> > that's been rejected before...
>
> "Numpy-lite". Which parts of numpy? Who maintains it? The numpy release
> schedule is nothing like the standard library's release schedule, so
> which one has to change? Or does somebody fork numpy, giving two
> independent code bases?

yup -- that's why it's been rejected before -- but we did get PEP 3118 as a compromise, so one could build an nd-array-lite that was PEP 3118 compatible, and avoid many of the problems above.
However, as much of a problem as it is to install a third-party compiled package, it's a hell of a lot less work than writing a bunch of new code, so it'll probably never get done.

I myself am trying to write my new stuff to take PEP 3118 buffers, so I can get full high-performing numpy support, but not require users to have numpy -- it is a bit tricky, but can be done. If/when you get to the C-accelerated version, I suggest you consider it.

> What about Jython, IronPython, and other Python implementations? Even
> PyPy doesn't support numpy yet, and Jython and IronPython probably never
> will, since they're not C-based.

There is a numpy for IronPython, though I don't think it got beyond the alpha stage. But your point is well taken -- though it's also a reason for an ndarray in the stdlib: then maybe other implementations would support it.

> Yeah, right, sure it will be. I've been waiting a decade for package
> management on Linux to become painless, and it still isn't. There's no
> reason to expect pip will be more painless than aptitude or yum.

Probably not, true -- but you needed to get Python from somewhere, didn't you? You can't say it's easy to compile that on Windows!

> There is also the social problem that not everyone is permitted to
> arbitrarily install software.

I work for the Federal Government -- believe me, I know. There's Google App Engine, and things like that too, to support your point....

> complete code audit. (Imagine auditing all of numpy.)

well, the more we add to Python's stdlib, the bigger an issue that will be for all Python users -- another reason to be cautious.

But in the end, I don't think there is a lot you can do with Python without installing some third-party package. How many people do all their code development in IDLE? All their GUIs with tk? No image processing? Writing their own web framework from scratch? The list goes on and on. I may have a few simple text processing scripts that don't use any third-party packages, but nothing major.
I teach Intro to Python, and while I could probably get away with only the stdlib for the intro class (but sure as heck not the web development class), I don't -- because there is a lot folks should know about to do anything real in Python. So as much of a pain as it can be to use third-party packages, we can't put everything in the stdlib for that reason.

> There are many, many people in a situation where the Python std lib is
> approved, usually because it comes from a vendor with a support contract
> (say, RedHat, Ubuntu, or Suse), but getting third-party packages like
> numpy approved is next to impossible.

don't all three of those ship numpy? I haven't used them in ages.

> > to build python extensions (granted, not a given on Windows and Mac, but
> > pretty likely on Linux)
>
> Oh, well that's okay then -- that's three, maybe four percent of the
> computing world taken care of! Problem solved!

hence the binaries.... really -- the "I can't install an unapproved package" is a show-stopper. "I can't build it" isn't.

> > anyone building their own stuff on Linux is used to that.
>
> Do you realise that not all Python programmers are used to, or able to,
> "build their own stuff on Linux"?

then why not "yum install numpy"? or whatever?

> > All that being said -- if you do decide to do this, please use a PEP
> > 3118 (enhanced buffer) supporting data type (probably array.array) --
> > compatibility with numpy and other packages for crunching numbers is
> > very nice.
>
> py> import array
> py> data = array.array('f', range(1000))
> py> import statistics
> py> statistics.mean(data)
> 499.5

I realized this after posting -- that is a nice feature, and could help a lot -- hurray for the buffer protocol! This makes room for compiled optimization down the road, and then you might be able to use your code with numpy arrays efficiently.

> If the data type supports the sequence protocol, it should work with my
> module.
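
For anyone following along, here is a minimal sketch of the two protocols being discussed, assuming Python 3.4 or later (where the statistics module ships in the stdlib). array.array supports the sequence protocol, which is all the pure-Python code needs, and also the PEP 3118 buffer protocol, which is what a compiled version or numpy could consume without copying. The variable names are only for illustration.

```python
import array
import statistics

data = array.array('f', range(1000))

# Sequence protocol: iteration, len() and indexing are what pure-Python code relies on.
result = statistics.mean(data)
print(result)

# Buffer protocol (PEP 3118): memoryview describes the same memory without copying it.
view = memoryview(data)
print(view.format, view.itemsize, view.shape)

# Writing through the view changes the array, showing that no copy was made.
view[0] = 42.0
print(data[0])
```

The same memoryview call should work on any object that exports a PEP 3118 buffer, which is why accepting buffers rather than a specific array type keeps the numpy dependency optional.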
> If it fails to work, submit a bug report, and I will fix it.

fair enough.

> Like the decimal module, it will probably remain pure-Python for a few
> releases, but I hope that in the future the statistics module will gain a
> C-accelerated version. (Or Java-accelerated for Jython, etc.)

a perfectly reasonable development path.

> I expect that PyPy won't need one. But because it's not really aimed at
> number-crunching megabytes of data, speed is not the priority.

I thought one of the key points of PyPy was performance? But anyway, maybe RPython and the JIT will take care of that.

Anyway, this looks like a great project -- not so sure about putting it in the stdlib, and do hope you'll keep the number crunchers in mind, but great stuff nonetheless.

-Chris

--
http://mail.python.org/mailman/listinfo/python-list