On 04/10/17 22:47, Fabien wrote:
> On 10/04/2017 10:11 PM, Thomas Jollans wrote:
>> Be warned, pandas is part of the scientific python stack, which is
>> immensely powerful and popular, but it does have a distinctive style
>> that may appear cryptic if you're used to the way the rest of the world
>> writes Python.
>
> Can you elaborate on this one? As a scientist, I am curious ;-)
Sure.

Python is GREAT at iterating. Generators are everywhere. Everyone loves
for loops. List comprehensions and generator expressions are star
features. filter and map are builtins. reduce used to be a builtin, even
though almost nobody really understood what it did.

In [1]: import numpy as np

In the world of numpy (and the greater scientific stack), you don't
iterate. You don't write for loops. You have a million floats in memory
that you want to do math on - you don't want to wait for ten million
calls to __class__.__dict__['__getattr__']('__add__').__call__() or
whatever to run. In numpy land, numpy writes your loops for you. In
FORTRAN. (well ... probably C)

As I see it, the main cultural difference between "traditional" Python
and numpy-Python is that numpy implicitly iterates over arrays all the
time. Python never implicitly iterates. Python is not MATLAB.

In [2]: np.array([1, 2, 3]) + np.array([-3, -2, -1])
Out[2]: array([-2,  0,  2])

In [3]: [1, 2, 3] + [-3, -2, -1]
Out[3]: [1, 2, 3, -3, -2, -1]

In numpy, operators don't mean what you think they mean.

In [4]: a = (np.random.rand(30) * 10).astype(np.int64)

In [5]: a
Out[5]:
array([6, 1, 6, 9, 1, 0, 3, 5, 8, 5, 2, 6, 1, 1, 2, 2, 4, 2, 4, 2, 5, 3, 7,
       8, 2, 5, 8, 1, 0, 8])

In [6]: a > 5
Out[6]:
array([ True, False,  True,  True, False, False, False, False,  True,
       False, False,  True, False, False, False, False, False, False,
       False, False, False, False,  True,  True, False, False,  True,
       False, False,  True], dtype=bool)

In [7]: list(a) > 5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-0c10c9961870> in <module>()
----> 1 list(a) > 5

TypeError: unorderable types: list() > int()

Suddenly, you can even compare sequences and scalars! And > no longer
gives you a bool! Madness!

Now, none of this, so far, has been ALL THAT cryptic as far as I can tell.
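
To make the "numpy writes your loops for you" point a bit more concrete,
here is a rough sketch that spells out the loops that + and > perform
implicitly on arrays. The helper names add_with_loop and
greater_than_with_loop are made up for illustration; they are not part of
numpy or any other library.

```python
import numpy as np


def add_with_loop(xs, ys):
    """Plain-Python version: the pairwise iteration is written out."""
    return [x + y for x, y in zip(xs, ys)]


def greater_than_with_loop(xs, threshold):
    """Plain-Python version of comparing every element to a scalar."""
    return [x > threshold for x in xs]


xs = [1, 2, 3]
ys = [-3, -2, -1]

# Explicit iteration over lists.
loop_sum = add_with_loop(xs, ys)
loop_mask = greater_than_with_loop(xs, 1)

# The same operations on arrays: numpy iterates for you, element by element.
array_sum = np.array(xs) + np.array(ys)
array_mask = np.array(xs) > 1

print(loop_sum, array_sum.tolist())    # both are [-2, 0, 2]
print(loop_mask, array_mask.tolist())  # both are [False, True, True]
```

The results match; the difference is only in who writes the loop, and
that the array versions hand back arrays (an array of bools, in the case
of >) rather than lists.
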
It's when you do more complicated things, and start combining different
parts of the numpy toolbox, that it becomes clear that numpy-Python is
kind of a different language.

In [8]: a[(np.sqrt(a).astype(int)**2 == a) & (a < 5)]
Out[8]: array([1, 1, 0, 1, 1, 4, 4, 1, 0])

In [9]: import math

In [10]: [i for i in a if int(math.sqrt(i))**2 == i and i < 5]
Out[10]: [1, 1, 0, 1, 1, 4, 4, 1, 0]

Look at my pandas example from my previous post. If you're a
Python-using scientist, even if you're not very familiar with pandas,
you'll probably be able to see more or less how it works. I imagine that
there are plenty of experienced Pythonistas on this list who never need
to deal with large amounts of numeric data and who are completely
nonplussed by it, and I wouldn't blame them. The styles and
idiosyncrasies of array-heavy scientific Python and of stream- or
iterator-heavy scripting and networking Python are sometimes just rather
different.

Cheers
Thomas

--
https://mail.python.org/mailman/listinfo/python-list