Tim Hochberg wrote:
> robert wrote:
>> To avoid this you'd need a type cast in Python code everywhere you get
>> scalars from numpy into a Python variable -- an error-prone task. Or
>> check/re-render your whole object tree.
>> Wouldn't it be much better if numpy would return Python scalars for
>> float64 (maybe even for float32) and int32, int64 ... where possible?
>> (as numarray and Numeric did)
>> I suppose numpy knows internally very quickly how to cast.
>
> The short answer is no, it would not be better. There are some trade-offs
> involved here, but overall, always returning numpy scalars is a
> significant improvement over returning Python scalars some of the time.
> Which is why numpy does it that way now; it was a conscious choice, it
> didn't just happen. Please search the archives of numpy-discussion for
> previous discussions of this, and if that is not enlightening enough
> please ask on the numpy-discussion list (the address of which just
> changed and I don't have it handy, but I'm sure you can find it).
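For reference, the explicit cast being discussed would look roughly like this (a minimal sketch; it uses only the builtin float()/int() casts and the numpy scalar .item() method, which returns the nearest builtin Python type):

import numpy

a = numpy.array([1.0, 2.0, 3.0])

x = a[0]                             # numpy.float64, not a builtin float
f = float(a[0])                      # explicit cast -> builtin float
g = a[0].item()                      # .item() also returns a builtin float
n = int(numpy.array([1, 2, 3])[0])   # explicit cast -> builtin int

print(type(x))   # numpy.float64
print(type(f))   # float
print(type(g))   # float
print(type(n))   # int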
I didn't find the relevant reasoning in the time I had. My guess is that the reasoning is isolated-module-centric. All further computations in Python are much slower, and I cannot even see a speed increase when (the rare case) putting a numpy scalar back into a numpy array:

>>> a=array([1.,0,0,0,0])
>>> f=1.0
>>> fn=a[0]
>>> type(fn)
<type 'numpy.float64'>
>>> timeit.Timer("f+f",glbls=globals()).timeit(10000)
0.0048265910890909324
>>> timeit.Timer("f+f",glbls=globals()).timeit(100000)
0.045992158221226376
>>> timeit.Timer("fn+fn",glbls=globals()).timeit(100000)
0.14901307289054877
>>> timeit.Timer("a[1]=f",glbls=globals()).timeit(100000)
0.060825607723899111
>>> timeit.Timer("a[1]=fn",glbls=globals()).timeit(100000)
0.059519575812004177
>>> timeit.Timer("x=a[0]",glbls=globals()).timeit(100000)
0.12302317752676117
>>> timeit.Timer("x=float(a[0])",glbls=globals()).timeit(100000)
0.31556273213496411

Creation of numpy scalar objects does not seem to be cheap or advantageous anyway:

>>> oa=array([1.0,1.0,1.0,1.0,1],numpy.object)
>>> oa
array([1.0, 1.0, 1.0, 1.0, 1], dtype=object)
>>> timeit.Timer("x=a[0]",glbls=globals()).timeit(100000)
0.12025438987348025
>>> timeit.Timer("x=oa[0]",glbls=globals()).timeit(100000)
0.050609225474090636
>>> timeit.Timer("a+a",glbls=globals()).timeit(100000)
1.3081539692893784
>>> timeit.Timer("oa+oa",glbls=globals()).timeit(100000)
1.5201345422392478

> For your particular issue, you might try tweaking pickle to convert
> int64 objects to int objects. Assuming of course that you have enough of
> these to matter, otherwise, I suggest just leaving things alone.

(I haven't had int64 values so far, so I don't know what happens with Python longs / the trailing L's.)

The main problem is with hundreds of everyday, normal float (now numpy.float64) and int (now numpy.int32) variables: speed issues, memory consumption ... And a pickled tree cannot be read by an app which does not have numpy available, and the pickles are very big.

I still really wonder how all these observations, and the things I can imagine so far, can sum up to an overall advantage for letting numpy.float64 & numpy.int32 scalars out by default - and possibly not even for numpy.float32, which has some importance in practice. Letting out nan and inf objects, and offering an explicit type cast, is of course OK.

Robert
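P.S. A rough, untested sketch of the pickle tweak Tim suggests above, assuming Python 2's copy_reg module (copyreg in Python 3): registering reduce functions makes pickle store numpy scalars as builtin floats/ints, so the resulting pickles can be loaded by apps without numpy and are typically smaller.

try:
    import copy_reg              # Python 2
except ImportError:
    import copyreg as copy_reg   # Python 3

import pickle
import numpy

# Register reducers so pickle stores numpy scalars as builtin Python scalars.
copy_reg.pickle(numpy.float64, lambda x: (float, (float(x),)))
copy_reg.pickle(numpy.int32,   lambda x: (int,   (int(x),)))
copy_reg.pickle(numpy.int64,   lambda x: (int,   (int(x),)))

data = {"x": numpy.float64(1.5), "n": numpy.int32(7)}
s = pickle.dumps(data)

# The pickle now references only float/int, so loading it needs no numpy.
print(pickle.loads(s))

This should also affect cPickle, which consults the same copy_reg dispatch table.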