On Wed, Apr 16, 2008 at 2:54 PM, Gabriel Genellina <[EMAIL PROTECTED]> wrote: > En Wed, 16 Apr 2008 10:36:14 -0300, Jonathan Shao <[EMAIL PROTECTED]> > escribió: > > *Gabriel Genellina* gagsl-py2 at yahoo.com.ar > > > <python-list%40python.org?Subject=Interesting%20timing%20issue%20I%20noticed&In-Reply-To=> > > *Wed Apr 16 08:44:10 CEST 2008* > > >> Another thing would be to rearrange the loops so the outer one executes > > less times; if you know that borderX<<sizeX and borderY<<sizeY it may be > > better to swap the inner and outer loops above. > > Thank you for the tip on xrange. > > Even if I swap the inner and outer loops, I would still be doing the same > > number of computations, am I right (since I still need to go through the > > same number of elements)? I'm not seeing how a loop swap would lead to > > fewer > > computations, since I still need to calculate the outer rim of elements > > in > > the array (defined by borderX and borderY). > > You minimize the number of "for" statements executed, and the number of > xrange objects created. Both take some time in Python. > > <code> > import timeit > > f1 = """ > for i in xrange(10): > for j in xrange(1000): > i,j > """ > > f2 = """ > for j in xrange(1000): > for i in xrange(10): > i,j > """ > > print timeit.Timer(f1).repeat(number=1000) > print timeit.Timer(f2).repeat(number=1000) > </code> > > Output: > [2.0405478908632233, 2.041863979919242, 2.0397852240997167] > [2.8623411634718821, 2.8330055914927783, 2.8361752680857535] > > The second (using the largest outer loop) is almost 40% slower than the > first one. "Simplified" Big-Oh complexity analysis isn't enough in cases > like this. > > -- > Gabriel Genellina > > -- > http://mail.python.org/mailman/listinfo/python-list >
For large multidimensional arrays you may also see speed differences depending on traversal order due to cache effects. For instance, if the arrays are stored row-major, then processing an array a row at a time means you're getting a bunch of memory accesses contiguous in memory (so the cache loading a line at a time means you'll have several hits per one load), while accessing it by column means you'll probably have to go out to memory a lot (depending on whether the hardware has a prefetcher or how good it is, I suppose). Just something to consider. -- http://mail.python.org/mailman/listinfo/python-list