Jon Harrop wrote:
> >> In particular, I think you are eagerly
> >> allocating arrays when, in a functional language, you could just as
> >> easily compose closures.
> >
> > You are completely wrong.
>
> I'll give an example. If you write the Python:
>
>     a[:] = b[:] + c[:] + d[:]
>
> I think that is equivalent to the ML:
>
>     fill a (map2 ( + ) (map2 ( + ) b c) d)
>
> which can be deforested in ML to avoid the creation of the
> intermediate result b[:] + c[:] by using a closure to add three
> values at once:
>
>     fill a (map3 (fun b c d -> b + c + d) b c d)
>
> which will be much faster because it doesn't generate an intermediate
> array.
Ah, but this wasn't about temporaries when you spoke of "eagerly
allocating arrays", was it? But yes, you're right: in this example
temporary arrays are created that OCaml would not need. (I don't quite
understand why you'd need a functional language to get this
optimization, though.)

You might have guessed that you can get rid of most temporary arrays,
at the expense of readability, with numpy. For example, replacing

    d1[:] = odd[:] - C1*even[:]

with

    numpy.multiply(-C1, even, d1)
    numpy.add(d1, odd, d1)

eliminates the intermediate. No unnecessary array allocations; no
closures necessary.

I'm curious whether this shared-slicing isn't, in some ways,
advantageous over the OCaml way. I presume that in OCaml, the way you'd
"share" array data is to create a closure that can apply some operation
to selected elements of the array, returning the result. But could this
as easily modify the array in place? I presume a good compiler would be
able to inline the function and optimize the call into an in-place
operation, but I'm not sure how well this works in practice.

Here's a version of D4_Transform that uses no temporary arrays (aside
from the work arrays s1, d1, and d2, which are allocated only once).
It's about 40% faster for me than the one with infix operations. I'd be
curious how it compares to a correct OCaml version. (I'd still expect
OCaml to be at least twice as fast.)
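To make the temporary-elimination trick above concrete, here is a
minimal, self-contained sketch (the array values are made up for
illustration; only C1 comes from the transform below). Each
three-argument ufunc call writes straight into the output array, so no
intermediate array is allocated:

```python
import numpy

C1 = 1.7320508075688772
odd = numpy.array([1.0, 2.0, 3.0, 4.0])
even = numpy.array([0.5, 1.5, 2.5, 3.5])
d1 = numpy.empty(4)

# Infix form: builds a temporary for C1*even, and another
# for the subtraction, before assigning into d1.
expected = odd - C1 * even

# Out-argument form: each ufunc writes its result directly
# into d1, so no intermediate arrays are allocated.
numpy.multiply(-C1, even, d1)
numpy.add(d1, odd, d1)
```

The price, as noted, is readability: the arithmetic is spelled out one
operation at a time, with the evaluation order made explicit.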
def alt_D4_Transform(x, s1=None, d1=None, d2=None):
    add = numpy.add
    multiply = numpy.multiply
    C1 = 1.7320508075688772
    C2 = 0.4330127018922193
    C3 = -0.066987298107780702
    C4 = 0.51763809020504137
    C5 = 1.9318516525781364
    if d1 is None:
        d1 = numpy.zeros(x.size/2, numpy.Float)
        s1 = numpy.zeros(x.size/2, numpy.Float)
        d2 = numpy.zeros(x.size/2, numpy.Float)
    odd = x[1::2]
    even = x[:-1:2]
    multiply(-C1, even, d1)
    add(d1, odd, d1)
    multiply(C2, d1, s1)
    add(s1, even, s1)
    d2[0] = C3 * d1[-1]
    multiply(C3, d1[:-1], d2[1:])
    add(s1, d2, s1)
    d2[0] = d1[0] + s1[-1]
    add(d1[1:], s1[:-1], d2[1:])
    multiply(C4, s1, even)
    multiply(C5, d2, odd)
    if x.size > 2:
        alt_D4_Transform(even, s1[0:even.size/2], d1[0:even.size/2],
                         d2[0:even.size/2])

> > It seems to
> > me a big help is the ability to fold multiple array operations into
> > a single loop, which is an optimization a dynamically-typed language
> > like Python can't easily make. (It'd require a really smart JIT
> > compiler or some concessions in dynamicity.)
>
> Writing a JIT to compile this kind of stuff is easy.

Eh, getting a JIT to do the optimization I spoke of in Python is not
easy. It would be relatively easy in a statically typed language. In
Python, it'd be tantamount to rewriting a dynamically reconfigurable
version of numpy--you'd have to duplicate numpy's complex
type-dispatching rules.

> My point is that this
> is fundamentally bad code,

Whoa, there. I realize that for people who prefer functional
programming, with their recursively reductionist way of thinking, it
might seem as if leaving any sort of reducibility in the final result
is "fundamentally bad", but that's a pretty strong thing to say.

> so why bother trying to write a Python JIT? Why
> not just write in a better language for this task? Optimising within
> a fundamentally slow language seems silly to me.

Because, for most people, language choice is not a greedy maximizing of
a single issue.
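For what it's worth, the loop-fusion optimization discussed above can
be sketched in plain Python (a hypothetical illustration, not code from
the thread; the function names are made up):

```python
import numpy

def chained(b, c, d):
    # Two passes over the data: b + c allocates a temporary
    # array, which is then added to d in a second pass.
    return b + c + d

def fused(b, c, d):
    # One pass, no temporaries: all three values are added per
    # element inside a single loop -- the shape of code a fusing
    # compiler or JIT would emit, and the Python analogue of the
    # deforested map3 version.
    out = numpy.empty_like(b)
    for i in range(len(b)):
        out[i] = b[i] + c[i] + d[i]
    return out
```

The irony, of course, is that in interpreted Python the "fused" loop is
far slower than the chained ufuncs despite doing less memory traffic,
which is exactly why the fusion would have to happen inside a JIT or a
static compiler rather than in user code.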
Nobody who uses numpy is under the impression that it can match a
statically-typed language that is compiled to machine code in peak
performance. (Matlab is fair game, of course.) But speed is not the
only important thing. I use Python because it's an excellently designed
language that fits my overall needs, and numpy because sometimes I need
more speed than vanilla Python. Only when speed is critical (for
example, 3D collision detection) do I write an extension in a "better
language for the task" (i.e., C).

This is something that's quite popular in the numerically-intensive
computing community, BTW. Many people use Python to handle boring stuff
like file I/O, memory management, and non-critical numerical
calculations, and write C or Fortran extensions to do the
numerically-intensive stuff.


Carl Banks