This one is more than 10x faster

(defn sum-fields3 [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
  (let [L (int (alength arr1))]
    (dotimes [i L]
      (let [^doubles a1 (aget arr1 i)
            ^doubles a2 (aget arr2 i)
            ^doubles r  (aget result i)]
        (dotimes [j L]
          (aset-double r j (+ (double (aget a1 j)) (double (aget a2 j)))))))))

Not sure exactly what the problem was with yours.  Maybe the multi-index
versions of aget.
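
If it helps explain it, here's my rough understanding (not verified against
the compiler output): the multi-index (aget arr1 i j) hands the inner row
back as an untyped Object, so the second lookup goes through reflection and
boxing, while hoisting the row out with a ^doubles hint keeps the inner
access primitive.

;; multi-index form: the intermediate row is untyped, so this reflects/boxes
(aget arr1 i j)

;; roughly equivalent, but hinted: hoist the row, then index the primitive array
(let [^doubles row (aget arr1 i)]
  (aget row j))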

On Sep 21, 7:43 pm, Ranjit <rjcha...@gmail.com> wrote:
> I was thinking I needed the Java arrays for interop. At one step in my
> simulation I need to take an FFT of a 2d array, then multiply that
> array with another array, and then inverse FFT. Java array operations
> in Clojure seem so slow though that it will actually be better to
> convert from either a built in data structure or an incanter matrix to
> an array for the fft, then back for the multiplication, then back for
> the IFFT, and then back.
>
> The incanter matrices seem like a pretty good choice. This function
> using incanter is about as fast as the imperative type hinted function
> you came up with:
>
> (defn gaussian-matrix [L mean std]
>   (matrix (map #(sample-normal %1 mean std) (repeat L L))))
>
> , and adding matrices is fast. Converting from an incanter matrix to an
> array like this:
>
> (def B (into-array (map double-array (to-list A))))
>
> takes ~100 msec, and converting back takes a similar amount of time.
> So unfortunately all the converting back and forth means almost a half
> second extra per loop.
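>
> In case it's useful, the conversion back looks roughly like this (a
> sketch -- I haven't checked that it's the fastest route, and A2 is just a
> placeholder name):
>
> ;; double[][] -> Incanter matrix; matrix accepts a seq of seqs
> (def A2 (matrix (map vec B)))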
>
> Not that I want to use this approach anymore, but what other type
> hints could I add to my Java array addition function? The only
> unhinted variables I see left are the indices.
>
> Thanks for all your help,
>
> -Ranjit
>
> On Sep 21, 5:48 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > I think you're still missing some type hints.  I think there are
> > varying degrees of reflective code the compiler can emit, and it
> > doesn't always warn for the intermediate cases.
>
> > Do you need to do all this array business for Java interop, or just
> > because you believe it will give you maximum performance?  If the
> > latter, I'd recommend trying it with built-in data structures first --
> > you might be surprised at how good the performance can be.  Or, if you
> > want to do lots of matrix-and-vector stuff, perhaps try out the
> > Incanter matrix library or similar Java libraries.
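>
> > (Just as a sketch of what those calls look like -- I haven't benchmarked
> > them here, and A and B are hypothetical matrices; plus, mult, and mmult
> > are all in incanter.core:)
>
> > (plus A B)   ; element-wise sum
> > (mult A B)   ; element-wise product
> > (mmult A B)  ; matrix multiplication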
>
> > I find writing correctly-hinted Clojure code for dealing with arrays
> > to be clumsier and more difficult than just writing the equivalent
> > Java (one of very few areas where this is the case), but fortunately I
> > only very rarely find the need to do so.  But, if that's really what
> > you want to do, I think you should always be able to get speed on par
> > with raw Java.
>
> > -Jason
>
> > On Sep 21, 1:04 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > Thanks, I just tried using the random number generator in Incanter as
> > > a replacement and it shaves another ~100 msec off the runtime of the
> > > function. That's better but still noticeably slower than python.
>
> > > I also tried applying what I've learned here to writing a function
> > > that adds two arrays which is imperative and I used type hints to get
> > > rid of the reflection warnings:
>
> > > (defn sum-fields [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
> > >   (let [L (alength arr1)]
> > >     (dotimes [i L]
> > >       (dotimes [j L]
> > >         (aset-double ^doubles (aget result i) j (+ (aget arr1 i j)
> > >                              (aget arr2 i j)))))))
>
> > > but it's surprisingly slow compared to numpy again.
>
> > > On Sep 21, 2:26 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > FYI I fired up a profiler and more than 2/3 of the runtime of
> > > > gaussian-matrix5 is going to the .nextGaussian method of
> > > > java.util.Random.  I think there are faster (and higher-quality)
> > > > drop-in replacements for java.util.Random, if that is really
> > > > important to you.  I seem to recall there being a good Mersenne
> > > > Twister implementation around, which might fit your bill.
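>
> > > > For example (a sketch I haven't benchmarked -- the class here is
> > > > Apache Commons Math's MersenneTwister, assuming that jar is on your
> > > > classpath; any generator with a nextGaussian method would do):
>
> > > > (import 'org.apache.commons.math.random.MersenneTwister)
> > > > (def r (MersenneTwister.))
>
> > > > (defn next-gaussian []
> > > >   (.nextGaussian ^MersenneTwister r))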
>
> > > > -Jason
>
> > > > On Sep 21, 6:34 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > Yeah, I spoke too soon. All the rows are identical. This is what I
> > > > > meant to do:
>
> > > > > (defn gaussian-matrix-for-the-nth-time [L]
> > > > >   (into-array
> > > > >     (map double-array (repeatedly L #(repeatedly L next-gaussian)))))
>
> > > > > and on my computer takes ~800 msecs vs. ~400 msecs for
> > > > > gaussian-matrix5 to make an array of 1000^2 gaussians.
>
> > > > > But numpy only takes about 100 msecs to do the same thing on my
> > > > > computer. I'm surprised we can't beat that or at least get close. But
> > > > > maybe next-gaussian is the bottleneck as you say.
>
> > > > > On Sep 21, 12:20 am, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > On Sep 20, 4:43 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > I'm glad you think partition is the problem, because that was my
> > > > > > > guess too. But I think I have the answer. This is the fastest
> > > > > > > version I've seen so far:
>
> > > > > > > (defn gaussian-matrix-final [L]
> > > > > > >   (into-array ^doubles
> > > > > > >     (map double-array (repeat L (repeatedly L next-gaussian)))))
>
> > > > > > The ^doubles type hint is wrong (it means array of doubles, not
> > > > > > seq of arrays of doubles); the compiler is just ignoring it.
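>
> > > > > > (For the record -- a sketch, with "rows" standing in for any seq
> > > > > > of double[]: ^doubles would hint a double[], and the 2-d array you
> > > > > > get back from into-array is the thing you'd hint as ^"[[D".)
>
> > > > > > ;; sketch: rows is a hypothetical seq of double[]
> > > > > > (def ^"[[D" m (into-array (map double-array rows)))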
>
> > > > > > And the reason it's so fast is probably that you're repeating the
> > > > > > same L gaussian values L times here.  Doesn't matter how fast it
> > > > > > runs if it returns the wrong answer :-).  Anyway, that might
> > > > > > indicate that the .nextGaussian is actually the bottleneck in the
> > > > > > fastest versions.
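>
> > > > > > (Tiny illustration, with made-up values: repeat evaluates its
> > > > > > argument once and reuses it, while repeatedly calls the function
> > > > > > each time.)
>
> > > > > > (repeat 3 (rand))    ;=> (0.42 0.42 0.42), one value reused
> > > > > > (repeatedly 3 rand)  ;=> (0.17 0.88 0.03), three fresh values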
>
> > > > > > -Jason
>
> > > > > > > If I understand what's going on now, then it looks like the only
> > > > > > > way to make this any faster is if next-gaussian could return
> > > > > > > primitives.
>
> > > > > > > The for and doseq macros seem like they're pretty slow.
>
> > > > > > > -Ranjit
>
> > > > > > > On Sep 20, 3:30 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > > I think partition is slowing you down (but haven't profiled to
> > > > > > > > verify).  Here's a functional version that's about 70% as fast
> > > > > > > > as my "5":
>
> > > > > > > > (defn gaussian-matrix6 [L]
> > > > > > > >   (to-array
> > > > > > > >     (for [i (range L)]
> > > > > > > >       (into-array Double/TYPE (for [j (range L)] (next-gaussian))))))
>
> > > > > > > > and I'd guess that's about as good as you're going to get,
> > > > > > > > given that this approach is necessarily going to box and unbox
> > > > > > > > the doubles, and create intermediate sequences, rather than
> > > > > > > > stuffing the primitive doubles directly into the result array.
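>
> > > > > > > > (For contrast, a rough, untested sketch of the "direct" style
> > > > > > > > for a single row -- the primitive doubles go straight into the
> > > > > > > > array, with no boxed seq in between:)
>
> > > > > > > > (let [^doubles row (double-array L)]
> > > > > > > >   (dotimes [j L]
> > > > > > > >     (aset-double row j (next-gaussian)))
> > > > > > > >   row)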
>
> > > > > > > > -Jason
>
> > > > > > > > On Sep 20, 12:00 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > > Replacing the doseq's with dotimes speeds it up a little more:
>
> > > > > > > > > (defn gaussian-matrix5 [^"[[D" arr]
> > > > > > > > >   (dotimes [x (alength arr)]
> > > > > > > > >     (dotimes [y (alength (first arr))]
> > > > > > > > >       (aset-double ^doubles (aget arr (int x)) (int y)
> > > > > > > > >                    (next-gaussian)))))
>
> > > > > > > > > but I'm getting reflection warnings on alength. I guess it
> > > > > > > > > doesn't cause a problem because they're only called once?
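>
> > > > > > > > > Maybe the warning is just from (first arr) coming back as an
> > > > > > > > > untyped Object -- something like this seems to silence it,
> > > > > > > > > though I'm not sure it's the right fix:
>
> > > > > > > > > ;; index the first row and hint it so alength sees a
> > > > > > > > > ;; primitive array instead of an Object
> > > > > > > > > (alength ^doubles (aget arr 0))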
>
> > > > > > > > > Also adding type hints to the more functional version of my
> > > > > > > > > first attempt speeds things up quite a bit:
>
> > > > > > > > > (defn gaussian-matrix2 [L]
> > > > > > > > >      (into-array ^doubles
> > > > > > > > >     (map double-array
> > > > > > > > >          (partition L (repeatedly (* L L) next-gaussian)))))
>
> > > > > > > > > But it's still about 4x slower than gaussian-matrix5 above.
> > > > > > > > > There must be a way to improve on the inner loop here that
> > > > > > > > > doesn't require using indices, right?
>
> > > > > > > > > On Sep 20, 12:32 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > > > > Oops, I found aset-double2 with tab completion and figured
> > > > > > > > > > it was built-in.  Forgot it was a utility I built some time
> > > > > > > > > > ago, a stub for a Java method that does the setting.
>
> > > > > > > > > > Also, I got the type hint for the "arr" arg wrong, although
> > > > > > > > > > it didn't seem to matter.
>
> > > > > > > > > > Here's a fixed version in standard Clojure that's basically
> > > > > > > > > > as fast:
>
> > > > > > > > > > user> (defn gaussian-matrix4 [^"[[D" arr ^int L]
> > > > > > > > > >         (doseq [x (range L) y (range L)]
> > > > > > > > > >           (aset-double ^doubles (aget arr (int x)) (int y)
> > > > > > > > > >                        (.nextGaussian ^Random r))))
> > > > > > > > > > #'user/gaussian-matrix4
> > > > > > > > > > user> (do (microbench (gaussian-matrix3
> > > > > > > > > >                         (make-array Double/TYPE 10 10) 10))
> > > > > > > > > >           (microbench (gaussian-matrix4
> > > > > > > > > >                         (make-array Double/TYPE 10 10) 10)))
> > > > > > > > > > min; avg; max ms:  0.000 ; 0.033 ; 8.837    ( 56828 iterations)
> > > > > > > > > > min; avg; max ms:  0.009 ; 0.038 ; 7.132    ( 50579 iterations)
>
> > > > > > > > > > It seems like you should be able to just use aset-double
> > > > > > > > > > with multiple indices (in place of aset-double2), but I
> > > > > > > > > > can't seem to get the type hints right.
>
> > > > > > > > > > -Jason
>
> > > > > > > > > > On Sep 20, 7:36 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > > > > Thanks Jason, this is great.
>
> > > > > > > > > > > I was confused earlier because I wasn't seeing reflection
> > > > > > > > > > > warnings, but it turns out that was only because I was
> > > > > > > > > > > evaluating the function definitions in the emacs buffer,
> > > > > > > > > > > and the warnings weren't visible.
>
> > > > > > > > > > > I have a question about gaussian-matrix3 though. What is
> > > > > > > > > > > "aset-double2"? Is that a macro that has a type hint for
> > > > > > > > > > > an array of doubles?
>
> > > > > > > > > > > Thanks,
>
> > > > > > > > > > > -Ranjit
> > > > > > > > > > > On Sep 19, 5:37 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > > > > > > Hi Ranjit,
>
> > > > > > > > > > > > The big perf differences you're seeing are due to
> > > > > > > > > > > > reflective calls.  Getting the Java array bits properly
> > > > > > > > > > > > type-hinted is especially tricky, since you don't always
> > > > > > > > > > > > get good reflection warnings.
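>
> > > > > > > > > > > > For the call sites the compiler does know about, it
> > > > > > > > > > > > helps to turn the warnings on at the REPL or at the top
> > > > > > > > > > > > of the file:
>
> > > > > > > > > > > > (set! *warn-on-reflection* true)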
>
> ...
