Re: why the big difference in speed?

Ranjit Wed, 22 Sep 2010 06:05:37 -0700

I was missing type hints on the inner calls to aget in the sum, and
changing from aset-double to aset makes it even faster:


(defn sum-fields4 [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
  (let [L (int (alength arr1))]
    (dotimes [i L]
      (dotimes [j L]
        (aset ^doubles (aget result i) j
                (+ (aget ^doubles (aget arr1 i) j) (aget ^doubles
(aget arr2 i) j)))))))

It doesn't look like the casts to double aren't necessary though. On
my computer on a 1000x1000 array this takes ~30 msec!

But strangely replacing the call to aset-double with aset in our
fastest gaussian-matrix slows things down a lot. Everything else seems
to make sense though.




On Sep 21, 11:40 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
> This one is more than 10x faster
>
> (defn sum-fields3 [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
>   (let [L (int (alength arr1))]
>     (dotimes [i L]
>       (let [^doubles a1 (aget arr1 i)
>             ^doubles a2 (aget arr2 i)
>             ^doubles r  (aget result i)]
>         (dotimes [j L]
>           (aset-double r j (+ (double (aget a1 j)) (double (aget a2
> j)))))))))
>
> Not sure what the problem was exactly with yours.  Maybe the multi-
> indices versions of aget.
>
> On Sep 21, 7:43 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > I was thinking I needed the Java arrays for interop. At one step in my
> > simulation I need to take an FFT of a 2d array, then multiply that
> > array with another array, and then inverse FFT. Java array operations
> > in Clojure seem so slow though that it will actually be better to
> > convert from either a built in data structure or an incanter matrix to
> > an array for the fft, then back for the multiplication, then back for
> > the IFFT, and then back.
>
> > The incanter matrices seem like a pretty good choice. This function
> > using incanter is about as fast as the imperative type hinted function
> > you came up with:
>
> > (defn gaussian-matrix [L mean std] (matrix (map #(sample-normal %1
> > mean std) (repeat L L))))
>
> > ,and adding matrices is fast. Converting from an incanter matrix to an
> > array like this:
>
> > (def B (into-array (map double-array (to-list A))))
>
> > takes ~100 msec, and converting back takes a similar amount of time.
> > So unfortunately all the converting back and forth means almost a half
> > second extra per loop.
>
> > Not that I want to use this approach anymore, but what other type
> > hints could I add to my Java array addition function? The only
> > unhinted variables I see left are the indices.
>
> > Thanks for all your help,
>
> > -Ranjit
>
> > On Sep 21, 5:48 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > I think you're still missing some type hints.  I think there are
> > > varying degrees of reflective code the compiler can emit, and it
> > > doesn't always warn for the intermediate cases.
>
> > > Do you need to do all this array business for Java interop, or just
> > > because you believe it will give you maximum performance?  If the
> > > latter, I'd recommend trying it with built-in data structures first --
> > > you might be surprised at how good the performance can be.  Or, if you
> > > want to do lots of matrix-and-vector stuff, perhaps try out the
> > > Incanter matrix library or similar Java libraries.
>
> > > I find writing correctly-hinted Clojure code for dealing with arrays
> > > to be clumsier and more difficult than just writing the equivalent
> > > Java (one of very few areas where this is the case), but fortunately I
> > > only very rarely find the need to do so.  But, if that's really what
> > > you want to do, I think you should always be able to get speed on par
> > > with raw Java.
>
> > > -Jason
>
> > > On Sep 21, 1:04 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > Thanks, I just tried using the random number generator in Incanter as
> > > > a replacement and it shaves another ~100 msec off the runtime of the
> > > > function. That's better but still noticeably slower than python.
>
> > > > I also tried applying what I've learned here to writing a function
> > > > that adds two arrays which is imperative and I used type hints to get
> > > > rid of the reflection warnings:
>
> > > > (defn sum-fields [^"[[D" arr1 ^"[[D" arr2 ^"[[D" result]
> > > >   (let [L (alength arr1)]
> > > >     (dotimes [i L]
> > > >       (dotimes [j L]
> > > >         (aset-double ^doubles (aget result i) j (+ (aget arr1 i j)
> > > >                              (aget arr2 i j)))))))
>
> > > > but it's surprisingly slow compared to numpy again.
>
> > > > On Sep 21, 2:26 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > FYI I fired up a profiler and more than 2/3 of the runtime of 
> > > > > gaussian-
> > > > > matrix5 is going to the .nextGaussian method of java.util.Random.  I
> > > > > think there are faster (and higher-quality) drop-in replacements for
> > > > > java.util.Random, if that is really important to you.  I seem to
> > > > > recall there being a good Mersenne Twister implementation around,
> > > > > which might fit your bill.
>
> > > > > -Jason
>
> > > > > On Sep 21, 6:34 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > Yeah, I spoke too soon. All the rows are identical. This is what I
> > > > > > meant to do:
>
> > > > > > (defn gaussian-matrix-for-the-nth-time [L]
> > > > > >   (into-array (map double-array (repeatedly L #(repeatedly L next-
> > > > > > gaussian)))))
>
> > > > > > and on my computer takes ~800 msecs vs. ~400 msecs for gaussian-
> > > > > > matrix5 to make an array of 1000^2 gaussians.
>
> > > > > > But numpy only takes about 100 msecs to do the same thing on my
> > > > > > computer. I'm surprised we can't beat that or at least get close. 
> > > > > > But
> > > > > > maybe next-gaussian is the bottleneck as you say.
>
> > > > > > On Sep 21, 12:20 am, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > On Sep 20, 4:43 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > I'm glad you think partition is the problem, because that was 
> > > > > > > > my guess
> > > > > > > > too. But I think I have the answer. This is the fastest version 
> > > > > > > > I've
> > > > > > > > seen so far:
>
> > > > > > > > (defn gaussian-matrix-final [L]
> > > > > > > >   (into-array ^doubles (map double-array (repeat L (repeatedly 
> > > > > > > > L next-
> > > > > > > > gaussian)))))
>
> > > > > > > The ^doubles type hint is wrong (it means array of doubles, not 
> > > > > > > seq of
> > > > > > > arrays of doubles); the compiler is just ignoring it.
>
> > > > > > > And the reason it's so fast is probably since you're repeating the
> > > > > > > same L gaussian values L times here.  Doesn't matter how fast it 
> > > > > > > runs
> > > > > > > if it returns the wrong answer :-).   Anyway, that might indicate 
> > > > > > > that
> > > > > > > the .nextGaussian is actually the bottleneck in the fastest 
> > > > > > > versions.
>
> > > > > > > -Jason
>
> > > > > > > > If I understand what's going on now, then it looks like the 
> > > > > > > > only way
> > > > > > > > to make this any faster is if next-gaussian could return 
> > > > > > > > primitives.
>
> > > > > > > > The for, and doseq macros seems like they're pretty slow.
>
> > > > > > > > -Ranjit
>
> > > > > > > > On Sep 20, 3:30 pm, Jason Wolfe <jawo...@berkeley.edu> wrote:
>
> > > > > > > > > I think partition is slowing you down (but haven't profiled to
> > > > > > > > > verify).  Here's a functional version that's about 70% as 
> > > > > > > > > fast as my
> > > > > > > > > "5":
>
> > > > > > > > > (defn gaussian-matrix6 [L]
> > > > > > > > >      (to-array (for [i (range L)] (into-array Double/TYPE 
> > > > > > > > > (for [j
> > > > > > > > > (range L)] (next-gaussian))))))
>
> > > > > > > > > and I'd guess that's about as good as you're going to get, 
> > > > > > > > > given that
> > > > > > > > > this approach is necessarily going to box and unbox the 
> > > > > > > > > doubles, and
> > > > > > > > > create intermediate sequences, rather than stuffing the 
> > > > > > > > > primitive
> > > > > > > > > doubles directly into the result array.
>
> > > > > > > > > -Jason
>
> > > > > > > > > On Sep 20, 12:00 pm, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > > > Replacing the doseq's with dotimes speeds it up a little 
> > > > > > > > > > more:
>
> > > > > > > > > > (defn gaussian-matrix5 [^"[[D" arr]
> > > > > > > > > >   (dotimes [x (alength arr)]
> > > > > > > > > >     (dotimes [y (alength (first arr))]
> > > > > > > > > >       (aset-double ^doubles (aget arr (int x)) (int y) 
> > > > > > > > > > (next-
> > > > > > > > > > gaussian)))))
>
> > > > > > > > > > but I'm getting reflection warnings on alength. I guess it 
> > > > > > > > > > doesn't
> > > > > > > > > > cause a problem because they're only called once?
>
> > > > > > > > > > Also adding type hints to the more functional version of my 
> > > > > > > > > > first
> > > > > > > > > > attempt speeds things up quite a bit:
>
> > > > > > > > > > (defn gaussian-matrix2 [L]
> > > > > > > > > >      (into-array ^doubles
> > > > > > > > > >           (map double-array (partition L (repeatedly (* L 
> > > > > > > > > > L) next-
> > > > > > > > > > gaussian)))))
>
> > > > > > > > > > But it's still about 4x slower than gaussian-matrix5 above. 
> > > > > > > > > > There must
> > > > > > > > > > be a way to improve on the inner loop here that doesn't 
> > > > > > > > > > require using
> > > > > > > > > > indices, right?
>
> > > > > > > > > > On Sep 20, 12:32 pm, Jason Wolfe <jawo...@berkeley.edu> 
> > > > > > > > > > wrote:
>
> > > > > > > > > > > Oops, I found aset-double2 with tab completion and 
> > > > > > > > > > > figured it was
> > > > > > > > > > > build-in.  Forgot it was a utility I built some time ago, 
> > > > > > > > > > > a stub for a
> > > > > > > > > > > Java method that does the setting.
>
> > > > > > > > > > > Also, I got the type hint for the "arr" arg wrong, 
> > > > > > > > > > > although it didn't
> > > > > > > > > > > seem to matter.
>
> > > > > > > > > > > Here's a fixed version in standard Clojure that's 
> > > > > > > > > > > basically as fast:
>
> > > > > > > > > > > user> (defn gaussian-matrix4 [^"[[D" arr ^int L]
> > > > > > > > > > >             (doseq [x (range L) y (range L)] (aset-double 
> > > > > > > > > > > ^doubles
> > > > > > > > > > > (aget arr (int x)) (int y) (.nextGaussian ^Random r))))
> > > > > > > > > > > #'user/gaussian-matrix4
> > > > > > > > > > > user> (do   (microbench (gaussian-matrix3 (make-array 
> > > > > > > > > > > Double/TYPE 10
> > > > > > > > > > > 10) 10)) (microbench (gaussian-matrix4 (make-array 
> > > > > > > > > > > Double/TYPE 10 10)
> > > > > > > > > > > 10)) )
> > > > > > > > > > > min; avg; max ms:  0.000 ; 0.033 ; 8.837    ( 56828  
> > > > > > > > > > > iterations)
> > > > > > > > > > > min; avg; max ms:  0.009 ; 0.038 ; 7.132    ( 50579  
> > > > > > > > > > > iterations)
>
> > > > > > > > > > > It seems like you should be able to just use aset-double 
> > > > > > > > > > > with multiple
> > > > > > > > > > > indices (in place of aset-double2), but I can't seem to 
> > > > > > > > > > > get the type
> > > > > > > > > > > hints right.
>
> > > > > > > > > > > -Jason
>
> > > > > > > > > > > On Sep 20, 7:36 am, Ranjit <rjcha...@gmail.com> wrote:
>
> > > > > > > > > > > > Thanks Jason, this is great.
>
> > > > > > > > > > > > I was confused earlier because I wasn't seeing 
> > > > > > > > > > > > reflection warnings,
>
> ...
>
> read more »

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Re: why the big difference in speed?

Reply via email to