Hey Mikera, I did look at core.matrix a while ago, but I'll take another look.
Right now, flop is just trying to make it easy to write *arbitrary* array
operations compactly, while minimizing the chance of getting worse-than-Java
performance. This used to be very tricky to get right when flop was developed
(against Clojure 1.2); the situation has clearly improved since then, but
there still seem to be some subtleties in going fast with arrays in 1.5.1
that we are trying to understand and then automate.

As I understand it, core.matrix has a much more ambitious goal of abstracting
over all matrix types. This is a great goal, but I'm not sure if the
protocol-based implementation can give users any help writing new core
operations efficiently (say, making a new array with c[i] = a[i] + b[i]^2 / 2)
-- unless there's some clever way of combining protocols with macros (hmmm).

I just benchmarked core.matrix/esum, and on my machine in Clojure 1.5.1 it's
2.69x slower than the Java version above, and 1.62x slower than our current
best Clojure version.

A collaboration sounds interesting -- although right now it seems like our
goals are somewhat orthogonal. What do you think?

-Jason

On Fri, Jun 14, 2013 at 2:58 AM, Mikera <mike.r.anderson...@gmail.com> wrote:
> Hi Jason,
>
> Have you guys taken a look at core.matrix for any of this stuff? We're also
> shooting for near-Java-parity for all of the core operations on large
> double arrays.
>
> (use 'clojure.core.matrix)
> (require '[criterium.core :as c])
>
> (let [a (double-array (range 10000))]
>   (c/quick-bench (esum a)))
>
> WARNING: Final GC required 69.30384798936066 % of runtime
> Evaluation count : 45924 in 6 samples of 7654 calls.
>              Execution time mean : 12.967112 µs
>     Execution time std-deviation : 326.480900 ns
>    Execution time lower quantile : 12.629252 µs ( 2.5%)
>    Execution time upper quantile : 13.348527 µs (97.5%)
>                    Overhead used : 3.622005 ns
>
> All the core.matrix functions get dispatched via protocols, so they work on
> any kind of multi-dimensional matrix (not just Java arrays).
> This adds a tiny amount of overhead (about 10-15ns), but it is negligible
> when dealing with medium-to-large vectors/matrices/arrays.
>
> I'm interested in feedback and hopefully we can collaborate: I'm keen to
> get the best optimised numerical functions we can in Clojure. Also, I think
> you may find the core.matrix facilities very helpful when moving to higher
> level abstractions (i.e. 2D matrices and higher order multi-dimensional
> arrays).
>
> On Thursday, 13 June 2013 21:50:48 UTC+1, Jason Wolfe wrote:
>>
>> Taking a step back, the core problem we're trying to solve is just to sum
>> an array's values as quickly as in Java. (We really want to write a
>> fancier macro that allows arbitrary computations beyond summing that
>> can't be achieved by just calling into Java, but this simpler task gets
>> at the crux of our performance issues.)
>>
>> This Java code:
>>
>> public static double asum_noop_indexed(double[] arr) {
>>   double s = 0;
>>   for (int i = 0; i < arr.length; i++) {
>>     s += arr[i];
>>   }
>>   return s;
>> }
>>
>> can run on an array with 10k elements in about 8 microseconds. In
>> contrast, this Clojure code (which I believe used to be as fast as the
>> Java in a previous Clojure version):
>>
>> (defn asum-identity [^doubles a]
>>   (let [len (long (alength a))]
>>     (loop [sum 0.0
>>            idx 0]
>>       (if (< idx len)
>>         (let [ai (aget a idx)]
>>           (recur (+ sum ai) (unchecked-inc idx)))
>>         sum))))
>>
>> executes on the same array in about 40 microseconds normally, or 14
>> microseconds with *unchecked-math* set to true. (We weren't using
>> unchecked-math properly until today, which is why we were doing the hacky
>> interface stuff above -- please disregard that. But I think the core
>> point about an extra cast is still correct.)
>>
>> For reference, (areduce a1 i r 0.0 (+ (aget a1 i) r)) takes about 23 µs
>> to do the same computation (with unchecked-math true).
>>
>> Does anyone have ideas for how to achieve parity with Java on this task?
>> They'd be much appreciated!
>>
>> Thanks, Jason
>>
>> On Thursday, June 13, 2013 12:02:56 PM UTC-7, Leon Barrett wrote:
>>>
>>> Hi. I've been working with people at Prismatic to optimize some simple
>>> math code in Clojure. However, it seems that Clojure generates an
>>> unnecessary type check that slows our (otherwise-optimized) code by 50%.
>>> Is there a good way to avoid this? Is it a bug in Clojure 1.5.1, or
>>> something else? What should I do to work around it?
>>>
>>> Here's my example. The aget seems to generate an unnecessary checkcast
>>> bytecode. I used Jasper and Jasmin to decompile and recompile Bar.class
>>> into Bar_EDITED.class without that bytecode instruction. The edited
>>> version takes about 2/3 the time.
>>>
>>> (ns demo
>>>   (:import demo.Bar_EDITED))
>>>
>>> (definterface Foo
>>>   (arraysum ^double [^doubles a ^int i ^int asize ^double sum]))
>>>
>>> (deftype Bar []
>>>   Foo
>>>   (arraysum ^double [this ^doubles a ^int i ^int asize ^double sum]
>>>     (if (< i asize)
>>>       (recur a (unchecked-inc-int i) asize (+ sum (aget a i)))
>>>       sum)))
>>>
>>> (defn -main [& args]
>>>   (let [bar (Bar.)
>>>         bar-edited (Bar_EDITED.)
>>>         asize 10000
>>>         a (double-array asize)
>>>         i 0
>>>         ntimes 10000]
>>>     (time
>>>      (dotimes [iter ntimes]
>>>        (.arraysum bar a i asize 0)))
>>>     (time
>>>      (dotimes [iter ntimes]
>>>        (.arraysum bar-edited a i asize 0)))))
>>>
>>> ;; $ lein2 run -m demo
>>> ;; Compiling demo
>>> ;; "Elapsed time: 191.015885 msecs"
>>> ;; "Elapsed time: 129.332 msecs"
>>>
>>> Here's the bytecode for Bar.arraysum:
>>>
>>> public java.lang.Object arraysum(double[], int, int, double);
>>>   Code:
>>>      0: iload_2
>>>      1: i2l
>>>      2: iload_3
>>>      3: i2l
>>>      4: lcmp
>>>      5: ifge          39
>>>      8: aload_1
>>>      9: iload_2
>>>     10: iconst_1
>>>     11: iadd
>>>     12: iload_3
>>>     13: dload         4
>>>     15: aload_1
>>>     16: aconst_null
>>>     17: astore_1
>>>     18: checkcast     #60  // class "[D"
>>>     21: iload_2
>>>     22: invokestatic  #64  // Method clojure/lang/RT.intCast:(I)I
>>>     25: daload
>>>     26: dadd
>>>     27: dstore        4
>>>     29: istore_3
>>>     30: istore_2
>>>     31: astore_1
>>>     32: goto          0
>>>     35: goto          44
>>>     38: pop
>>>     39: dload         4
>>>     41: invokestatic  #70  // Method java/lang/Double.valueOf:(D)Ljava/lang/Double;
>>>     44: areturn
>>>
>>> As far as I can tell, Clojure generated a checkcast opcode that tests on
>>> every loop iteration to make sure the double array is really a double
>>> array. When I remove that checkcast, I get a 1/3 speedup (meaning it's a
>>> 50% overhead).
>>>
>>> Can someone help me figure out how to avoid this overhead?
>>>
>>> Thanks.
>>>
>>> - Leon Barrett
>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to a topic in the
> Google Groups "Clojure" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/clojure/LTtxhPxH_ws/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> clojure+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
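For concreteness, here is a plain-Java sketch of the two operations the thread keeps returning to: the array-sum baseline (equivalent to asum_noop_indexed above) and the elementwise c[i] = a[i] + b[i]^2 / 2 example Jason mentions. The class and method names (ArrayOps, asum, addHalfSquare) are illustrative only and come from neither flop nor core.matrix.

```java
// Plain-Java versions of the operations discussed in the thread.
// Names here (ArrayOps, asum, addHalfSquare) are illustrative only.
public class ArrayOps {

    // The summing baseline the Clojure code is trying to match.
    public static double asum(double[] arr) {
        double s = 0;
        for (int i = 0; i < arr.length; i++) {
            s += arr[i];
        }
        return s;
    }

    // The "fancier" elementwise op: c[i] = a[i] + b[i]^2 / 2.
    public static double[] addHalfSquare(double[] a, double[] b) {
        double[] c = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i] + (b[i] * b[i]) / 2.0;
        }
        return c;
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0};
        System.out.println(asum(a)); // prints 6.0
        System.out.println(java.util.Arrays.toString(addHalfSquare(a, b)));
        // prints [3.0, 10.0, 21.0]
    }
}
```

This is the kind of tight primitive loop that HotSpot compiles well; the thread's question is how to get Clojure to emit equivalent bytecode (no boxing, no redundant checkcast) for such loops.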