Hi.

> first of all, I was the author of this very useful statement on
> factories...

Very constructive indeed.
Liking something or not is an impression that could well be justified
afterwards. It also pushes one to look for arguments that back up the
feeling. ;-)

> > However it also shows that the improvement is only ~13% instead of
> > the ~30% reported by the benchmark in the paper...
>
> could it be that their "naive" implementation as a 2D array is very
> naive indeed? I notice in the listings provided in the paper that
> they constantly refer to a[i][j]. I think the strength of having a
> row representation is to define a temporary variable ai = a[i], and
> access a[i][j] as ai[j]. That's what is done in CM anyway; maybe
> that explains why the gain is not so big in the end.

You are right; the "naive" code repeatedly accesses a[i][j]. But this
alone does not make up for the difference (cf. table below).

operate (calls per timed block: 10000, timed blocks: 100, time unit: ms)

name             time/call       std error       total time  ratio       difference
Commons Math     1.19770542e-01  2.85011660e-04  1.1977e+05  1.0000e+00   0.00000000e+00
OpenGamma naive  1.23798907e-01  4.01495625e-04  1.2380e+05  1.0336e+00   4.02836495e+03
OpenGamma 1D     1.04352827e-01  2.08970600e-04  1.0435e+05  8.7127e-01  -1.54177153e+04
OpenGamma 2D     1.12666770e-01  3.50012912e-04  1.1267e+05  9.4069e-01  -7.10377213e+03
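For the record, here is a minimal sketch of the row-caching idiom under
discussion, applied to a matrix-vector product like the "operate" call
benchmarked above (the class and method names are mine, for
illustration; this is not the actual CM or OpenGamma code):

  public class OperateSketch {
      /** Plain version: every access goes through a[i][j]. */
      static double[] operateNaive(double[][] a, double[] v) {
          final double[] out = new double[a.length];
          for (int i = 0; i < a.length; i++) {
              double sum = 0;
              for (int j = 0; j < v.length; j++) {
                  // a[i] is (conceptually) re-fetched at each access,
                  // unless the JIT hoists it out of the loop.
                  sum += a[i][j] * v[j];
              }
              out[i] = sum;
          }
          return out;
      }

      /** Row-cached version: hoist the row out of the inner loop. */
      static double[] operateRowCached(double[][] a, double[] v) {
          final double[] out = new double[a.length];
          for (int i = 0; i < a.length; i++) {
              final double[] ai = a[i]; // fetch the row reference once
              double sum = 0;
              for (int j = 0; j < v.length; j++) {
                  sum += ai[j] * v[j];  // single indirection per element
              }
              out[i] = sum;
          }
          return out;
      }
  }

Both methods compute the same product; the second merely hoists the
row lookup out of the inner loop, which is the transformation the
quoted paragraph refers to.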
> > I don't think that CM development should be focused on performance
> > improvements that are so sensitive to the actual hardware (if it's
> > indeed the varying amount of CPU cache that is responsible for
> > this discrepancy).
>
> That would apparently require fine tuning indeed, just like BLAS
> itself, which has -I believe- specific implementations for specific
> architectures. So it goes a bit against the philosophy of Java. I
> wonder how a JNI interface to BLAS would perform? That would leave
> the architecture-specific issues out of the Java code (which could
> even provide a basic implementation of basic linear algebra
> operations, if people do not want to use native code).

The author of the paper indeed proposes to clone the BLAS tuning
methodology. However, I don't think that this should be a priority
for CM (as a general-purpose math toolbox).

> > If there are (human) resources inclined to rewrite CM algorithms
> > in order to boost performance, I'd suggest to also explore the
> > multi-threading route, as I feel that the type of optimizations
> > described in this paper are more in the realm of the JVM itself.
>
> I would be very interested, but know nothing on multi-threading. I
> will need to explore multi-threading for work anyway, so maybe in
> the future?

Yes, 3.1, 3.2, ..., 4.0, ... whatever.
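In case it helps lower the entry barrier, below is a minimal sketch of
what a multi-threaded "operate" could look like with plain
java.util.concurrent, splitting the rows across a fixed thread pool
(all names are made up for the example; none of this is actual CM
code):

  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.TimeUnit;

  public class ParallelOperateSketch {
      public static double[] operate(final double[][] a,
                                     final double[] v,
                                     final int nThreads)
          throws InterruptedException {
          final double[] out = new double[a.length];
          final ExecutorService pool =
              Executors.newFixedThreadPool(nThreads);
          final int chunk = (a.length + nThreads - 1) / nThreads;
          for (int t = 0; t < nThreads; t++) {
              final int start = t * chunk;
              final int end = Math.min(a.length, start + chunk);
              pool.execute(new Runnable() {
                  public void run() {
                      for (int i = start; i < end; i++) {
                          final double[] ai = a[i]; // row caching again
                          double sum = 0;
                          for (int j = 0; j < v.length; j++) {
                              sum += ai[j] * v[j];
                          }
                          // Threads write disjoint row ranges, so no
                          // locking is needed.
                          out[i] = sum;
                      }
                  }
              });
          }
          pool.shutdown();
          pool.awaitTermination(1L, TimeUnit.HOURS); // wait for all rows
          return out;
      }
  }

Whether this pays off of course depends on the matrix size; for small
matrices the thread hand-off overhead will dominate and the
sequential version will win.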
> In the meantime, may I bring to your attention the JTransforms
> library? (http://sites.google.com/site/piotrwendykier/Home)
> It's a multi-threaded library for various FFT calculations. I've
> used it a lot, and have been involved in the correction of some
> bugs. I've never benchmarked it against CM, but the site claims (if
> my memory does not fail me) greater performance.

Yes, I did not perform benchmarks; however, Luc already pointed out
that he had not paid particular attention to the speed efficiency of
the code in CM. Also, there are other problems, cf. issue
https://issues.apache.org/jira/browse/MATH-677

> Also it can handle non-power-of-two array dimensions. Plus, the
> author seems to no longer have time to spend on this library, and
> may be willing to share it with CM. That would be a first step in
> the multi-threading realm.

Unfortunately, no; he doesn't want to donate his code.

> Beware, though; the basic code is a direct translation of C code,
> and is sometimes difficult to read (thousands of lines, with loads
> of branching: code coverage analysis was simply a nightmare!).

So, the above information is only half bad news! ;-)

Best,
Gilles