On 10/15/11 8:46 AM, Phil Steitz wrote:
> On 10/15/11 5:41 AM, Gilles Sadowski wrote:
>> Hi.
>>
>>> First of all, I was the author of this very useful statement on
>>> factories... Very constructive indeed.
>> Liking something or not is an impression that could well be justified
>> afterwards. It also pushes one to look for arguments that confirm the
>> feeling. ;-)
>>
>>>> However, it also shows that the improvement is only ~13% instead of the ~30%
>>>> reported by the benchmark in the paper...
>>>>
>>> Could it be that their "naive" implementation as a 2D array is very
>>> naive indeed? I notice in the listings provided in the paper that they
>>> constantly refer to a[i][j]. I think the strength of having a row
>>> representation is to define a temporary variable ai = a[i], and access
>>> a[i][j] as ai[j]. That's what is done in CM anyway; maybe that
>>> explains why the gain is not so big in the end.
>> You are right; the "naïve" code repeatedly accesses a[i][j].
>>
>> But this alone doesn't make up for the difference (cf. table below).
>>
>> operate (calls per timed block: 10000, timed blocks: 100, time unit: ms)
>> name             time/call       std error       total time  ratio       difference
>> Commons Math     1.19770542e-01  2.85011660e-04  1.1977e+05  1.0000e+00   0.00000000e+00
>> OpenGamma naive  1.23798907e-01  4.01495625e-04  1.2380e+05  1.0336e+00   4.02836495e+03
>> OpenGamma 1D     1.04352827e-01  2.08970600e-04  1.0435e+05  8.7127e-01  -1.54177153e+04
>> OpenGamma 2D     1.12666770e-01  3.50012912e-04  1.1267e+05  9.4069e-01  -7.10377213e+03
>>
>>
>>>> I don't think that CM development should be focused on performance
>>>> improvements that are so sensitive to the actual hardware (if it's indeed
>>>> the varying amount of CPU cache that is responsible for this discrepancy).
>>>>
>>> That would apparently require fine tuning indeed, just like BLAS
>>> itself, which (I believe) has specific implementations for specific
>>> architectures. So it goes somewhat against the philosophy of Java.
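A minimal Java sketch of the three access patterns compared above, for illustration only: it is not the Commons Math or OpenGamma code, and the class and method names are hypothetical. It shows a "naive" loop that indexes a[i][j] on every inner iteration, the row-cached variant that reads ai = a[i] once per row, and, assuming "1D" means a single row-major array, a layout where element (i, j) is stored at data[i * cols + j]. All three compute the same y = A x; whether one is faster than another depends on the JVM and the hardware, and a JIT compiler may well hoist a[i] out of the inner loop on its own.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Commons Math or OpenGamma code) contrasting three
 * ways to compute the matrix-vector product y = A x for a dense matrix A.
 */
public final class RowAccessSketch {

    private RowAccessSketch() {
    }

    /** "Naive" 2D version: re-indexes a[i][j] on every inner iteration. */
    public static double[] operateNaive(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sum = 0;
            // Assumes every row of a has x.length entries.
            for (int j = 0; j < x.length; j++) {
                sum += a[i][j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    /** 2D version with the row cached in a local variable (ai = a[i]). */
    public static double[] operateRowCached(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            final double[] ai = a[i]; // row reference looked up once per row
            double sum = 0;
            // Assumes ai.length == x.length.
            for (int j = 0; j < ai.length; j++) {
                sum += ai[j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    /** Assumed 1D row-major storage: element (i, j) is data[i * cols + j]. */
    public static double[] operate1D(double[] data, int rows, int cols, double[] x) {
        double[] y = new double[rows];
        for (int i = 0; i < rows; i++) {
            final int offset = i * cols; // start of row i in the flat array
            double sum = 0;
            for (int j = 0; j < cols; j++) {
                sum += data[offset + j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[] flat = {1, 2, 3, 4, 5, 6}; // same matrix, row-major
        double[] x = {1, 1, 1};
        System.out.println(Arrays.toString(operateNaive(a, x)));
        System.out.println(Arrays.toString(operateRowCached(a, x)));
        System.out.println(Arrays.toString(operate1D(flat, 2, 3, x)));
    }
}
```

With the 2x3 matrix and x = {1, 1, 1} used in main, each of the three methods returns [6.0, 15.0]. Measuring the kind of differences shown in the table requires many timed calls after JIT warm-up, not a single call from main.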
>>> I wonder how a JNI interface to BLAS would perform? That would leave
>>> the architecture-specific issues out of the Java code (which could
>>> even provide a basic implementation of basic linear algebra operations
>>> if people do not want to use native code).
>> The author of the paper proposes to indeed clone the BLAS tuning
>> methodology.
>> However, I don't think that this should be a priority for CM (as a
>> general-purpose math toolbox).
>>
>>>> If there are (human) resources inclined to rewrite CM algorithms in order
>>>> to boost performance, I'd suggest also exploring the multi-threading route,
>>>> as I feel that the types of optimizations described in this paper are more
>>>> in the realm of the JVM itself.
>>>>
>>> I would be very interested, but know nothing about multi-threading. I
>>> will need to explore multi-threading for work anyway, so maybe in the
>>> future?
> Any references to specific optimizations or algorithm improvements here?
>> Yes, 3.1, 3.2, ... , 4.0, ... whatever.
>>
>>> In the meantime, may I bring to your attention the JTransforms
>>> library? (http://sites.google.com/site/piotrwendykier/Home)
>>> It's a multi-threaded library for various FFT calculations. I've used
>>> it a lot, and have been involved in the correction of some bugs. I've
>>> never benchmarked it against CM, but the site claims (if my memory
>>> does not fail me) greater performance.
>> Yes, I did not perform benchmarks; however, Luc already pointed out that he
>> had not paid particular attention to the speed efficiency of the code in CM.
> I don't think Luc meant to make a broad general statement there.
> IIRC, he was talking about one matrix representation class. Let's
> focus on specific problems and solutions.
Please ignore. I misread the comment above as applying to the original
subject, which was the linear package. I agree that the FFT implementation
needs work.

Phil
>
> Phil
>> Also, there are other problems, cf. issue
>> https://issues.apache.org/jira/browse/MATH-67
>>
>>> Also it can handle
>>> non-power-of-two array dimensions. Plus, the author no longer seems to
>>> have time to spend on this library, and may be willing to share it
>>> with CM. That would be a first step in the multi-threading realm.
>> Unfortunately, no; he doesn't want to donate his code.
>>
>>> Beware, though; the basic code is a direct translation of C code, and
>>> is sometimes difficult to read (thousands of lines, with loads of
>>> branching: code coverage analysis was simply a nightmare!).
>> So, the above information is only half bad news! ;-)
>>
>>
>> Best,
>> Gilles
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org