Hi Evan, Le 24/02/2014 17:49, Evan Ward a écrit : > I've looked into improving performance further, but it seems any further > improvements will need big API changes for memory management. > > Currently using Gauss-Newton with Cholesky (or LU) requires 4 matrix > allocations _each_ evaluation. The objective function initially > allocates the Jacobian matrix. Then the weights are applied through > matrix multiplication, allocating a new matrix. Computing the normal > equations allocates a new matrix to hold the result, and finally the > decomposition allocates it's own matrix as a copy. With QR there are 3 > matrix allocations each model function evaluation, since there is no > need to compute the normal equations, but the third allocation+copy is > larger. Some empirical sampling data I've collected with the jvisualvm > tool indicates that matrix allocation and copying takes 30% to 80% of > the execution time, depending on the dimension of the Jacobian. > > One way to improve performance would be to provide pre-allocated space > for the Jacobian and reuse it for each evaluation. The > LeastSquaresProblem interface would then be: > > void evaluate(RealVector point, RealVector resultResiduals, RealVector > resultJacobian); > > I'm interested in hearing your ideas on other approaches to solve this > issue. Or even if this is an issue worth solving.
Yes, I think this issue is worth solving, especially since we are going to ship 3.3 and need to fix as much as possible before the release, thus avoiding future problems. Everything spotted now is worth fixing now. Your approach seems reasonable, as long as the work arrays are really allocated at the start of the optimization and shared only through a few documented methods like the one you propose. This would mean we can say in the javadoc that these area should be used only to fulfill the API requirements and not copied elsewhere, as they *will* be modified as the algorithm run, and are explicitly devoted to avoid reallocation. I guess this kind of problems is more important when lots of observations are performed, which correspond to very frequent use case (at least in the fields I know about). For the record, what you propose seems similar to what is done in the ODE package, as the state vector and its first derivatives are also kept in preallocated arrays which are reused throughout the integration and are used to exchange data between the Apache Commons Math algorithm and the user problem to be solved. So it is somehting we already do elsewhere. best regards, Luc > > Best Regards, > Evan > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org