I've looked into improving performance further, but it seems any additional gains will require significant API changes for memory management.

Currently, using Gauss-Newton with Cholesky (or LU) requires 4 matrix allocations on _each_ evaluation. The objective function first allocates the Jacobian matrix. Applying the weights through matrix multiplication allocates a second matrix. Computing the normal equations allocates a third matrix to hold the result, and finally the decomposition allocates its own matrix as a copy. With QR there are 3 matrix allocations per model function evaluation, since there is no need to compute the normal equations, but the third allocation+copy is larger.

Some empirical sampling data I've collected with the jvisualvm tool indicates that matrix allocation and copying takes 30% to 80% of the execution time, depending on the dimensions of the Jacobian.

One way to improve performance would be to provide pre-allocated space for the Jacobian and reuse it on each evaluation. The LeastSquaresProblem interface would then be:

    void evaluate(RealVector point, RealVector resultResiduals, RealVector resultJacobian);

I'm interested in hearing your ideas on other approaches to this issue, or even whether it is an issue worth solving.

Best Regards,
Evan
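P.S. To make the idea concrete, here is a minimal, self-contained sketch of the pre-allocated evaluation pattern. The names (`InPlaceLeastSquaresProblem`, `PreallocDemo`) and the use of flat `double[]` buffers instead of commons-math `RealVector`/`RealMatrix` are my own illustration, not the actual proposed API; the point is only that the caller allocates the residual and Jacobian storage once and the model writes into it on every evaluation.

```java
// Hypothetical sketch: a least-squares problem that writes residuals and the
// Jacobian into caller-supplied, pre-allocated storage, so an optimizer can
// reuse the same buffers across iterations instead of allocating per call.
public class PreallocDemo {

    interface InPlaceLeastSquaresProblem {
        // point: current parameters (length n); residuals: length m;
        // jacobian: m x n, flattened row-major into one array to mirror
        // the vector-based signature in the proposal above.
        void evaluate(double[] point, double[] residuals, double[] jacobian);
    }

    // Toy model: residual_i = x_i * p0 + p1 - y_i (a linear fit).
    static final double[] XS = {0.0, 1.0, 2.0};
    static final double[] YS = {1.0, 3.0, 5.0};

    static final InPlaceLeastSquaresProblem PROBLEM = (point, residuals, jacobian) -> {
        for (int i = 0; i < XS.length; i++) {
            residuals[i] = XS[i] * point[0] + point[1] - YS[i];
            jacobian[2 * i]     = XS[i]; // d r_i / d p0
            jacobian[2 * i + 1] = 1.0;   // d r_i / d p1
        }
    };

    public static void main(String[] args) {
        // Buffers allocated once, then reused for every evaluation.
        double[] residuals = new double[3];
        double[] jacobian = new double[6];
        PROBLEM.evaluate(new double[] {2.0, 1.0}, residuals, jacobian);
        // At the true parameters (p0=2, p1=1) all residuals are zero.
        System.out.println(java.util.Arrays.toString(residuals));
    }
}
```

The same shape works with `RealVector` outputs; the only requirement is that the decomposition and weighting steps are also written to operate in place on those buffers.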
Currently using Gauss-Newton with Cholesky (or LU) requires 4 matrix allocations _each_ evaluation. The objective function initially allocates the Jacobian matrix. Then the weights are applied through matrix multiplication, allocating a new matrix. Computing the normal equations allocates a new matrix to hold the result, and finally the decomposition allocates it's own matrix as a copy. With QR there are 3 matrix allocations each model function evaluation, since there is no need to compute the normal equations, but the third allocation+copy is larger. Some empirical sampling data I've collected with the jvisualvm tool indicates that matrix allocation and copying takes 30% to 80% of the execution time, depending on the dimension of the Jacobian. One way to improve performance would be to provide pre-allocated space for the Jacobian and reuse it for each evaluation. The LeastSquaresProblem interface would then be: void evaluate(RealVector point, RealVector resultResiduals, RealVector resultJacobian); I'm interested in hearing your ideas on other approaches to solve this issue. Or even if this is an issue worth solving. Best Regards, Evan --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org