Github user srowen commented on the pull request: https://github.com/apache/spark/pull/131#issuecomment-37532494

I do mean dgemm, since it is in jblas, although dsyrk would be even better as it is specialized for this case. gemm can treat its args as transposed and apply scalars, so it can do C <- a*At*A + 1*C. (Well, it would require exposing NativeBlas.dgemm more directly through SimpleBlas, but that's a fairly simple change.)

Yes, I don't mean that a single operation would compute Xt*Cu*X, but that it could be used to do exactly what you are doing with dspr -- compute C(u,i) * X(i)t * X(i) and add it to an accumulator. I think it can even be used to do this all in place, but someone who knows more can tell me whether it's OK to have it put the result in the same place it's reading the C argument from.

I can't say I'm sure it's a win once all the overhead is factored in, but you see the idea. In theory it should help at some size to do this bit in BLAS vs the JVM.
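To make the idea concrete, here is a minimal pure-Python sketch of the arithmetic only. It is not a jblas or BLAS call; the function names (`gemm_at_a`, `rank_one_update`) and the sample data are hypothetical. It checks that treating a single factor vector X(i) as a 1 x k matrix and applying C <- a*At*A + 1*C gives the same accumulator as a dspr-style rank-one update on a full (unpacked) matrix.

```python
def gemm_at_a(alpha, a, beta, c):
    """Reference semantics of C <- alpha * A^T * A + beta * C.

    `a` is an n x k matrix (list of rows), `c` is k x k.
    This mirrors what gemm computes with A transposed on the left;
    it is a plain-Python stand-in, not a BLAS binding.
    """
    k = len(c)
    return [
        [alpha * sum(row[i] * row[j] for row in a) + beta * c[i][j]
         for j in range(k)]
        for i in range(k)
    ]


def rank_one_update(alpha, x, c):
    """Rank-one update C <- alpha * x * x^T + C on a full k x k matrix."""
    k = len(x)
    return [[c[i][j] + alpha * x[i] * x[j] for j in range(k)] for i in range(k)]


# Hypothetical inputs: two item factor vectors X(i) and confidences C(u,i).
item_factors = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
confidences = [2.0, 4.0]

acc_gemm = [[0.0] * 3 for _ in range(3)]
acc_rank1 = [[0.0] * 3 for _ in range(3)]

for x, c_ui in zip(item_factors, confidences):
    # Treat x as a 1 x k matrix, so A^T * A is the outer product x^T x.
    acc_gemm = gemm_at_a(c_ui, [x], 1.0, acc_gemm)
    acc_rank1 = rank_one_update(c_ui, x, acc_rank1)
```

On the in-place question: in the reference BLAS interface the C argument of gemm is an input/output array, so accumulating into it with a scalar of 1 is its normal usage. Whether this beats dspr in practice would still need measuring, as noted above.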