GitHub user srowen commented on the pull request:

    https://github.com/apache/spark/pull/131#issuecomment-37532494
  
    I do mean dgemm, since it is in jblas, although dsyrk would be even better 
as it is specialized for this case. gemm can treat its args as transposed and 
apply scalars, so it can do C <- a*At*A + 1*C. (Well, it would require exposing 
NativeBlas.dgemm more directly through SimpleBlas, but that's a fairly simple 
change.)
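
To make the C <- a*At*A + 1*C form concrete, here is a minimal plain-Java sketch of what such a gemm call computes. It does not call jblas; the class and method names (`GemmSketch`, `ataUpdate`) are hypothetical, and column-major storage is assumed because that is what BLAS expects. In a BLAS-level call this would correspond to passing the same matrix as both inputs, with the first marked as transposed.

```java
import java.util.Arrays;

/** Plain-Java reference for a gemm-style update with a transposed first argument. */
final class GemmSketch {

    /**
     * Computes C <- alpha * A^T * A + beta * C, writing the result back into c.
     * A is m x n and C is n x n, both stored column-major (an assumption of this sketch).
     */
    static void ataUpdate(double alpha, double[] a, int m, int n, double beta, double[] c) {
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) {
                double dot = 0.0;
                // (A^T * A)[i][j] is the dot product of columns i and j of A.
                for (int k = 0; k < m; k++) {
                    dot += a[i * m + k] * a[j * m + k];
                }
                // Each entry of C is read once and overwritten at the same index.
                c[j * n + i] = alpha * dot + beta * c[j * n + i];
            }
        }
    }

    public static void main(String[] args) {
        // A = [[1, 2], [3, 4], [5, 6]] in column-major order (3 x 2).
        double[] a = {1, 3, 5, 2, 4, 6};
        // Start the accumulator as the 2 x 2 identity.
        double[] c = {1, 0, 0, 1};
        ataUpdate(2.0, a, 3, 2, 1.0, c);
        System.out.println(Arrays.toString(c)); // prints [71.0, 88.0, 88.0, 113.0]
    }
}
```

With beta set to 1, the existing contents of C are kept and the scaled product is added on top, which is the accumulator behaviour described above.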
    
    Yes, I don't mean that a single operation would compute Xt*Cu*X, but that 
it could be used to do exactly what you are doing with dspr -- compute C(u,i) * 
X(i)t * X and add it to an accumulator.
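
For reference, the dspr-style accumulation can be sketched in plain Java as below. This is an illustration only, not the code from the pull request: the names `Rank1Sketch`, `sprUpdate` and `weightedGram` are hypothetical, and the packed upper-triangle layout (column by column) is assumed to match what BLAS uses for dspr with the upper triangle selected.

```java
import java.util.Arrays;

/** Plain-Java reference for accumulating weighted rank-1 updates in packed storage. */
final class Rank1Sketch {

    /** dspr-style update on the packed upper triangle: AP <- AP + alpha * x * x^T. */
    static void sprUpdate(double alpha, double[] x, double[] ap) {
        int n = x.length;
        int idx = 0;
        // Packed upper storage, column by column: (0,0), (0,1), (1,1), (0,2), ...
        for (int j = 0; j < n; j++) {
            for (int i = 0; i <= j; i++) {
                ap[idx++] += alpha * x[i] * x[j];
            }
        }
    }

    /** Accumulates sum over r of weights[r] * x_r * x_r^T, where x_r is row r. */
    static double[] weightedGram(double[][] rows, double[] weights) {
        int n = rows[0].length;
        double[] ap = new double[n * (n + 1) / 2];
        for (int r = 0; r < rows.length; r++) {
            sprUpdate(weights[r], rows[r], ap);
        }
        return ap;
    }

    public static void main(String[] args) {
        double[][] rows = {{1, 2}, {3, 4}};
        double[] weights = {2.0, 0.5};
        // Packed result holds entries (0,0), (0,1), (1,1) of the symmetric sum.
        System.out.println(Arrays.toString(weightedGram(rows, weights))); // prints [6.5, 10.0, 16.0]
    }
}
```

A gemm-based variant would compute the same symmetric matrix in full (unpacked) storage, by treating each row as a 1 x n matrix and accumulating with the weight as the scalar.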
    
    I think it can even be used to do this all in place, but someone who knows 
more can tell me whether it's OK to have it put the result in the same place 
it's reading the C argument from.
    
    I can't say I'm sure it's a win once all the overhead is factored in, but 
you see the idea. In theory it should help at some size to do this bit in BLAS 
vs the JVM.


