Barry Smith <[email protected]> writes:

>   If you are using direct solvers on each block on each GPU (several matrices 
> on each GPU) you could pull apart, for example, MatSolve_SeqAIJCUSPARSE()
> and launch each of the matrix solves on a separate stream.   You could use a 
> MatSolveBegin/MatSolveEnd style or as Jed may prefer a Wait() model. Maybe a 
> couple hours coding to produce a prototype MatSolveBegin/MatSolveEnd from 
> MatSolve_SeqAIJCUSPARSE.

I doubt cusparse_solve is a single kernel launch (and there are two of them
already). You'd almost certainly need a thread to keep driving it, or an
async/await model. Begin/End pairs for compute (even "offloaded" compute) are
no small change.
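
To make the "keep driving it" concern concrete, here is a minimal CPU-only
sketch in C of a two-stage solve split into Begin and Wait. Everything in it
is hypothetical: SolveRequest, SolveBegin, and SolveWait are illustrative
names, not PETSc or cuSPARSE API, and dense triangular factors stand in for
the two cuSPARSE triangular solves. On a GPU the first stage would be
enqueued on a stream; here it simply runs synchronously.

```c
#include <stddef.h>

/* Hypothetical handle for an in-flight solve; not a PETSc type. */
typedef struct {
  const double *L;     /* unit lower-triangular factor, row-major n x n */
  const double *U;     /* upper-triangular factor, row-major n x n */
  double       *x;     /* in: right-hand side, out: solution */
  size_t        n;
  int           stage; /* 0 = nothing done, 1 = forward solve done, 2 = complete */
} SolveRequest;

/* Stage 1: forward substitution L y = b, in place (L has unit diagonal). */
static void forward_solve(SolveRequest *r)
{
  for (size_t i = 0; i < r->n; i++)
    for (size_t j = 0; j < i; j++)
      r->x[i] -= r->L[i * r->n + j] * r->x[j];
  r->stage = 1;
}

/* Stage 2: back substitution U x = y, in place. */
static void backward_solve(SolveRequest *r)
{
  for (size_t i = r->n; i-- > 0;) {
    for (size_t j = i + 1; j < r->n; j++)
      r->x[i] -= r->U[i * r->n + j] * r->x[j];
    r->x[i] /= r->U[i * r->n + i];
  }
  r->stage = 2;
}

/* "Begin": records the request and starts only the first stage. */
void SolveBegin(SolveRequest *r, const double *L, const double *U,
                double *x, size_t n)
{
  r->L = L; r->U = U; r->x = x; r->n = n; r->stage = 0;
  forward_solve(r);
}

/* "Wait": drives whatever stages remain until the solve is complete. */
void SolveWait(SolveRequest *r)
{
  if (r->stage < 1) forward_solve(r);
  if (r->stage < 2) backward_solve(r);
}
```

After SolveBegin returns, the solve is only half done: the second stage does
not start until the caller reaches SolveWait, so nothing overlaps unless a
thread or an async runtime advances the request in the meantime.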

>   Note pulling apart a non-coupled single MatAIJ that contains all the 
> matrices would be hugely expensive. Better to build each matrix already 
> separate or use MatNest with only diagonal matrices.

Nonsense: the nested dissection (ND) ordering will notice that they're
decoupled, and you get more meat per kernel launch.
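
As an illustration of why the decoupling is cheap to see, the sketch below
labels the connected components of a CSR sparsity pattern with a union-find;
each component is one independent block. This is not how PETSc or any
particular ND implementation works internally, and count_decoupled_blocks is
a hypothetical helper name. It only shows that the block structure can be
read off the sparsity graph without pulling the matrix apart.

```c
#include <stddef.h>

/* Union-find root lookup with path halving. */
static size_t find_root(size_t *parent, size_t i)
{
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

/* Hypothetical helper: given an n x n matrix in CSR form (rowptr, colidx),
 * writes a block id in [0, nblocks) into label[i] for every row i and
 * returns nblocks. Rows with the same id are coupled, directly or
 * indirectly, through nonzero entries. */
size_t count_decoupled_blocks(size_t n, const size_t *rowptr,
                              const size_t *colidx, size_t *label)
{
  size_t nblocks = 0;

  /* label[] doubles as the union-find parent array. */
  for (size_t i = 0; i < n; i++) label[i] = i;

  /* Merge row i with every column it touches; the smaller index becomes
   * the root, so each root is the smallest row index in its block. */
  for (size_t i = 0; i < n; i++) {
    for (size_t k = rowptr[i]; k < rowptr[i + 1]; k++) {
      size_t a = find_root(label, i);
      size_t b = find_root(label, colidx[k]);
      if (a < b) label[b] = a;
      else if (b < a) label[a] = b;
    }
  }

  /* Flatten: point every row directly at its root. */
  for (size_t i = 0; i < n; i++) label[i] = find_root(label, i);

  /* Convert roots to consecutive ids. A root always precedes the other
   * rows of its block, so its id is already in place when they need it. */
  for (size_t i = 0; i < n; i++) {
    if (label[i] == i) label[i] = nblocks++;
    else label[i] = label[label[i]];
  }
  return nblocks;
}
```

For a 4 x 4 matrix made of two dense 2 x 2 diagonal blocks, this reports two
blocks with rows 0 and 1 in the first and rows 2 and 3 in the second.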
