Barry Smith <[email protected]> writes:

> If you are using direct solvers on each block on each GPU (several matrices
> on each GPU) you could pull apart, for example, MatSolve_SeqAIJCUSPARSE()
> and launch each of the matrix solves on a separate stream. You could use a
> MatSolveBegin/MatSolveEnd style or as Jed may prefer a Wait() model. Maybe a
> couple hours coding to produce a prototype MatSolveBegin/MatSolveEnd from
> MatSolve_SeqAIJCUSPARSE.

I doubt cusparse_solve is a single kernel launch (and there are two of them
already). You'd almost certainly need a thread to keep driving it, or an
async/await model. Begin/End pairs for compute (even "offloaded" compute)
are no small change.

> Note pulling apart a non-coupled single MatAIJ that contains all the
> matrices would be hugely expensive. Better to build each matrix already
> separate or use MatNest with only diagonal matrices.

Nonsense, the nested dissection (ND) ordering will notice that they're
decoupled and you get more meat per kernel launch.
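
To make the Begin/End point concrete, here is a minimal sketch in plain Python, not PETSc or cuSPARSE. It assumes each block solve is two dependent stages (a forward and a backward triangular solve), so something has to sequence the second stage after the first; here a worker thread does that driving. The names solve_begin, solve_end, and _drive_solve are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical names throughout; none of this is PETSc or cuSPARSE API.

def forward_substitute(L, b):
    """Stage 1: solve L y = b for a unit lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    return y

def backward_substitute(U, y):
    """Stage 2: solve U x = y for an upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

def _drive_solve(L, U, b):
    # Stage 2 depends on stage 1, so a driver has to chain them.
    return backward_substitute(U, forward_substitute(L, b))

def solve_begin(pool, L, U, b):
    """Start a block solve; returns a handle (a Future) immediately."""
    return pool.submit(_drive_solve, L, U, b)

def solve_end(handle):
    """Wait for the block solve started by solve_begin and return x."""
    return handle.result()

# Two independent, already factored 2x2 blocks given as (L, U, b).
blocks = [
    ([[1.0, 0.0], [0.5, 1.0]], [[2.0, 1.0], [0.0, 1.5]], [3.0, 3.0]),
    ([[1.0, 0.0], [0.25, 1.0]], [[4.0, 2.0], [0.0, 1.0]], [6.0, 2.5]),
]

with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
    handles = [solve_begin(pool, L, U, b) for (L, U, b) in blocks]
    solutions = [solve_end(h) for h in handles]
```

Even in this toy, the Begin/End pair only works because a thread per block keeps the two stages moving; on a GPU the analogous role would have to be played by stream ordering or a host-side driver, which is the part that is not a small change.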
