Lucas,

Why are you interested in hybrid parallelism? Are you hoping to improve the performance of your code, or is it simply something you want to try? If the solver is the bottleneck in your code, you should focus on finding a better preconditioner. With that being said, matrix-free methods tend to be faster than matrix-based methods if you can use them.

As to your questions: in general, you can assume that when you use Trilinos or PETSc, we only use distributed (MPI) parallelism. However, when you use deal.II's own data structures (Solver, Vector, etc.), we use multithreading; we just don't advertise it. The matrix-free framework supports MPI-3.0 shared memory. We have CUDA matrix-free methods, but we are rewriting them using Kokkos. Hopefully, the refactor will be done by the end of next month.
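If you want to try the MPI-3.0 shared-memory path of the matrix-free framework, a rough sketch of how to turn it on would look like the following. It assumes a recent deal.II release (9.3 or newer), where MatrixFree::AdditionalData has a communicator_sm field, and it assumes you already have a mapping, DoFHandler, constraints, and quadrature set up in your own code:

#include <deal.II/base/mpi.h>
#include <deal.II/base/quadrature.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/mapping.h>
#include <deal.II/lac/affine_constraints.h>
#include <deal.II/lac/la_parallel_vector.h>
#include <deal.II/matrix_free/matrix_free.h>

using namespace dealii;

// Rough sketch (assumes deal.II 9.3+ so that AdditionalData has
// communicator_sm); the arguments are placeholders for objects you
// already have in your program.
template <int dim>
void setup_matrix_free_with_shared_memory(
  const Mapping<dim>              &mapping,
  const DoFHandler<dim>           &dof_handler,
  const AffineConstraints<double> &constraints,
  const Quadrature<1>             &quadrature,
  MatrixFree<dim, double>         &matrix_free)
{
  // Group the MPI ranks that live on the same compute node into one
  // sub-communicator; MPI-3.0 lets these ranks share memory windows.
  MPI_Comm communicator_sm;
  MPI_Comm_split_type(MPI_COMM_WORLD,
                      MPI_COMM_TYPE_SHARED,
                      Utilities::MPI::this_mpi_process(MPI_COMM_WORLD),
                      MPI_INFO_NULL,
                      &communicator_sm);

  // Hand that communicator to the matrix-free framework.
  typename MatrixFree<dim, double>::AdditionalData additional_data;
  additional_data.communicator_sm = communicator_sm;

  matrix_free.reinit(mapping, dof_handler, constraints, quadrature,
                     additional_data);

  // Vectors initialized through this object can then access the on-node
  // part of their ghost data directly through shared memory instead of
  // exchanging MPI messages for it.
  LinearAlgebra::distributed::Vector<double> vec;
  matrix_free.initialize_dof_vector(vec);
}

Note that the shared-memory communicator has to stay valid as long as the MatrixFree object uses it.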
Best,
Bruno

On Thursday, March 16, 2023 at 2:40:24 PM UTC-4 lucasm...@gmail.com wrote:
> Hi folks,
>
> I'm wondering if there's somewhere I can look to get a broad overview of
> the different parallelization schemes available in deal.II for various
> pieces of the library, and maybe what people's experiences have been. As I
> understand it, any matrix/vector assembly can be done with a
> shared-parallel scheme with the threads framework, but I'm less sure about
> solvers (which are typically the bottleneck in my applications). To relay
> my understanding so far (and ask some specific questions):
>
> Matrix-based AMG methods:
>
> - PETSc and Trilinos Epetra only use MPI distributed parallelism
> - Trilinos Tpetra with MueLu is hybrid, but requires Kokkos. Do the
>   Tpetra wrappers have associated solvers? And how do they work with
>   Kokkos via CUDA?
>
> Matrix-based GMG methods:
>
> - Use deal.II's own distributed vector, and wraps a regular deal.II
>   solver (although perhaps uses a Trilinos smoother). Are any of the
>   solvers hybrid-parallelized? And are there deal.II-specific smoothers
>   to avoid copying to Epetra vectors?
>
> Matrix-free GMG methods:
>
> - Use deal.II's distributed vector and wrap regular solver. Additionally,
>   the matrix multiplication is done via a matrix-free operator. Is it
>   possible (or easy) to write a custom matrix-free operator which takes
>   advantage of shared parallelism? And will the solver take advantage of
>   shared-memory parallelism when (say) adding vectors?
>
> Matrix-free GMG methods with hp-adaptivity:
>
> - The only example that I've seen is step-75, and that uses Trilinos
>   AMG for the coarse solver, which I think is limited to MPI distributed
>   parallelism. Could the coarse solver be adapted to use shared parallelism,
>   and are the other aspects of this solver able to be parallelized?
>
> As a final question: which of these aspects are eligible for handling by
> CUDA, MPI-3.0 shared memory, and Kokkos? And have folks found a significant
> speed-up by implementing any of these tools in their code?
>
> Thanks so much for any insight,
> Lucas