On Wed, Dec 30, 2020 at 8:12 PM Barry Smith <[email protected]> wrote:
>
> On Dec 30, 2020, at 6:45 PM, Mark Adams <[email protected]> wrote:
>
> On Wed, Dec 30, 2020 at 7:12 PM Barry Smith <[email protected]> wrote:
>
>>
>> If you are using direct solvers on each block on each GPU (several
>> matrices on each GPU) you could pull apart, for example,
>> MatSolve_SeqAIJCUSPARSE()
>> and launch each of the matrix solves on a separate stream.
>
> Yes, that is what I want. The first step is to figure out the best way to
> get the blocks from Plex/Forest and get an exact solver working on the CPU
> with ASM.
>
> I don't think you want ASM, or at most you use it inside PCFIELDSPLIT. It is
> the split's job to pull out fields, not ASM's job (that pulls out
> geometrically connected regions).
>

I was thinking about getting the IS for each field and creating an ASM for
that, but FieldSplit can do that I guess. How would I do that?

>
>> You could use a MatSolveBegin/MatSolveEnd style or as Jed may prefer a
>> Wait() model. Maybe a couple hours coding to produce a prototype
>> MatSolveBegin/MatSolveEnd from MatSolve_SeqAIJCUSPARSE.
>>
>> Note pulling apart a non-coupled single MatAIJ that contains all the
>> matrices would be hugely expensive. Better to build each matrix already
>> separate or use MatNest with only diagonal matrices.
>>
>
> The problem is that it runs in TS that uses DM, so I can't reorder the
> matrix without breaking TS. I mimic what DM does now.
>
> DM decides the ordering, not TS.
>

Yes,

> You could slip a MatSetLocalToGlobal mapping in that uninterlaces the
> variables to get your DM to build an uninterlaced matrix.
>

That sounds fragile but Matt would be the one to ask.

I realized that I also need DM for doing FE integrals, for diagnostics,
during the solve phase, so I can't throw it away. But replacing
DM[Forest]'s matrix ordering with a field-major ordering, then assembling
into a MatNest directly, and then having LU work directly on each block
sounds promising.
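As a concrete picture of the "uninterlace" idea discussed above, here is a minimal Python sketch. It assumes the simplest possible layout, where every point carries exactly one dof of every field, so interlaced index i belongs to point i // num_fields and field i % num_fields. Plex/Forest orderings for higher-order elements are more involved than this. The function names are hypothetical and are not part of PETSc.

```python
def interlaced_to_field_major(num_points, num_fields):
    """Return perm, where perm[i] is the field-major index of interlaced index i.

    Assumption: one dof per field at every point (the simplest layout).
    """
    perm = []
    for i in range(num_points * num_fields):
        point, field = divmod(i, num_fields)
        # In field-major ordering all dofs of field 0 come first, then field 1, ...
        perm.append(field * num_points + point)
    return perm


def uninterlace(values, num_fields):
    """Reorder an interlaced vector so each field is stored contiguously."""
    num_points = len(values) // num_fields
    perm = interlaced_to_field_major(num_points, num_fields)
    out = [None] * len(values)
    for i, v in enumerate(values):
        out[perm[i]] = v
    return out


if __name__ == "__main__":
    # Two fields (a, b) on three points, stored interlaced.
    print(uninterlace(["a0", "b0", "a1", "b1", "a2", "b2"], 2))
```

With a field-major ordering each field occupies one contiguous index range, which is what lets each diagonal block be treated as its own matrix.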
> For the vector it is easier but again you will need to uninterlace it.
> Back in the classic Cray vector machine days interlacing was bad, with
> Intel CPUs it became good, now both approaches should be supported in
> software.
>
> All the DMs should support both interlaced and noninterlaced algebraic
> objects.
>

That would be nice, but let's see. Plex's "interlaced" ordering is not that
simple. It is interlaced up to Q2 elements, then at Q3 it mixes interlaced
and non-interlaced. :o
Apparently FE topology people like to put data on topological entities,
which means that Q1 puts the data on the vertices, Q2 adds data on the
edges and cell centers (2D), but Q3 has no more topological objects, so it
puts two "vertices" on each edge and 4 in the cell center, and these dofs
are not interlaced.
I have spent a fair amount of time in the last few months reverse
engineering DM :)

>
> I run once on the CPU to get the metadata for GPU assembly from DMForest.
> Maybe I should just get all the metadata that I need and throw the DM away
> after the setup solve and run TS without a DM...
>
>>
>> Barry
>>
>>
>> > On Dec 30, 2020, at 5:46 PM, Jed Brown <[email protected]> wrote:
>> >
>> > Mark Adams <[email protected]> writes:
>> >
>> >> I see that ASM has a DM and can get subdomains from it. I have a
>> >> DMForest and I would like an ASM that has a subdomain for each field.
>> >> How might I go about doing this? (The fields are not coupled in the
>> >> matrix, so this would give a block diagonal matrix, and thus be exact
>> >> with LU sub solvers.)
>> >
>> > The fields are already not coupled or you want to filter the matrix and
>> > give back a single matrix with coupling removed?
>> >
>> > You can use Fieldsplit to get the math of field-based block Jacobi (or
>> > ASM, but overlap with fields tends to be expensive). Neither FieldSplit
>> > nor ASM can run the (additive) solves concurrently (and most libraries
>> > would need something to drive the threads).
>> >
>> >> I am then going to want to get these separate solves to be run in
>> >> parallel on a GPU (I'm talking with Sherry about getting SuperLU
>> >> working on these small problems). In looking at PCApply_ASM it looks
>> >> like this will take some thought. KSPSolve would need to be
>> >> non-blocking, etc., or a new apply op might be needed.
>> >>
>> >> Thanks,
>> >> Mark
>> >
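The begin/end idea from this thread can be sketched in plain Python. This is only an illustration of the control flow under stated assumptions: solve_begin and solve_end are hypothetical stand-ins (not PETSc functions), a thread pool stands in for GPU streams, and small dense Gaussian elimination stands in for the sparse direct solver. Because the fields are uncoupled, solving each diagonal block separately should give the same answer as solving the assembled block-diagonal system.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def solve_begin(pool, block, rhs):
    """Hypothetical 'begin': launch one block solve and return a handle."""
    return pool.submit(solve_dense, block, rhs)


def solve_end(handle):
    """Hypothetical 'end': wait for the launched solve and return its result."""
    return handle.result()


def solve_blocks(blocks, rhs_list):
    """Launch every block solve first, then wait on all of them."""
    with ThreadPoolExecutor() as pool:
        handles = [solve_begin(pool, a, b) for a, b in zip(blocks, rhs_list)]
        return [solve_end(h) for h in handles]


def block_diagonal(blocks):
    """Assemble the single block-diagonal matrix, for comparison only."""
    n = sum(len(b) for b in blocks)
    full = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                full[offset + i][offset + j] = v
        offset += len(b)
    return full


if __name__ == "__main__":
    blocks = [[[2, 1], [1, 3]], [[4, 0], [0, 5]]]
    rhs = [[3, 5], [8, 10]]
    print(solve_blocks(blocks, rhs))
```

Python threads will not actually overlap this arithmetic; the point is only the shape of the interface: all solves are launched before any of them is waited on.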
