On Thu, Jan 21, 2021 at 11:25 AM Jed Brown <[email protected]> wrote:
> Mark Adams <[email protected]> writes:
>
> > Yes, the problem is that each KSP solver is running in an OMP thread
>
> There can be more or fewer splits than OMP_NUM_THREADS. Each thread is
> still calling blocking operations.
>
> This is a concurrency problem, not a parallel efficiency problem. It can
> be solved with async interfaces

I don't know how to do that. I want a GPU solver, probably SuperLU, and am
starting with cuSPARSE ILU to get something running.

> or by making as many threads as splits and ensuring that you don't spin
> (lest contention kill performance).

I don't get correctness with Richardson with more than one OMP thread
currently. This is on IBM with GNU.

> OpenMP is pretty orthogonal and probably not a good fit.

Do you have an alternative?

> > (So at this point it only works for SELF and it's Landau so it is all I
> > need). It looks like MPI reductions called with a comm_self are not
> > thread safe (eg, they could say, this is one proc, thus, just copy
> > send --> recv, but they don't)
> >
> > On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <[email protected]> wrote:
> >
> >> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <[email protected]> wrote:
> >>
> >>> It looks like PETSc is just too clever for me. I am trying to get a
> >>> different MPI_Comm into each block, but PETSc is thwarting me:
> >>>
> >>
> >> It looks like you are using SELF. Is that what you want? Do you want a
> >> bunch of comms with the same group, but independent somehow? I am confused.
> >>
> >>    Matt
> >>
> >>
> >>> if (jac->use_openmp) {
> >>>   ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
> >>>   PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));
> >>> } else {
> >>>   ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
> >>> }
> >>>
> >>> produces:
> >>>
> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
> >>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
> >>>
> >>> How can I work around this?
> >>>
> >>>
> >>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <[email protected]> wrote:
> >>>
> >>>>
> >>>>
> >>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <[email protected]> wrote:
> >>>>
> >>>>>
> >>>>>
> >>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <[email protected]> wrote:
> >>>>>
> >>>>> So I put in a temporary hack to get the first Fieldsplit apply to NOT
> >>>>> use OMP and it sort of works.
> >>>>>
> >>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every solve so
> >>>>> that is a big problem.
> >>>>>
> >>>>>
> >>>>>   It should definitely not be creating vectors "in every" solve. But it
> >>>>> does do lazy allocation of needed restart vectors, which may make it look
> >>>>> like it is creating "every" vector in every solve. You can
> >>>>> use -ksp_gmres_preallocate to force it to create all the restart vectors
> >>>>> up front at KSPSetUp().
> >>>>>
> >>>>
> >>>> Well, I run the first solve w/o OMP and I see Vec dups in cuSparse Vecs
> >>>> in the 2nd solve.
> >>>>
> >>>>
> >>>>>
> >>>>>   Why is creating vectors "at every solve" a problem? It is not thread
> >>>>> safe I guess?
> >>>>>
> >>>>
> >>>> It dies when it looks at the options database, in a Free in the
> >>>> get-options method to be exact (see stacks).
> >>>>
> >>>> ======= Backtrace: =========
> >>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
> >>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
> >>>>
> >>>>
> >>>>>
> >>>>> Richardson works except the convergence test gets confused, presumably
> >>>>> because MPI reductions with PETSC_COMM_SELF are not thread safe.
> >>>>>
> >>>>>
> >>>>> One fix for the norms might be to create each subdomain solver with a
> >>>>> different communicator.
> >>>>>
> >>>>>
> >>>>>   Yes you could do that. It might actually be the correct thing to do
> >>>>> also, if you have multiple threads call MPI reductions on the same
> >>>>> communicator that would be a problem. Each KSP should get a new MPI_Comm.
> >>>>>
> >>>>
> >>>> OK. I will only do this.
> >>>>
> >>>>
> >>
> >> --
> >> What most experimenters take for granted before they begin their
> >> experiments is infinitely more interesting than any results to which their
> >> experiments lead.
> >> -- Norbert Wiener
> >>
> >> https://www.cse.buffalo.edu/~knepley/
> >>
