This did not work. I verified that MPI_Init_thread is being called correctly and that MPI returns that it supports this highest level of thread safety.
I am going to ask ORNL. And if I use: -fieldsplit_i1_ksp_norm_type none -fieldsplit_i1_ksp_max_it 300 for all 9 "i" variables, I can run normal iterations on the 10th variable, in a 10 species problem, and it works perfectly with 10 threads. So it is definitely that VecNorm is not thread safe. And, I want to call SuperLU_dist, which uses threads, but I don't want SuperLU to start using threads. Is there a way to tell superLU that there are no threads but have PETSc use them? Thanks, Mark On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <[email protected]> wrote: > OK, the problem is probably: > > PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED; > > There is an example that sets: > > PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE; > > This is what I need. > > > > > On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <[email protected]> wrote: > >> >> >> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <[email protected]> >> wrote: >> >>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <[email protected]> wrote: >>> >>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <[email protected]> >>>> wrote: >>>> >>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <[email protected]> wrote: >>>>> >>>>>> Yes, the problem is that each KSP solver is running in an OMP thread >>>>>> (So at this point it only works for SELF and its Landau so it is all I >>>>>> need). It looks like MPI reductions called with a comm_self are not >>>>>> thread >>>>>> safe (eg, the could say, this is one proc, thus, just copy send --> recv, >>>>>> but they don't) >>>>>> >>>>> >>>>> Instead of using SELF, how about Comm_dup() for each thread? >>>>> >>>> >>>> OK, raw MPI_Comm_dup. I tried PetscCommDup. Let me this. >>>> Thanks, >>>> >>> >>> You would have to dup them all outside the OMP section, since it is not >>> threadsafe. Then each thread uses one I think. >>> >> >> Yea sure. I do it in SetUp. >> >> Well that worked to get *different Comms*, finally, I still get the same >> problem. The number of iterations differ wildly. This two species and two >> threads (13 SNES its that is not deterministic). Way below is one thread (8 >> its) and fairly uniform iteration counts. >> >> Maybe this MPI is just not thread safe at all. Let me look into it. >> Thanks anyway, >> >> 0 SNES Function norm 4.974994975313e-03 >> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60. >> Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0 >> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0. >> Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 282 >> 1 SNES Function norm 1.836376279964e-05 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 19 >> 2 SNES Function norm 3.059930074740e-07 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 15 >> 3 SNES Function norm 4.744275398121e-08 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 4 >> 4 SNES Function norm 4.014828563316e-08 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 456 >> 5 SNES Function norm 5.670836337808e-09 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 2 >> 6 SNES Function norm 2.410421401323e-09 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 18 >> 7 SNES Function norm 6.533948191791e-10 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 458 >> 8 SNES Function norm 1.008133815842e-10 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 9 >> 9 SNES Function norm 1.690450876038e-11 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 4 >> 10 SNES Function norm 1.336383986009e-11 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 463 >> 11 SNES Function norm 1.873022410774e-12 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 113 >> 12 SNES Function norm 1.801834606518e-13 >> Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL >> iterations 1 >> 13 SNES Function norm 1.004397317339e-13 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13 >> >> >> >> >> 0 SNES Function norm 4.974994975313e-03 >> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010. >> Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090 >> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40. >> Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 282 >> 1 SNES Function norm 1.836376279963e-05 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 380 >> 2 SNES Function norm 3.018499983019e-07 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 387 >> 3 SNES Function norm 1.826353175637e-08 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 391 >> 4 SNES Function norm 1.378600599548e-09 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 392 >> 5 SNES Function norm 1.077289085611e-10 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 394 >> 6 SNES Function norm 8.571891727748e-12 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 395 >> 7 SNES Function norm 6.897647643450e-13 >> Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL >> iterations 395 >> 8 SNES Function norm 5.606434614114e-14 >> Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8 >> >> >> >> >> >> >> >> >> >>> >>> Matt >>> >>> >>>> Matt >>>>> >>>>> >>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <[email protected]> >>>>>> wrote: >>>>>> >>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <[email protected]> wrote: >>>>>>> >>>>>>>> It looks like PETSc is just too clever for me. I am trying to get a >>>>>>>> different MPI_Comm into each block, but PETSc is thwarting me: >>>>>>>> >>>>>>> >>>>>>> It looks like you are using SELF. Is that what you want? Do you want >>>>>>> a bunch of comms with the same group, but independent somehow? I am >>>>>>> confused. >>>>>>> >>>>>>> Matt >>>>>>> >>>>>>> >>>>>>>> if (jac->use_openmp) { >>>>>>>> ierr = >>>>>>>> KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr); >>>>>>>> PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit >>>>>>>> with -------------- link: %p. Comms %p >>>>>>>> %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp)); >>>>>>>> } else { >>>>>>>> ierr = >>>>>>>> KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr); >>>>>>>> } >>>>>>>> >>>>>>>> produces: >>>>>>>> >>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: >>>>>>>> 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0 >>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: >>>>>>>> 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0 >>>>>>>> >>>>>>>> How can I work around this? >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <[email protected]> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <[email protected]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit apply to >>>>>>>>>> NOT use OMP and it sort of works. >>>>>>>>>> >>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every >>>>>>>>>> solve so that is a big problem. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> It should definitely not be creating vectors "in every" solve. >>>>>>>>>> But it does do lazy allocation of needed restarted vectors which may >>>>>>>>>> make >>>>>>>>>> it look like it is creating "every" vectors in every solve. You can >>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart >>>>>>>>>> vectors up >>>>>>>>>> front at KSPSetUp(). >>>>>>>>>> >>>>>>>>> >>>>>>>>> Well, I run the first solve w/o OMP and I see Vec dups in cuSparse >>>>>>>>> Vecs in the 2nd solve. >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Why is creating vectors "at every solve" a problem? It is not >>>>>>>>>> thread safe I guess? >>>>>>>>>> >>>>>>>>> >>>>>>>>> It dies when it looks at the options database, in a Free in the >>>>>>>>> get-options method to be exact (see stacks). >>>>>>>>> >>>>>>>>> ======= Backtrace: ========= >>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4] >>>>>>>>> >>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec] >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Richardson works except the convergence test gets confused, >>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF is not >>>>>>>>>> threadsafe. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> One fix for the norms might be to create each subdomain solver >>>>>>>>>> with a different communicator. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Yes you could do that. It might actually be the correct thing >>>>>>>>>> to do also, if you have multiple threads call MPI reductions on the >>>>>>>>>> same >>>>>>>>>> communicator that would be a problem. Each KSP should get a new >>>>>>>>>> MPI_Comm. >>>>>>>>>> >>>>>>>>> >>>>>>>>> OK. I will only do this. >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> -- >>>>>>> What most experimenters take for granted before they begin their >>>>>>> experiments is infinitely more interesting than any results to which >>>>>>> their >>>>>>> experiments lead. >>>>>>> -- Norbert Wiener >>>>>>> >>>>>>> https://www.cse.buffalo.edu/~knepley/ >>>>>>> <http://www.cse.buffalo.edu/~knepley/> >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> What most experimenters take for granted before they begin their >>>>> experiments is infinitely more interesting than any results to which their >>>>> experiments lead. >>>>> -- Norbert Wiener >>>>> >>>>> https://www.cse.buffalo.edu/~knepley/ >>>>> <http://www.cse.buffalo.edu/~knepley/> >>>>> >>>> >>> >>> -- >>> What most experimenters take for granted before they begin their >>> experiments is infinitely more interesting than any results to which their >>> experiments lead. >>> -- Norbert Wiener >>> >>> https://www.cse.buffalo.edu/~knepley/ >>> <http://www.cse.buffalo.edu/~knepley/> >>> >>
