Re: [petsc-dev] Memory problem with OpenMP and Fieldsplit sub solvers

Mark Adams Thu, 21 Jan 2021 19:12:04 -0800

I have tried it and it hangs, but that is expected. This is not something
she has prepared for.


I am working with Sherry on it.

She is fine with using just one thread, and suggests doing that when SuperLU_DIST is called from inside a thread.

Now that I think about it, I don't understand why she needs OpenMP if she
can live with OMP_NUM_THREADS=1.

Mark



On Thu, Jan 21, 2021 at 9:30 PM Barry Smith <[email protected]> wrote:

>
>
> On Jan 21, 2021, at 5:37 PM, Mark Adams <[email protected]> wrote:
>
> This did not work. I verified that MPI_Init_thread is being called
> correctly and that MPI reports that it supports the highest level of
> thread safety (MPI_THREAD_MULTIPLE).
>
> I am going to ask ORNL.
>
> And if I use:
>
> -fieldsplit_i1_ksp_norm_type none
> -fieldsplit_i1_ksp_max_it 300
>
> for all 9 "i" variables, I can run normal iterations on the 10th variable,
> in a 10 species problem, and it works perfectly with 10 threads.
>
> So it is definitely VecNorm that is not thread safe.
>
> And I want to call SuperLU_DIST, which uses threads, but I don't want
> SuperLU_DIST to start its own threads. Is there a way to tell SuperLU_DIST
> that there are no threads, but still have PETSc use them?
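One generic OpenMP mechanism for keeping a threaded library single-threaded while the caller runs its own threads is to limit nesting. With `omp_set_max_active_levels(1)`, a parallel region opened inside an already active region runs with a team of one thread. The sketch assumes SuperLU_DIST's threading comes from ordinary nested `omp parallel` regions; `library_team_size` is a hypothetical stand-in for such a routine. It does nothing about MPI calls the library may make from each thread, which is a separate thread-safety question.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for a library routine that opens its own OpenMP
 * parallel region; it reports how many threads that region received. */
static int library_team_size(void)
{
  int nthreads = 1;
#ifdef _OPENMP
#pragma omp parallel
  {
#pragma omp single
    nthreads = omp_get_num_threads();
  }
#endif
  return nthreads;
}

/* Runs nouter independent tasks in parallel (standing in for the per-split
 * solves) and returns the largest team the nested "library" region got. */
int max_nested_team_size(int nouter)
{
  int maxn = 0;
#ifdef _OPENMP
  omp_set_max_active_levels(1); /* nested regions become inactive: one thread */
#endif
#pragma omp parallel for reduction(max : maxn)
  for (int i = 0; i < nouter; i++) {
    int n = library_team_size();
    if (n > maxn) maxn = n;
  }
  return maxn;
}
```

Note that this changes a process-wide setting, and a library that explicitly requests its own thread count could still behave differently.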
>
>
>   My interpretation, and Satish's, for many years has been that
> SuperLU_DIST has to be built with and use OpenMP in order to work with CUDA.
>
>   def formCMakeConfigureArgs(self):
>     args = config.package.CMakePackage.formCMakeConfigureArgs(self)
>     if self.openmp.found:
>       self.usesopenmp = 'yes'
>     else:
>       args.append('-DCMAKE_DISABLE_FIND_PACKAGE_OpenMP=TRUE')
>     if self.cuda.found:
>       if not self.openmp.found:
>         raise RuntimeError('SuperLU_DIST GPU code currently requires OpenMP. Use --with-openmp=1')
>
> But this could be ok. You use OpenMP and then it uses OpenMP internally,
> each doing their own business (what could go wrong :-)).
>
> Have you tried it?
>
>   Barry
>
>
>
> Thanks,
> Mark
>
> On Thu, Jan 21, 2021 at 5:19 PM Mark Adams <[email protected]> wrote:
>
>> OK, the problem is probably:
>>
>> PetscMPIInt PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_FUNNELED;
>>
>> There is an example that sets:
>>
>> PETSC_MPI_THREAD_REQUIRED = MPI_THREAD_MULTIPLE;
>>
>> This is what I need.
>>
>>
>>
>>
>> On Thu, Jan 21, 2021 at 2:26 PM Mark Adams <[email protected]> wrote:
>>
>>>
>>>
>>> On Thu, Jan 21, 2021 at 2:11 PM Matthew Knepley <[email protected]> wrote:
>>>
>>>> On Thu, Jan 21, 2021 at 2:02 PM Mark Adams <[email protected]> wrote:
>>>>
>>>>> On Thu, Jan 21, 2021 at 1:44 PM Matthew Knepley <[email protected]> wrote:
>>>>>
>>>>>> On Thu, Jan 21, 2021 at 11:16 AM Mark Adams <[email protected]> wrote:
>>>>>>
>>>>>>> Yes, the problem is that each KSP solver is running in an OMP thread
>>>>>>> (so at this point it only works for SELF, and it's Landau, so that is
>>>>>>> all I need). It looks like MPI reductions called with a comm_self are
>>>>>>> not thread safe (e.g., they could say "this is one proc, so just copy
>>>>>>> send --> recv", but they don't).
>>>>>>>
>>>>>>
>>>>>> Instead of using SELF, how about Comm_dup() for each thread?
>>>>>>
>>>>>
>>>>> OK, raw MPI_Comm_dup. I tried PetscCommDuplicate. Let me try this.
>>>>> Thanks,
>>>>>
>>>>
>>>> You would have to dup them all outside the OMP section, since it is not
>>>> thread safe. Then each thread uses one, I think.
>>>>
>>>
>>> Yea sure. I do it in SetUp.
>>>
>>> Well, that finally worked to get *different Comms*, but I still get the
>>> same problem. The number of iterations differs wildly. This is two species
>>> and two threads (13 SNES iterations, and it is not deterministic). Way
>>> below is one thread (8 iterations), with fairly uniform iteration counts.
>>>
>>> Maybe this MPI is just not thread safe at all. Let me look into it.
>>> Thanks anyway,
>>>
>>>    0 SNES Function norm 4.974994975313e-03
>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x80017c60. Comms pc=0x67ad27c0 ksp=*0x7ffe1600* newcomm=0x8014b6e0
>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7ffdabc0. Comms pc=0x67ad27c0 ksp=*0x7fff70d0* newcomm=0x7ffe9980
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>>>     1 SNES Function norm 1.836376279964e-05
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 19
>>>     2 SNES Function norm 3.059930074740e-07
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 15
>>>     3 SNES Function norm 4.744275398121e-08
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>>>     4 SNES Function norm 4.014828563316e-08
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 456
>>>     5 SNES Function norm 5.670836337808e-09
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 2
>>>     6 SNES Function norm 2.410421401323e-09
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 18
>>>     7 SNES Function norm 6.533948191791e-10
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 458
>>>     8 SNES Function norm 1.008133815842e-10
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 9
>>>     9 SNES Function norm 1.690450876038e-11
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 4
>>>    10 SNES Function norm 1.336383986009e-11
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 463
>>>    11 SNES Function norm 1.873022410774e-12
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 113
>>>    12 SNES Function norm 1.801834606518e-13
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_ATOL iterations 1
>>>    13 SNES Function norm 1.004397317339e-13
>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 13
>>>
>>>     0 SNES Function norm 4.974994975313e-03
>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e265010. Comms pc=0x56450340 ksp=0x6e2168d0 newcomm=0x6e265090
>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x6e25bc40. Comms pc=0x56450340 ksp=0x6e22c1d0 newcomm=0x6e21e8f0
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 282
>>>     1 SNES Function norm 1.836376279963e-05
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 380
>>>     2 SNES Function norm 3.018499983019e-07
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 387
>>>     3 SNES Function norm 1.826353175637e-08
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 391
>>>     4 SNES Function norm 1.378600599548e-09
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 392
>>>     5 SNES Function norm 1.077289085611e-10
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 394
>>>     6 SNES Function norm 8.571891727748e-12
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>>>     7 SNES Function norm 6.897647643450e-13
>>>       Linear fieldsplit_e_ solve converged due to CONVERGED_RTOL iterations 395
>>>     8 SNES Function norm 5.606434614114e-14
>>>   Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 8
>>>
>>>>
>>>>    Matt
>>>>
>>>>
>>>>>   Matt
>>>>>>
>>>>>>
>>>>>>> On Thu, Jan 21, 2021 at 10:46 AM Matthew Knepley <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 10:34 AM Mark Adams <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> It looks like PETSc is just too clever for me. I am trying to get
>>>>>>>>> a different MPI_Comm into each block, but PETSc is thwarting me:
>>>>>>>>>
>>>>>>>>
>>>>>>>> It looks like you are using SELF. Is that what you want? Do you
>>>>>>>> want a bunch of comms with the same group, but independent somehow?
>>>>>>>> I am confused.
>>>>>>>>
>>>>>>>>    Matt
>>>>>>>>
>>>>>>>>
>>>>>>>>>   if (jac->use_openmp) {
>>>>>>>>>     ierr = KSPCreate(MPI_COMM_SELF,&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>     ierr = PetscPrintf(PETSC_COMM_SELF,"In PCFieldSplitSetFields_FieldSplit with -------------- link: %p. Comms %p %p\n",ilink,PetscObjectComm((PetscObject)pc),PetscObjectComm((PetscObject)ilink->ksp));CHKERRQ(ierr);
>>>>>>>>>   } else {
>>>>>>>>>     ierr = KSPCreate(PetscObjectComm((PetscObject)pc),&ilink->ksp);CHKERRQ(ierr);
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> produces:
>>>>>>>>>
>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e9cb4f0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>> In PCFieldSplitSetFields_FieldSplit with -------------- link: 0x7e88f7d0. Comms 0x660c6ad0 0x660c6ad0
>>>>>>>>>
>>>>>>>>> How can I work around this?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 21, 2021 at 7:41 AM Mark Adams <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jan 20, 2021 at 6:21 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jan 20, 2021, at 3:09 PM, Mark Adams <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> So I put in a temporary hack to get the first Fieldsplit apply
>>>>>>>>>>> to NOT use OMP and it sort of works.
>>>>>>>>>>>
>>>>>>>>>>> Preonly/lu is fine. GMRES calls vector creates/dups in every
>>>>>>>>>>> solve so that is a big problem.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>   It should definitely not be creating vectors "in every" solve.
>>>>>>>>>>> But it does do lazy allocation of the needed restart vectors, which
>>>>>>>>>>> may make it look like it is creating vectors in every solve. You can
>>>>>>>>>>> use -ksp_gmres_preallocate to force it to create all the restart
>>>>>>>>>>> vectors up front at KSPSetUp().
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Well, I run the first solve w/o OMP and I see Vec dups in
>>>>>>>>>> cuSparse Vecs in the 2nd solve.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>   Why is creating vectors "at every solve" a problem? It is not
>>>>>>>>>>> thread safe I guess?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> It dies when it looks at the options database, in a Free in the
>>>>>>>>>> get-options method to be exact (see stacks).
>>>>>>>>>>
>>>>>>>>>> ======= Backtrace: =========
>>>>>>>>>> /lib64/libc.so.6(cfree+0x4a0)[0x200021839be0]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscFreeAlign+0x4c)[0x2000002a368c]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(PetscOptionsEnd_Private+0xf4)[0x2000002e53f0]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x7c6c28)[0x2000008b6c28]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreate_SeqCUDA+0x11c)[0x20000052c510]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecSetType+0x670)[0x200000549664]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecCreateSeqCUDA+0x150)[0x20000052c0b0]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(+0x43c198)[0x20000052c198]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicate+0x44)[0x200000542168]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs_Default+0x148)[0x200000543820]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(VecDuplicateVecs+0x54)[0x2000005425f4]
>>>>>>>>>> /gpfs/alpine/csc314/scratch/adams/petsc/arch-summit-opt-gnu-cuda-omp/lib/libpetsc.so.3.014(KSPCreateVecs+0x4b4)[0x2000016f0aec]
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Richardson works, except that the convergence test gets confused,
>>>>>>>>>>> presumably because MPI reductions with PETSC_COMM_SELF are not
>>>>>>>>>>> thread safe.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> One fix for the norms might be to create each subdomain solver
>>>>>>>>>>> with a different communicator.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>    Yes, you could do that. It might actually be the correct thing
>>>>>>>>>>> to do anyway: if multiple threads call MPI reductions on the same
>>>>>>>>>>> communicator, that would be a problem. Each KSP should get a new
>>>>>>>>>>> MPI_Comm.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OK. I will only do this.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their
>>>>>>>> experiments is infinitely more interesting than any results to which
>>>>>>>> their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>>>>>
>>>>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>
