Dear Petsc devs,

I'm encountering an error running hypre on a single node with multiple GPUs.
The issue is in the setup phase. I'm trying to troubleshoot, but don't know 
where to start.
Are the system routines PetScCUDAInitialize and PetScCUDAInitializeCheck 
available in python?
How do I verify that GPUs are assigned properly to each MPI process? In this 
case, I have 3 tasks and 3 GPUs.

The code works with pc-type hypre on a single GPU.
Any suggestions are appreciated!

Below is the error trace:
``
TACC:  Starting up job 1490124
TACC:  Setting up parallel environment for MVAPICH2+mpispawn.
TACC:  Starting parallel tasks...
[0]PETSC ERROR: 
------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably 
memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and 
https://petsc.org/release/faq/
[0]PETSC ERROR: or try https://docs.nvidia.com/cuda/cuda-memcheck/index.html on 
NVIDIA CUDA systems to find memory corruption errors
[0]PETSC ERROR: ---------------------  Stack Frames 
------------------------------------
[0]PETSC ERROR: The line numbers in the error traceback are not always exact.
[0]PETSC ERROR: #1 hypre_ParCSRMatrixMigrate()
[0]PETSC ERROR: #2 MatBindToCPU_HYPRE() at 
/work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:1394
[0]PETSC ERROR: #3 MatAssemblyEnd_HYPRE() at 
/work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:1471
[0]PETSC ERROR: #4 MatAssemblyEnd() at 
/work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:5773
[0]PETSC ERROR: #5 MatConvert_AIJ_HYPRE() at 
/work/06368/annayesy/ls6/petsc/src/mat/impls/hypre/mhypre.c:660
[0]PETSC ERROR: #6 MatConvert() at 
/work/06368/annayesy/ls6/petsc/src/mat/interface/matrix.c:4421
[0]PETSC ERROR: #7 PCSetUp_HYPRE() at 
/work/06368/annayesy/ls6/petsc/src/ksp/pc/impls/hypre/hypre.c:245
[0]PETSC ERROR: #8 PCSetUp() at 
/work/06368/annayesy/ls6/petsc/src/ksp/pc/interface/precon.c:1080
[0]PETSC ERROR: #9 KSPSetUp() at 
/work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #10 KSPSolve_Private() at 
/work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:833
[0]PETSC ERROR: #11 KSPSolve() at 
/work/06368/annayesy/ls6/petsc/src/ksp/ksp/interface/itfunc.c:1080
application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
``

Below is a minimum working example:
``
import numpy,petsc4py,sys,time
petsc4py.init(sys.argv)
from petsc4py import PETSc
from time import time

n     = int(5e5);
comm  = PETSc.COMM_WORLD

pA = PETSc.Mat(comm=comm)
pA.create(comm=comm)
pA.setSizes((n,n))
pA.setType(PETSc.Mat.Type.AIJ)
pA.setPreallocationNNZ(3)
rstart,rend=pA.getOwnershipRange()

print("\t Processor %d of %d gets indices 
%d:%d"%(comm.Get_rank(),comm.Get_size(),rstart,rend))
if (rstart == 0):
    pA.setValue(0,0,2); pA.setValue(0,1,-1)
if (rend == n):
    pA.setValue(n-1,n-2,-1); pA.setValue(n-1,n-1,2)

for index in range(rstart,rend):
    if (rstart > 0):
        pA.setValue(index,index-1,-1)
    pA.setValue(index,index,2)
    if (rend < n):
        pA.setValue(index,index+1,-1)

pA.assemble()
pA = pA.convert(mat_type='aijcusparse')

px,pb = pA.createVecs()
pb.set(1.0); px.set(1.0)

ksp = PETSc.KSP().create()
ksp.setOperators(pA)
ksp.setConvergenceHistory()
ksp.setType('cg')
ksp.getPC().setType('hypre')
ksp.setTolerances(rtol=1e-10)

ksp.solve(pb, px)                           # error is generated here
``

Best,
Anna

Reply via email to