Jose Just send me a MWE and I’ll fix the case for you
Thanks Stefano > On May 8, 2020, at 6:13 PM, Jose E. Roman <[email protected]> wrote: > > Stefano, I have tried to make my code work with your code on GPU (branch > jose/bv-matmult-fallback), but I have errors. > > This is what I do on CPU: > > if (create) { > ierr = > MatCreateDense(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr); > /* pass a pointer to avoid allocation of storage */ > ierr = MatDensePlaceArray(bv->Aget,NULL);CHKERRQ(ierr); /* replace with a > null pointer, the value after BVRestoreMat */ > } > ierr = MatDensePlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr); > /* set the actual pointer */ > > The analogue on GPU: > > if (create) { > ierr = > MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr); > /* pass a pointer to avoid allocation of storage */ > ierr = MatDenseCUDAPlaceArray(bv->Aget,NULL);CHKERRQ(ierr); /* replace with > a null pointer, the value after BVRestoreMat */ > } > ierr = > MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr); /* > set the actual pointer */ > > But it does not work: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Operation done in wrong order > [0]PETSC ERROR: MatDenseCUDAResetArray() must be called first > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40 GIT > Date: 2020-05-08 11:20:42 +0300 > [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman > Fri May 8 16:54:23 2020 > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort > --with-cxx=mpiicpc > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-cuda > [0]PETSC ERROR: #1 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 183 in > /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu > [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray() line 1930 in > /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c > [0]PETSC ERROR: #3 BVGetMat_Svec_CUDA() line 749 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu > [0]PETSC ERROR: #4 BVGetMat() line 1455 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c > [0]PETSC ERROR: #5 BVMatMult_Svec_CUDA() line 556 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu > [0]PETSC ERROR: #6 BVMatMult() line 597 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c > [0]PETSC ERROR: #7 EPSSolve_LOBPCG() line 150 in > /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c > [0]PETSC ERROR: #8 EPSSolve() line 149 in > /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c > [0]PETSC ERROR: #9 main() line 167 in ex19.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -check_pointer_intensity 0 > [0]PETSC ERROR: -eps_type lobpcg > [0]PETSC ERROR: -error_output_stdout > [0]PETSC ERROR: -malloc > [0]PETSC ERROR: -malloc_debug > [0]PETSC ERROR: -malloc_dump > [0]PETSC ERROR: -mat_type aijcusparse > [0]PETSC ERROR: -use_gpu_aware_mpi 0 > [0]PETSC ERROR: ----------------End of Error Message -------send entire error > message to [email protected] > > I tried a simplified version, where I let MatCreateDenseCUDA() allocate the > array: > > if (create) { > ierr = > MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,NULL,&bv->Aget);CHKERRQ(ierr); > } > ierr = MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr); > > Now the error I get: > > [0]PETSC ERROR: --------------------- Error Message > -------------------------------------------------------------- > [0]PETSC ERROR: Error in external library > [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument > [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for > trouble shooting. > [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40 GIT > Date: 2020-05-08 11:20:42 +0300 > [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman > Fri May 8 16:54:51 2020 > [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort > --with-cxx=mpiicpc > --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl > --with-cuda > [0]PETSC ERROR: #1 MatSeqDenseCUDACopyToGPU() line 165 in > /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu > [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 184 in > /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu > [0]PETSC ERROR: #3 MatDenseCUDAPlaceArray() line 1930 in > /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c > [0]PETSC ERROR: #4 BVGetMat_Svec_CUDA() line 749 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu > [0]PETSC ERROR: #5 BVGetMat() line 1455 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c > [0]PETSC ERROR: #6 BVMatMult_Svec_CUDA() line 556 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu > [0]PETSC ERROR: #7 BVMatMult() line 597 in > /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c > [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 150 in > /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c > [0]PETSC ERROR: #9 EPSSolve() line 149 in > /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c > [0]PETSC ERROR: #10 main() line 167 in ex19.c > [0]PETSC ERROR: PETSc Option Table entries: > [0]PETSC ERROR: -check_pointer_intensity 0 > [0]PETSC ERROR: -eps_type lobpcg > [0]PETSC ERROR: -error_output_stdout > [0]PETSC ERROR: -malloc > [0]PETSC ERROR: -malloc_debug > [0]PETSC ERROR: -malloc_dump > [0]PETSC ERROR: -mat_type aijcusparse > [0]PETSC ERROR: -use_gpu_aware_mpi 0 > [0]PETSC ERROR: ----------------End of Error Message -------send entire error > message to [email protected] > > The pointer vv is obtained with VecCUDAGetArray(). Any idea? > Jose > > >> El 7 may 2020, a las 20:16, Stefano Zampini <[email protected]> >> escribió: >> >> Jose >> >> I have just pushed some code to support MPI DENSE CUDA matrices and >> MatMatMult operations (basic loop over columns, without copy into vectors). >> I have rebased against the latest master >> Let me know if it works for you. I will strip out the relevant commits and >> make a new MR >> >> Pierre, I have added a test for sbaij in parallel and it works nicely >> (automatically doing the loop over dense columns). Let me know if it works >> for you now >> >> Thanks >> >> Il giorno gio 7 mag 2020 alle ore 00:17 Stefano Zampini >> <[email protected]> ha scritto: >> >>> >>> >>>> El 6 may 2020, a las 20:00, Pierre Jolivet <[email protected]> >>>> escribió: >>>> >>>> Stefano, >>>> Is this working for nsize > 1 >>>> https://gitlab.com/petsc/petsc/-/blob/7e88e4dd44e2a5120b858cf9f19502ac359985be/src/mat/tests/ex70.c#L295 >>>> I am now getting (in another example): >>>> [0]PETSC ERROR: Call MatProductSymbolic() first >>>> Instead of the previous: >>>> [0]PETSC ERROR: MatProductSetFromOptions_AB for A mpisbaij and B mpidense >>>> is not supported >>>> >> >> Pierre, >> >> Not sure what is going on if you do not tell me what to run. My branch >> stefanozampini/feature-add-hpackages is off master and has been recently >> rebased (includes the fixes I have made in maint too) >> BTW, I found your message below Jose’s answer and I never get your original >> message. Did you forget to send to petsc-dev? >> >> >> >>>> (But my branch is lagging behind maint, so maybe I’m missing some other >>>> fixes, take this with a grain of salt). >>>> Thanks, >>>> Pierre >>>> >>>>> On 6 May 2020, at 4:52 PM, Stefano Zampini <[email protected]> >>>>> wrote: >>>>> >>>>> I have working support for MATSHELL here >>>>> https://gitlab.com/petsc/petsc/-/commit/146e7f1ccf5f267b36079cac494077a23e8bbc45 >>>>> Tested here >>>>> https://gitlab.com/petsc/petsc/-/commit/c4fcaa45a01cc783c629913983b204a1cbcb3939 >>>>> >>>>> Jose and Pierre, this code is supposed to work with CUDA, but I haven't >>>>> tested it yet >>>>> Can you tell me if this fixes the issues for you to not have to loop over >>>>> the columns of the dense matrix yourself? >>>>> >>>>> Il giorno mer 6 mag 2020 alle ore 10:09 Stefano Zampini >>>>> <[email protected]> ha scritto: >>>>> Hong >>>>> >>>>> If the product is not supported, the type of C will never be set anyway, >>>>> so you cannot call MatHasOperation after MatProductSetFromOptions. >>>>> The purpose of MatProductSetFromOptions is to populate the function >>>>> pointers for symbolic and numeric phases. If not found, they should be >>>>> set to null instead of erroring as it is now. >>>>> What I propose is to have MatProductHasOperation (not MatHasOperation): >>>>> this function will be identical to MatHasOperation, with the only >>>>> difference that does not call PetscValidType on the input mat. >>>>> >>>>> Meanwhile, I’m coding a basic MatMat (and MatTransposeMat) driver to loop >>>>> over dense columns and apply MatMult. (Or MatMultTranspose) without >>>>> memory movement. >>>>> This will be valid for all B matrices being of type dense (and its >>>>> derivations), with C of type dense too. This in principle will fix Jose >>>>> and Pierre’s issues (they can correct me if I’m wrong) >>>>> >>>>> However, we should definitely have a way for the user to enquire if a >>>>> given operation is supported or not. >>>>> >>>>> Thanks >>>>> Stefano >>>>> >>>>>> On May 6, 2020, at 12:03 AM, Zhang, Hong <[email protected]> wrote: >>>>>> >>>>>> Stefano: >>>>>> Now, we need address this bug report: enable >>>>>> MatHasOperation(C,MATOP_MAT_MULT,&flg) for matrix products, e.g., C=A*B, >>>>>> which is related to your issue >>>>>> https://gitlab.com/petsc/petsc/-/issues/608. >>>>>> >>>>>> In petsc-3.13: >>>>>> 1) MATOP_MAT_MULT, ..., MATOP_MATMAT_MULT are removed from the MATOP >>>>>> table (they are still listed in petscmat.h -- an overlook, I'll remove >>>>>> them). >>>>>> MATOP_MAT_MULT_SYMBOLIC/NUMERIC ... are still in the table. >>>>>> 2) MatHasOperation(C,...) must be called for the matrix product C, not >>>>>> matrix A or B (slepc needs to fix this after this reported bug is fixed). >>>>>> >>>>>> Like MatSetOption(), MatHasOperation() must be called AFTER >>>>>> MatSetType(). You moved MatSetType() from MatProductSetFromOptions() >>>>>> back to MatProductSymbolic() in your latest patch, thus user has to call >>>>>> MatHasOption() after MatProductSymbolic(): >>>>>> >>>>>> MatProductCreate(A,B,NULL,&C); >>>>>> MatProductSetType(C,...); >>>>>> ... >>>>>> MatProductSetFromOptions(); //if the product is not supported for the >>>>>> given mat types, currently petsc crashes here, which we can replace with >>>>>> an error output >>>>>> >>>>>> MatProductSymbloc(); -> call MatSetType() >>>>>> MatHasOperation(C,MATOP_MAT_MULT,&flg) >>>>>> >>>>>> Question: how to call MatHasOperation(C,..) when MatProductSymbloc() is >>>>>> not supported? >>>>>> >>>>>> My fix to this bug: >>>>>> Resume MatSetType() in MatProductSetFromOptions(). Then user calls: >>>>>> >>>>>> MatProductCreate(A,B,NULL,&C); >>>>>> MatProductSetType(C,...); >>>>>> ... >>>>>> MatProductSetFromOptions(C); //if the product is not supported for the >>>>>> given mat types, C->ops->productsymbolic=NULL; >>>>>> MatHasOperation(C,MATOP_PRODUCTSYMBOLIC,&flg); >>>>>> if (flg) { >>>>>> MatProductSymbolic(C); >>>>>> ... >>>>>> } else { >>>>>> MatDestroy(&C); >>>>>> ... >>>>>> } >>>>>> >>>>>> Either you take care of this bug report, or let me know your thoughts >>>>>> about how to fix this bug. >>>>>> Hong >>>>>> From: Zhang, Hong <[email protected]> >>>>>> Sent: Saturday, April 25, 2020 2:40 PM >>>>>> To: Pierre Jolivet <[email protected]> >>>>>> Cc: Jose E. Roman <[email protected]>; Stefano Zampini >>>>>> <[email protected]>; petsc-dev <[email protected]>; Smith, >>>>>> Barry F. <[email protected]> >>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT >>>>>> >>>>>> Pierre, >>>>>> When we do >>>>>> MatProductCreate: C = A*B; //C owns A and B, thus B->refct =2 >>>>>> MatProductCreateWithMats: B = A*C; //If I let B own A and C, then >>>>>> C->refct=2 >>>>>> Then >>>>>> MatDestroy(&B) and MatDestroy(&C) only reduce their refct from 2 to 1, >>>>>> thus memory leak. >>>>>> My solution is adding >>>>>> { >>>>>> matreference; /* do not add refct when using >>>>>> MatProductCreateWithMat() to void recursive references */ >>>>>> } Mat_Product >>>>>> This flg prevents MatProductCreateWithMats() to increase reference >>>>>> counts, i.e., B does not own A and C to avoid reverse ownership. I am >>>>>> not sure this is a reasonable solution. Let me know if you have better >>>>>> solution. >>>>>> See ex109.c and ex195.c for tests. >>>>>> Hong >>>>>> From: Pierre Jolivet <[email protected]> >>>>>> Sent: Saturday, April 25, 2020 11:45 AM >>>>>> To: Zhang, Hong <[email protected]> >>>>>> Cc: Jose E. Roman <[email protected]>; Stefano Zampini >>>>>> <[email protected]>; petsc-dev <[email protected]>; Smith, >>>>>> Barry F. <[email protected]> >>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT >>>>>> >>>>>> Hong, >>>>>> José didn’t report this, though he may have run into the same issue, I >>>>>> did. >>>>>> I’ll try the branch and get back at you on GitLab MR. >>>>>> >>>>>> Thanks, >>>>>> Pierre >>>>>> >>>>>>> On 25 Apr 2020, at 6:17 PM, Zhang, Hong <[email protected]> wrote: >>>>>>> >>>>>>> Jose, >>>>>>> >>>>>>>>> I also now just tested some previously PETSC_VERSION_LT(3,13,0) >>>>>>>>> running code with C=A*B, Dense=Nest*Dense, all previously allocated >>>>>>>>> prior to a call to MatMatMult and scall = MAT_REUSE_MATRIX. >>>>>>>>> Sadly, it’s now broken. It is my fault for not having a test for this >>>>>>>>> in https://gitlab.com/petsc/petsc/-/merge_requests/2069, sorry about >>>>>>>>> that. >>>>>>>>> [0]PETSC ERROR: Call MatProductSymbolic() first >>>>>>>>> [0]PETSC ERROR: #1 MatProductNumeric() line 730 in >>>>>>>>> /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matproduct.c >>>>>>>>> [0]PETSC ERROR: #2 MatMatMult() line 9335 in >>>>>>>>> /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matrix.c >>>>>>>>> >>>>>>>>> Here is a reproducer (that will work OK with 3.12.4). >>>>>>>>> diff --git a/src/mat/tests/ex195.c b/src/mat/tests/ex195.c >>>>>>>>> index c72662bc3c..811de669c5 100644 >>>>>>>>> --- a/src/mat/tests/ex195.c >>>>>>>>> +++ b/src/mat/tests/ex195.c >>>>>>>>> @@ -73,2 +73,3 @@ int main(int argc,char **args) >>>>>>>>> ierr = >>>>>>>>> MatMatMult(nest,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr); >>>>>>>>> + ierr = >>>>>>>>> MatMatMult(nest,C,MAT_REUSE_MATRIX,PETSC_DEFAULT,&B);CHKERRQ(ierr); >>>>>>>>> ierr = MatMatMultEqual(nest,B,C,10,&equal);CHKERRQ(ierr); >>>>>>>>> >>>>>>>>> $ make -f gmakefile test searchin=mat_tests-ex195 >>>>>>>>> >>>>>>>>> I believe this is very close to the topic at hand and issue #608, so >>>>>>>>> maybe you could fix this as well in the same upcoming MR? Just let me >>>>>>>>> know, I can have a crack it otherwise. >>>>>>> >>>>>>> This is a bug. I fixed it in the branch >>>>>>> hzhang/fix-matproduct-reuse/maint. Can you test it? >>>>>>> Hong >>>>> >>>>> >>>>> >>>>> -- >>>>> Stefano >>>> >>> >> >> >> >> -- >> Stefano >
