Jose

Just send me an MWE and I’ll fix the case for you.

Thanks
Stefano

> On May 8, 2020, at 6:13 PM, Jose E. Roman wrote:
> 
> Stefano, I have tried to make my code work with your code on GPU (branch jose/bv-matmult-fallback), but I get errors.
> 
> This is what I do on CPU:
> 
> if (create) {
>   ierr = MatCreateDense(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr);
>   /* pass a pointer to avoid allocation of storage */
>   ierr = MatDensePlaceArray(bv->Aget,NULL);CHKERRQ(ierr);  /* replace with a null pointer, the value after BVRestoreMat */
> }
> ierr = MatDensePlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);  /* set the actual pointer */
> 
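The pointer passed in the last call, vv+(bv->nc+bv->l)*bv->n, appears to rely on the local BV storage being column-major with bv->n entries per column, so it skips the first nc+l columns. A standalone C illustration of that offset arithmetic follows; the helpers column_ptr() and fill_tagged() and the sizes are hypothetical and are not part of PETSc or SLEPc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pointer to column j of a column-major array whose
   columns each hold n entries. vv+(bv->nc+bv->l)*bv->n is the case j = nc+l. */
static double *column_ptr(double *vv, size_t n, size_t j)
{
  assert(vv != NULL);
  return vv + j * n;
}

/* Hypothetical test helper: store 100*j + i in entry (i,j), so each value
   shows which row and column it came from. */
static void fill_tagged(double *vv, size_t n, size_t m)
{
  for (size_t j = 0; j < m; j++)
    for (size_t i = 0; i < n; i++)
      vv[i + j * n] = 100.0 * (double)j + (double)i;
}
```

With n = 4 rows, nc = 1 and l = 2, the returned pointer lands on the first entry of column 3, which holds 300 in this tagging scheme.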
> The analogue on GPU:
> 
> if (create) {
>   ierr = MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,vv,&bv->Aget);CHKERRQ(ierr);
>   /* pass a pointer to avoid allocation of storage */
>   ierr = MatDenseCUDAPlaceArray(bv->Aget,NULL);CHKERRQ(ierr);  /* replace with a null pointer, the value after BVRestoreMat */
> }
> ierr = MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);  /* set the actual pointer */
> 
> But it does not work:
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Operation done in wrong order
> [0]PETSC ERROR: MatDenseCUDAResetArray() must be called first
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40  GIT Date: 2020-05-08 11:20:42 +0300
> [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman Fri May  8 16:54:23 2020
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-cuda
> [0]PETSC ERROR: #1 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 183 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray() line 1930 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c
> [0]PETSC ERROR: #3 BVGetMat_Svec_CUDA() line 749 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #4 BVGetMat() line 1455 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c
> [0]PETSC ERROR: #5 BVMatMult_Svec_CUDA() line 556 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #6 BVMatMult() line 597 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c
> [0]PETSC ERROR: #7 EPSSolve_LOBPCG() line 150 in /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c
> [0]PETSC ERROR: #8 EPSSolve() line 149 in /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c
> [0]PETSC ERROR: #9 main() line 167 in ex19.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -check_pointer_intensity 0
> [0]PETSC ERROR: -eps_type lobpcg
> [0]PETSC ERROR: -error_output_stdout
> [0]PETSC ERROR: -malloc
> [0]PETSC ERROR: -malloc_debug
> [0]PETSC ERROR: -malloc_dump
> [0]PETSC ERROR: -mat_type aijcusparse
> [0]PETSC ERROR: -use_gpu_aware_mpi 0
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint@mcs.anl.gov
> 
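The first trace says that MatDenseCUDAPlaceArray() refuses to place a new array while a previously placed one (here, the NULL placed at creation) has not been reset. A toy C model of that pairing rule is sketched below; the names toy_dense, toy_create, toy_place_array and toy_reset_array are hypothetical and only imitate the check reported in the error message, not PETSc's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dense matrix whose storage can be swapped out. */
typedef struct {
  double *array;     /* storage currently in use */
  double *unplaced;  /* storage saved by the last place, restored by reset */
  int     placed;    /* nonzero while a placed array is active */
} toy_dense;

enum { TOY_OK = 0, TOY_ERR_WRONG_ORDER = 1 };

static void toy_create(toy_dense *A, double *storage)
{
  A->array    = storage;
  A->unplaced = NULL;
  A->placed   = 0;
}

/* Imitates the check behind "ResetArray() must be called first":
   a new array cannot be placed while another placed array is active. */
static int toy_place_array(toy_dense *A, double *a)
{
  assert(A != NULL);
  if (A->placed) return TOY_ERR_WRONG_ORDER;
  A->unplaced = A->array;
  A->array    = a;
  A->placed   = 1;
  return TOY_OK;
}

/* Restores the storage that was active before the last place. */
static void toy_reset_array(toy_dense *A)
{
  if (!A->placed) return;
  A->array    = A->unplaced;
  A->unplaced = NULL;
  A->placed   = 0;
}
```

In this model the GPU snippet's sequence (place NULL at creation, then place the real pointer) fails at the second place, while inserting a reset between the two succeeds. Whether the reset belongs in BVRestoreMat or elsewhere is not settled by the thread.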
> I tried a simplified version, where I let MatCreateDenseCUDA() allocate the 
> array:
> 
> if (create) {
>   ierr = MatCreateDenseCUDA(PetscObjectComm((PetscObject)bv),bv->n,PETSC_DECIDE,bv->N,m,NULL,&bv->Aget);CHKERRQ(ierr);
> }
> ierr = MatDenseCUDAPlaceArray(bv->Aget,vv+(bv->nc+bv->l)*bv->n);CHKERRQ(ierr);
> 
> Now the error I get:
> 
> [0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: cuda error 1 (cudaErrorInvalidValue) : invalid argument
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.13.1-187-gd15b076f40  GIT Date: 2020-05-08 11:20:42 +0300
> [0]PETSC ERROR: ./ex19 on a arch-gpu2-intel-c-debug-cuda named gpu2 by jroman Fri May  8 16:54:51 2020
> [0]PETSC ERROR: Configure options --with-cc=mpiicc --with-fc=mpiifort --with-cxx=mpiicpc --with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --with-cuda
> [0]PETSC ERROR: #1 MatSeqDenseCUDACopyToGPU() line 165 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #2 MatDenseCUDAPlaceArray_SeqDenseCUDA() line 184 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/seq/cuda/densecuda.cu
> [0]PETSC ERROR: #3 MatDenseCUDAPlaceArray() line 1930 in /home/users/proy/copa/jroman/soft/petsc/src/mat/impls/dense/mpi/mpidense.c
> [0]PETSC ERROR: #4 BVGetMat_Svec_CUDA() line 749 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #5 BVGetMat() line 1455 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvbasic.c
> [0]PETSC ERROR: #6 BVMatMult_Svec_CUDA() line 556 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/impls/svec/sveccuda/sveccuda.cu
> [0]PETSC ERROR: #7 BVMatMult() line 597 in /home/users/proy/copa/jroman/soft/slepc/src/sys/classes/bv/interface/bvops.c
> [0]PETSC ERROR: #8 EPSSolve_LOBPCG() line 150 in /home/users/proy/copa/jroman/soft/slepc/src/eps/impls/cg/lobpcg/lobpcg.c
> [0]PETSC ERROR: #9 EPSSolve() line 149 in /home/users/proy/copa/jroman/soft/slepc/src/eps/interface/epssolve.c
> [0]PETSC ERROR: #10 main() line 167 in ex19.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -check_pointer_intensity 0
> [0]PETSC ERROR: -eps_type lobpcg
> [0]PETSC ERROR: -error_output_stdout
> [0]PETSC ERROR: -malloc
> [0]PETSC ERROR: -malloc_debug
> [0]PETSC ERROR: -malloc_dump
> [0]PETSC ERROR: -mat_type aijcusparse
> [0]PETSC ERROR: -use_gpu_aware_mpi 0
> [0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-maint@mcs.anl.gov
> 
> The pointer vv is obtained with VecCUDAGetArray(). Any idea?
> Jose
> 
> 
>> On 7 May 2020, at 20:16, Stefano Zampini wrote:
>> 
>> Jose
>> 
>> I have just pushed some code to support MPI DENSE CUDA matrices and MatMatMult operations (basic loop over columns, without copying into vectors).
>> I have rebased against the latest master.
>> Let me know if it works for you. I will strip out the relevant commits and make a new MR.
>> 
>> Pierre, I have added a test for sbaij in parallel and it works nicely (automatically doing the loop over dense columns). Let me know if it works for you now.
>> 
>> Thanks
>> 
>> On Thu, 7 May 2020 at 00:17, Stefano Zampini wrote:
>> 
>>> 
>>> 
>>>> On 6 May 2020, at 20:00, Pierre Jolivet wrote:
>>>> 
>>>> Stefano,
>>>> Is this working for nsize > 1?
>>>> https://gitlab.com/petsc/petsc/-/blob/7e88e4dd44e2a5120b858cf9f19502ac359985be/src/mat/tests/ex70.c#L295
>>>> I am now getting (in another example):
>>>> [0]PETSC ERROR: Call MatProductSymbolic() first
>>>> Instead of the previous:
>>>> [0]PETSC ERROR: MatProductSetFromOptions_AB for A mpisbaij and B mpidense is not supported
>>>> 
>> 
>> Pierre,
>> 
>> I am not sure what is going on unless you tell me what to run. My branch stefanozampini/feature-add-hpackages is off master and has been recently rebased (it includes the fixes I have made in maint too).
>> BTW, I found your message below Jose’s answer, and I never got your original message. Did you forget to send it to petsc-dev?
>> 
>> 
>> 
>>>> (But my branch is lagging behind maint, so maybe I’m missing some other fixes; take this with a grain of salt.)
>>>> Thanks,
>>>> Pierre
>>>> 
>>>>> On 6 May 2020, at 4:52 PM, Stefano Zampini wrote:
>>>>> 
>>>>> I have working support for MATSHELL here 
>>>>> https://gitlab.com/petsc/petsc/-/commit/146e7f1ccf5f267b36079cac494077a23e8bbc45
>>>>> Tested here 
>>>>> https://gitlab.com/petsc/petsc/-/commit/c4fcaa45a01cc783c629913983b204a1cbcb3939
>>>>> 
>>>>> Jose and Pierre, this code is supposed to work with CUDA, but I haven't tested it yet.
>>>>> Can you tell me if this fixes your issues, so that you do not have to loop over the columns of the dense matrix yourselves?
>>>>> 
>>>>> On Wed, 6 May 2020 at 10:09, Stefano Zampini wrote:
>>>>> Hong
>>>>> 
>>>>> If the product is not supported, the type of C will never be set anyway, so you cannot call MatHasOperation after MatProductSetFromOptions.
>>>>> The purpose of MatProductSetFromOptions is to populate the function pointers for the symbolic and numeric phases. If they are not found, they should be set to NULL instead of raising an error, as happens now.
>>>>> What I propose is to have MatProductHasOperation (not MatHasOperation): this function will be identical to MatHasOperation, with the only difference that it does not call PetscValidType on the input mat.
>>>>> 
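The difference between the two queries can be sketched with a toy C model: a generic query that first validates the object's type (the role PetscValidType plays in MatHasOperation, per the message) and a product-specific query that skips that check, so it still answers for a product whose type was never set. All names here (toy_mat, toy_has_operation, toy_product_has_operation) are hypothetical; this is not PETSc code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*toy_op)(void);

/* Hypothetical product matrix: only the fields needed for the query. */
typedef struct {
  const char *type;             /* NULL until a type has been set */
  toy_op      productsymbolic;  /* NULL when no implementation was found */
} toy_mat;

enum { TOY_OK = 0, TOY_ERR_NO_TYPE = 1 };

/* Generic query: validates the type first, so an untyped object is rejected. */
static int toy_has_operation(const toy_mat *C, int *flg)
{
  assert(C != NULL && flg != NULL);
  if (!C->type) return TOY_ERR_NO_TYPE;
  *flg = (C->productsymbolic != NULL);
  return TOY_OK;
}

/* Product query: same answer, but without the type check. */
static int toy_product_has_operation(const toy_mat *C, int *flg)
{
  assert(C != NULL && flg != NULL);
  *flg = (C->productsymbolic != NULL);
  return TOY_OK;
}
```

For an unsupported product (no type, no symbolic pointer) the generic query errors out, while the product query simply reports that the operation is unavailable.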
>>>>> Meanwhile, I’m coding a basic MatMat (and MatTransposeMat) driver that loops over the dense columns and applies MatMult (or MatMultTranspose) without memory movement.
>>>>> This will be valid for all B matrices of type dense (and its derived types), with C of type dense too. In principle this will fix Jose’s and Pierre’s issues (they can correct me if I’m wrong).
>>>>> 
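A rough sketch of such a column loop, in plain C rather than PETSc: each column of the column-major B is handed directly to a matrix-vector callback that writes the matching column of C, so no column is copied into a separate vector. The callback type matvec_fn and the names matmat_by_columns, small_dense and small_dense_mult are hypothetical; the actual driver would call MatMult() (or MatMultTranspose()) on PETSc objects instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: y = A*x for an operator known only through ctx. */
typedef void (*matvec_fn)(const void *ctx, const double *x, double *y);

/* C = A*B with B and C column-major (leading dimensions ldb, ldc).
   Columns are passed in place, so nothing is copied. */
static void matmat_by_columns(matvec_fn mult, const void *ctx,
                              const double *B, size_t ldb,
                              double *C, size_t ldc, size_t ncols)
{
  assert(mult != NULL);
  for (size_t j = 0; j < ncols; j++)
    mult(ctx, B + j * ldb, C + j * ldc);
}

/* Example operator: a small m-by-k column-major dense matrix. */
typedef struct { size_t m, k; const double *a; } small_dense;

static void small_dense_mult(const void *ctx, const double *x, double *y)
{
  const small_dense *A = ctx;
  for (size_t i = 0; i < A->m; i++) {
    double s = 0.0;
    for (size_t p = 0; p < A->k; p++) s += A->a[i + p * A->m] * x[p];
    y[i] = s;
  }
}
```

With A = [1 2; 3 4] and B = [5 6; 7 8], the loop produces C = [19 22; 43 50], stored column-major as {19, 43, 22, 50}.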
>>>>> However, we should definitely have a way for the user to inquire whether a given operation is supported.
>>>>> 
>>>>> Thanks
>>>>> Stefano
>>>>> 
>>>>>> On May 6, 2020, at 12:03 AM, Zhang, Hong wrote:
>>>>>> 
>>>>>> Stefano:
>>>>>> Now we need to address this bug report: enable MatHasOperation(C,MATOP_MAT_MULT,&flg) for matrix products, e.g., C=A*B, which is related to your issue https://gitlab.com/petsc/petsc/-/issues/608.
>>>>>> 
>>>>>> In petsc-3.13:
>>>>>> 1) MATOP_MAT_MULT, ..., MATOP_MATMAT_MULT have been removed from the MATOP table (they are still listed in petscmat.h, an oversight; I'll remove them).
>>>>>> MATOP_MAT_MULT_SYMBOLIC/NUMERIC ... are still in the table.
>>>>>> 2) MatHasOperation(C,...) must be called on the matrix product C, not on matrix A or B (slepc needs to fix this once the reported bug is fixed).
>>>>>> 
>>>>>> Like MatSetOption(), MatHasOperation() must be called AFTER MatSetType(). You moved MatSetType() from MatProductSetFromOptions() back to MatProductSymbolic() in your latest patch, so the user has to call MatHasOperation() after MatProductSymbolic():
>>>>>> 
>>>>>> MatProductCreate(A,B,NULL,&C);
>>>>>> MatProductSetType(C,...);
>>>>>> ...
>>>>>> MatProductSetFromOptions(C);  //if the product is not supported for the given mat types, currently petsc crashes here, which we can replace with an error output
>>>>>> 
>>>>>> MatProductSymbolic(C);        //calls MatSetType()
>>>>>> MatHasOperation(C,MATOP_MAT_MULT,&flg);
>>>>>> 
>>>>>> Question: how can MatHasOperation(C,...) be called when MatProductSymbolic() is not supported?
>>>>>> 
>>>>>> My fix to this bug:
>>>>>> Restore the MatSetType() call in MatProductSetFromOptions(). Then the user calls:
>>>>>> 
>>>>>> MatProductCreate(A,B,NULL,&C);
>>>>>> MatProductSetType(C,...);
>>>>>> ...
>>>>>> MatProductSetFromOptions(C);  //if the product is not supported for the given mat types, C->ops->productsymbolic=NULL;
>>>>>> MatHasOperation(C,MATOP_PRODUCTSYMBOLIC,&flg);
>>>>>> if (flg) { 
>>>>>>  MatProductSymbolic(C);
>>>>>>  ...
>>>>>> } else {
>>>>>>  MatDestroy(&C);
>>>>>>  ...
>>>>>> }
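The proposed calling sequence can be modelled in plain C: a setup step looks up the (A type, B type) pair in a table and leaves the symbolic callback NULL when no implementation exists, and the caller checks that pointer before running the symbolic phase. The registry contents, type strings, and the names toy_product, toy_product_set_from_options and try_product are hypothetical; only the control flow follows the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_product toy_product;
typedef int (*symbolic_fn)(toy_product *);

struct toy_product {
  const char *atype, *btype;
  symbolic_fn productsymbolic;  /* NULL means "not supported" */
  int         symbolic_done;
};

static int dense_dense_symbolic(toy_product *C) { C->symbolic_done = 1; return 0; }

/* Hypothetical registry of supported (A,B) type pairs. */
static const struct { const char *a, *b; symbolic_fn fn; } registry[] = {
  { "dense", "dense", dense_dense_symbolic },
};

/* Set-from-options step: never errors, just leaves the pointer NULL if unsupported. */
static void toy_product_set_from_options(toy_product *C)
{
  C->productsymbolic = NULL;
  for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
    if (!strcmp(registry[i].a, C->atype) && !strcmp(registry[i].b, C->btype))
      C->productsymbolic = registry[i].fn;
}

/* Caller side: returns 1 if the symbolic phase ran, 0 if the caller must fall back. */
static int try_product(const char *atype, const char *btype)
{
  toy_product C = { atype, btype, NULL, 0 };
  toy_product_set_from_options(&C);
  if (C.productsymbolic) {
    int ierr = C.productsymbolic(&C);
    assert(ierr == 0 && C.symbolic_done);
    return 1;
  }
  return 0;  /* in the PETSc flow above, this is where C would be destroyed */
}
```

An unsupported pair such as ("mpisbaij", "mpidense") then takes the fallback branch instead of stopping with an error.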
>>>>>> 
>>>>>> Either you take care of this bug report, or let me know your thoughts 
>>>>>> about how to fix this bug.
>>>>>> Hong
>>>>>> From: Zhang, Hong
>>>>>> Sent: Saturday, April 25, 2020 2:40 PM
>>>>>> To: Pierre Jolivet
>>>>>> Cc: Jose E. Roman; Stefano Zampini; petsc-dev; Smith, Barry F.
>>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT
>>>>>> 
>>>>>> Pierre,
>>>>>> When we do
>>>>>> MatProductCreate: C = A*B;          //C owns A and B, thus B->refct = 2
>>>>>> MatProductCreateWithMat: B = A*C;   //if I let B own A and C, then C->refct = 2
>>>>>> then MatDestroy(&B) and MatDestroy(&C) only reduce their refct from 2 to 1, so memory leaks.
>>>>>> My solution is adding
>>>>>> {
>>>>>>   matreference;  /* do not add refct when using MatProductCreateWithMat() to avoid recursive references */
>>>>>> } Mat_Product
>>>>>> This flag prevents MatProductCreateWithMat() from increasing reference counts, i.e., B does not own A and C, which avoids reverse ownership. I am not sure this is a reasonable solution. Let me know if you have a better one.
>>>>>> See ex109.c and ex195.c for tests.
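The leak described above is a reference cycle. A minimal C model (hypothetical toy_obj type and helpers; the A operand is omitted for brevity) shows that when C holds B and B holds C, destroying both only drops each count from 2 to 1, whereas a non-owning back-reference, the role of the proposed matreference flag, lets both objects be freed. It does not reproduce PETSc's Mat_Product code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object that may hold one product operand. */
typedef struct toy_obj {
  int             refct;
  struct toy_obj *operand;       /* operand held by this product */
  int             owns_operand;  /* 0 models "do not add refct" */
} toy_obj;

static int live_objects = 0;     /* allocations not yet freed */

static toy_obj *toy_new(void)
{
  toy_obj *o = calloc(1, sizeof *o);
  assert(o != NULL);
  o->refct = 1;
  live_objects++;
  return o;
}

/* Drop one reference; free the object (and release its operand) at zero. */
static void toy_destroy(toy_obj **p)
{
  toy_obj *o = *p;
  *p = NULL;
  if (!o || --o->refct > 0) return;
  if (o->owns_operand) toy_destroy(&o->operand);
  free(o);
  live_objects--;
}

/* Record 'operand' in 'prod', taking a reference only if 'own' is nonzero. */
static void toy_attach(toy_obj *prod, toy_obj *operand, int own)
{
  prod->operand = operand;
  prod->owns_operand = own;
  if (own) operand->refct++;
}
```

With both references owning, two objects stay alive after both destroys; with the back-reference non-owning, destroying C releases B as well and nothing is left.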
>>>>>> Hong
>>>>>> From: Pierre Jolivet
>>>>>> Sent: Saturday, April 25, 2020 11:45 AM
>>>>>> To: Zhang, Hong
>>>>>> Cc: Jose E. Roman; Stefano Zampini; petsc-dev; Smith, Barry F.
>>>>>> Subject: Re: [petsc-dev] MATOP_MAT_MULT
>>>>>> 
>>>>>> Hong,
>>>>>> José didn’t report this, though he may have run into the same issue, I 
>>>>>> did.
>>>>>> I’ll try the branch and get back at you on GitLab MR.
>>>>>> 
>>>>>> Thanks,
>>>>>> Pierre
>>>>>> 
>>>>>>> On 25 Apr 2020, at 6:17 PM, Zhang, Hong wrote:
>>>>>>> 
>>>>>>> Jose,
>>>>>>> 
>>>>>>>>> I also just tested some code that used to run with PETSC_VERSION_LT(3,13,0): C=A*B, Dense=Nest*Dense, with all matrices allocated prior to a call to MatMatMult with scall = MAT_REUSE_MATRIX.
>>>>>>>>> Sadly, it’s now broken. It is my fault for not having a test for this in https://gitlab.com/petsc/petsc/-/merge_requests/2069, sorry about that.
>>>>>>>>> [0]PETSC ERROR: Call MatProductSymbolic() first
>>>>>>>>> [0]PETSC ERROR: #1 MatProductNumeric() line 730 in /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matproduct.c
>>>>>>>>> [0]PETSC ERROR: #2 MatMatMult() line 9335 in /ccc/work/cont003/rndm/rndm/petsc/src/mat/interface/matrix.c
>>>>>>>>> 
>>>>>>>>> Here is a reproducer (that will work OK with 3.12.4).
>>>>>>>>> diff --git a/src/mat/tests/ex195.c b/src/mat/tests/ex195.c
>>>>>>>>> index c72662bc3c..811de669c5 100644
>>>>>>>>> --- a/src/mat/tests/ex195.c
>>>>>>>>> +++ b/src/mat/tests/ex195.c
>>>>>>>>> @@ -73,2 +73,3 @@ int main(int argc,char **args)
>>>>>>>>>  ierr = MatMatMult(nest,B,MAT_REUSE_MATRIX,PETSC_DEFAULT,&C);CHKERRQ(ierr);
>>>>>>>>> +  ierr = MatMatMult(nest,C,MAT_REUSE_MATRIX,PETSC_DEFAULT,&B);CHKERRQ(ierr);
>>>>>>>>>  ierr = MatMatMultEqual(nest,B,C,10,&equal);CHKERRQ(ierr);
>>>>>>>>> 
>>>>>>>>> $ make -f gmakefile test searchin=mat_tests-ex195
>>>>>>>>> 
>>>>>>>>> I believe this is very close to the topic at hand and issue #608, so maybe you could fix this as well in the same upcoming MR? Just let me know; I can have a crack at it otherwise.
>>>>>>> 
>>>>>>> This is a bug. I fixed it in the branch 
>>>>>>> hzhang/fix-matproduct-reuse/maint. Can you test it?
>>>>>>> Hong
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Stefano
>>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> Stefano
> 
