The examples that use DM, in particular DMDA, all trivially support using the 
GPU with -dm_mat_type aijcusparse -dm_vec_type cuda.
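
As a rough illustration (a minimal sketch, not copied from any particular PETSc 
example), a DMDA-based code picks up these options along the following lines. 
The 64 x 64 grid, the star stencil, and the variable names are assumptions 
chosen only for the illustration.

```c
#include <petscdmda.h>

int main(int argc, char **argv)
{
  DM      da;
  Mat     A;
  Vec     x;
  MatType mtype;
  VecType vtype;

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Hypothetical 64 x 64 structured grid, 1 dof per node, stencil width 1 */
  PetscCall(DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_STAR, 64, 64, PETSC_DECIDE, PETSC_DECIDE,
                         1, 1, NULL, NULL, &da));
  /* Reads -dm_mat_type and -dm_vec_type from the options database */
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));

  /* The DM creates the objects, so they inherit the types chosen above */
  PetscCall(DMCreateMatrix(da, &A));
  PetscCall(DMCreateGlobalVector(da, &x));

  /* Report the types that were actually selected */
  PetscCall(MatGetType(A, &mtype));
  PetscCall(VecGetType(x, &vtype));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Mat type: %s, Vec type: %s\n", mtype, vtype));

  PetscCall(VecDestroy(&x));
  PetscCall(MatDestroy(&A));
  PetscCall(DMDestroy(&da));
  PetscCall(PetscFinalize());
  return 0;
}
```

Run on a CUDA-enabled build with -dm_mat_type aijcusparse -dm_vec_type cuda 
-options_left, this should print CUDA-backed type names and report no unused 
options.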


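For code that does not use DM, the constructor sequence described in the quoted 
message below can be sketched as follows. This is only a sketch under stated 
assumptions: the names mlocal, d_nz and o_nz are borrowed from the application 
code quoted further down, and the values assigned to them here are made up.

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x;
  MatType  mtype;
  PetscInt mlocal = 100; /* hypothetical local size */
  PetscInt d_nz = 5;     /* hypothetical nonzeros per row, diagonal block */
  PetscInt o_nz = 2;     /* hypothetical nonzeros per row, off-diagonal block */

  PetscFunctionBeginUser;
  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Create an untyped Mat, set its sizes, then let -mat_type pick the type */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, mlocal, mlocal, PETSC_DETERMINE, PETSC_DETERMINE));
  PetscCall(MatSetFromOptions(A));

  /* Each preallocation call is ignored unless it matches the actual type,
     so calling both covers sequential and parallel AIJ variants */
  PetscCall(MatSeqAIJSetPreallocation(A, d_nz, NULL));
  PetscCall(MatMPIAIJSetPreallocation(A, d_nz, NULL, o_nz, NULL));

  /* Same pattern for vectors, so that -vec_type is honored */
  PetscCall(VecCreate(PETSC_COMM_WORLD, &x));
  PetscCall(VecSetSizes(x, mlocal, PETSC_DETERMINE));
  PetscCall(VecSetFromOptions(x));

  PetscCall(MatGetType(A, &mtype));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Mat type: %s\n", mtype));

  PetscCall(VecDestroy(&x));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}
```

The point of the ordering is that nothing fixes the type before 
MatSetFromOptions() and VecSetFromOptions() run, so -mat_type aijcusparse 
-vec_type cuda can take effect.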

> On Jul 17, 2023, at 1:45 AM, Ng, Cho-Kuen <[email protected]> wrote:
> 
> Barry,
> 
> Thank you so much for the clarification. 
> 
> I see that ex104.c and ex300.c use MatXAIJSetPreallocation(). Are there 
> other tutorials available?
> 
> Cho
> From: Barry Smith <[email protected]>
> Sent: Saturday, July 15, 2023 8:36 AM
> To: Ng, Cho-Kuen <[email protected]>
> Cc: [email protected] <[email protected]>
> Subject: Re: [petsc-users] Using PETSc GPU backend
>  
> 
>   
>    Cho,
> 
>     We currently have a crappy API for turning on GPU support, and our 
> documentation is misleading in places. 
> 
>     People constantly say "to use GPUs with PETSc you only need to use 
> -mat_type aijcusparse (for example)". This is incorrect.
> 
>  This does not work with code that uses the convenience Mat constructors such 
> as MatCreateAIJ(), MatCreateAIJWithArrays(), etc. It only works if you use the 
> constructor approach of MatCreate(), MatSetSizes(), MatSetFromOptions(), 
> MatXXXSetPreallocation(), ... Similarly, you need to use VecCreate(), 
> VecSetSizes(), VecSetFromOptions() and -vec_type cuda.
> 
>    If you use DM to create the matrices and vectors then you can use 
> -dm_mat_type aijcusparse -dm_vec_type cuda
> 
>    Sorry for the confusion.
> 
>    Barry
> 
> 
> 
> 
>> On Jul 15, 2023, at 8:03 AM, Matthew Knepley <[email protected]> wrote:
>> 
>> On Sat, Jul 15, 2023 at 1:44 AM Ng, Cho-Kuen <[email protected]> wrote:
>> Matt,
>> 
>> After inserting 2 lines in the code:
>> 
>>   ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);                       
>>   ierr = MatSetFromOptions(A);CHKERRQ(ierr);
>>   ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>>                       d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr);
>> 
>> "There are no unused options." However, there is no improvement on the GPU 
>> performance.
>> 
>> 1. MatCreateAIJ() sets the type, and in fact it overwrites the Mat you 
>> created in steps 1 and 2. This is detailed in the manual.
>> 
>> 2. You should replace MatCreateAIJ() with MatSetSizes(), called before 
>> MatSetFromOptions().
>> 
>>   Thanks,
>> 
>>     Matt
>>  
>> Thanks,
>> Cho
>> 
>> From: Matthew Knepley <[email protected]>
>> Sent: Friday, July 14, 2023 5:57 PM
>> To: Ng, Cho-Kuen <[email protected]>
>> Cc: Barry Smith <[email protected]>; Mark Adams <[email protected]>; 
>> [email protected] <[email protected]>
>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>  
>> On Fri, Jul 14, 2023 at 7:57 PM Ng, Cho-Kuen <[email protected]> wrote:
>> I managed to pass the following options to PETSc using a GPU node on 
>> Perlmutter.
>> 
>>     -mat_type aijcusparse -vec_type cuda -log_view -options_left
>> 
>> Below is a summary of the test using 4 MPI tasks and 1 GPU per task.
>> 
>> o #PETSc Option Table entries:
>>    -log_view
>>    -mat_type aijcusparse
>>    -options_left
>>    -vec_type cuda
>>    #End of PETSc Option Table entries
>>    WARNING! There are options you set that were not used!
>>    WARNING! could be spelling mistake, etc!
>>    There is one unused database option. It is:
>>    Option left: name:-mat_type value: aijcusparse
>> 
>> The -mat_type option has not been used. In the application code, we use
>> 
>>     ierr = MatCreateAIJ(PETSC_COMM_WORLD,mlocal,mlocal,m,n,
>>              d_nz,PETSC_NULL,o_nz,PETSC_NULL,&A);CHKERRQ(ierr);
>> 
>> 
>> If you create the Mat this way, then you need MatSetFromOptions() in order 
>> to set the type from the command line.
>> 
>>   Thanks,
>> 
>>      Matt
>>  
>> o The percent flops on the GPU for KSPSolve is 17%.
>> 
>> In comparison with a CPU run using 16 MPI tasks, the GPU run is an order of 
>> magnitude slower. How can I improve the GPU performance?
>> 
>> Thanks,
>> Cho
>> From: Ng, Cho-Kuen <[email protected]>
>> Sent: Friday, June 30, 2023 7:57 AM
>> To: Barry Smith <[email protected]>; Mark Adams <[email protected]>
>> Cc: Matthew Knepley <[email protected]>; [email protected] 
>> <[email protected]>
>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>  
>> Barry, Mark and Matt,
>> 
>> Thank you all for the suggestions. I will modify the code so we can pass 
>> runtime options.
>> 
>> Cho
>> From: Barry Smith <[email protected]>
>> Sent: Friday, June 30, 2023 7:01 AM
>> To: Mark Adams <[email protected]>
>> Cc: Matthew Knepley <[email protected]>; Ng, Cho-Kuen <[email protected]>; 
>> [email protected] <[email protected]>
>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>  
>> 
>>   Note that options like -mat_type aijcusparse -vec_type cuda only work if 
>> the program is set up to allow runtime swapping of matrix and vector types. 
>> If you have a call to MatCreateMPIAIJ() or another type-specific constructor, 
>> then these options do nothing. Because Mark had you use -options_left, the 
>> program will tell you at the end that it did not use the option, so you will 
>> know.
>> 
>>> On Jun 30, 2023, at 9:30 AM, Mark Adams <[email protected]> wrote:
>>> 
>>> PetscCall(PetscInitialize(&argc, &argv, NULL, help)); gives us the args and 
>>> you run:
>>> 
>>> a.out -mat_type aijcusparse -vec_type cuda -log_view -options_left
>>> 
>>> Mark
>>> 
>>> On Fri, Jun 30, 2023 at 6:16 AM Matthew Knepley <[email protected]> wrote:
>>> On Fri, Jun 30, 2023 at 1:13 AM Ng, Cho-Kuen via petsc-users 
>>> <[email protected]> wrote:
>>> Mark,
>>> 
>>> The application code reads in parameters from an input file, where we can 
>>> put the PETSc runtime options. Then we pass the options to 
>>> PetscInitialize(...). Does that sound right?
>>> 
>>> PETSc will read command-line arguments automatically in PetscInitialize() 
>>> unless you shut it off.
>>> 
>>>   Thanks,
>>> 
>>>     Matt
>>>  
>>> Cho
>>> From: Ng, Cho-Kuen <[email protected]>
>>> Sent: Thursday, June 29, 2023 8:32 PM
>>> To: Mark Adams <[email protected]>
>>> Cc: [email protected] <[email protected]>
>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>  
>>> Mark,
>>> 
>>> Thanks for the information. How do I put the runtime options for the 
>>> executable, say, a.out, which does not have the provision to append 
>>> arguments? Do I need to change the C++ main to read in the options?
>>> 
>>> Cho
>>> From: Mark Adams <[email protected]>
>>> Sent: Thursday, June 29, 2023 5:55 PM
>>> To: Ng, Cho-Kuen <[email protected]>
>>> Cc: [email protected] <[email protected]>
>>> Subject: Re: [petsc-users] Using PETSc GPU backend
>>>  
>>> Run with options: -mat_type aijcusparse -vec_type cuda -log_view 
>>> -options_left
>>> 
>>> The last column of the performance data (from -log_view) will be the 
>>> percent flops on the GPU. Check that that is > 0.
>>> 
>>> The end of the output will list the options that were used and options that 
>>> were _not_ used (if any). Check that there are no options left.
>>> 
>>> Mark
>>> 
>>> On Thu, Jun 29, 2023 at 7:50 PM Ng, Cho-Kuen via petsc-users 
>>> <[email protected]> wrote:
>>> I installed PETSc on Perlmutter using "spack install petsc+cuda+zoltan" and 
>>> loaded it with "spack load petsc/fwge6pf". Then I compiled the application 
>>> code (purely CPU code), linking to the petsc package, hoping to get a 
>>> performance improvement from the petsc GPU backend. However, the timing 
>>> was the same using the same number of MPI tasks with and without GPU 
>>> accelerators. Have I missed something in the process, for example, setting 
>>> up PETSc options at runtime to use the GPU backend?
>>> 
>>> Thanks,
>>> Cho
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/
>> 