Thank you, Richard and Junchao. This is very helpful info. I’ll try stepping through it with a debugger as you’ve suggested.
Shri

From: petsc-dev <[email protected]> on behalf of PETSc Development <[email protected]>
Reply-To: Richard Tran Mills <[email protected]>
Date: Tuesday, March 10, 2020 at 11:34 PM
To: PETSc Development <[email protected]>
Subject: Re: [petsc-dev] Understanding -log_summary with GPUs

Hi Shri,

Probably the best way to understand what is going on is to step through things using a debugger, as Junchao suggests. VecAXPY does get used in a lot of places, and maybe it is being called on some vectors that aren't getting their type from the options database?

Also, there are several places where a vector gets "bound" to execute operations on the CPU instead of the GPU (see VecBindToCPU()), either because we know that the vector isn't going to be needed on the CPU for subsequent operations, or because the vector is too small for GPU execution to make sense given kernel launch latency. When a vector is bound to the CPU, operations with it will be counted in the CPU MFlops column.

It looks like you are actually getting decent GPU usage for your vector operations. While VecAXPY is showing only 80% of operations on the GPU, it's also accounting for less than one percent of the total flops. I see 100% GPU flops for the VecMAXPY that accounts for 13% of your flops.

Best regards,
Richard

On 3/10/20 3:44 PM, Junchao Zhang via petsc-dev wrote:

Hi, Shri,

I don't understand either. But there are many invocations of VecAXPY etc. Is it possible some are done on the CPU? Attach a debugger and set a breakpoint on VecAXPY_SeqCUDA to see if it gets a hit. If yes, then see why.

--Junchao Zhang

On Tue, Mar 10, 2020 at 2:44 PM Abhyankar, Shrirang G via petsc-dev <[email protected]> wrote:

Hello all,

I need help understanding the output from -log_summary for the GPU-related columns. I am currently simply setting -vec_type seqcuda, which I believe performs the vector operations on the GPU.
With -vec_type seqcuda, I presumed all vector operations would be done on the GPU, so only the GPU MFlops would be logged and the CPU MFlops would be zero. But -log_summary reports MFlops for both the CPU and the GPU, and I do not understand why. What is the meaning of the last column, percent flops on the GPU? For instance, some operations such as VecDot show 100 %F, while others like VecAXPY show less. What does this mean? Any other general comments on these numbers?

Let me know if you need more information.

Thanks,
Shri
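To make Richard's point about the %F column concrete, here is a small Python sketch (plain arithmetic, not PETSc code) of how an event's GPU %F can land at 80% while the event barely matters overall. It assumes that %F is the event's GPU flops divided by its total flops, that VecAXPY is logged as 2n flops for a length-n vector, and that the few CPU-side calls come from small vectors bound to the CPU. The call counts, vector lengths, and the run total are all hypothetical; none of them come from Shri's log.

```python
def axpy_flops(n, calls):
    """Flops for `calls` VecAXPY-style updates y = alpha*x + y on length-n vectors.

    Assumes 2 flops per entry (one multiply, one add).
    """
    return 2 * n * calls


def gpu_percent(gpu_flops, cpu_flops):
    """Percent of an event's flops done on the GPU, assuming total = GPU + CPU."""
    total = gpu_flops + cpu_flops
    return 100.0 * gpu_flops / total if total else 0.0


# Hypothetical tally: a few calls on large vectors that stay on the GPU,
# and many calls on small vectors that were bound to the CPU.
gpu = axpy_flops(10000, 4)   # large vectors on the GPU
cpu = axpy_flops(100, 100)   # small, CPU-bound vectors
axpy_total = gpu + cpu
run_total = 2.0e7            # hypothetical flops for the whole run

print(f"VecAXPY GPU %F: {gpu_percent(gpu, cpu):.1f}")
print(f"VecAXPY share of all flops: {100 * axpy_total / run_total:.2f}%")
```

With these made-up numbers the script reports a GPU %F of 80.0 for VecAXPY and a 0.50% share of all flops. This is the same shape as the case Richard describes: a sub-100 %F on an event that contributes under one percent of the flops has little effect on overall GPU utilization.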
