Thank you, Richard and Junchao. This is very helpful info. I’ll try to step 
through using a debugger as you’ve suggested.

Shri
From: petsc-dev <[email protected]> on behalf of PETSc Development 
<[email protected]>
Reply-To: Richard Tran Mills <[email protected]>
Date: Tuesday, March 10, 2020 at 11:34 PM
To: PETSc Development <[email protected]>
Subject: Re: [petsc-dev] Understanding -log_summary with GPUs


Hi Shri,

Probably the best way to understand what is going on is to step through things 
using a debugger, as Junchao suggests. VecAXPY does get used in a lot of 
places, and maybe it is being called on some vectors that aren't getting their 
type from the options database? Also, there are several places where a vector 
gets "bound" to execute operations on the CPU instead of the GPU (see 
VecBindToCPU()), either because we know that the vector is going to be needed 
on the CPU for subsequent operations, or because the vector is too small for 
the GPU to be worthwhile given kernel launch latency. 
When a vector is bound to the CPU, operations with it will be counted in the 
CPU MFlops column.

It looks like you are actually getting decent GPU usage for your vector 
operations. While VecAXPY is showing only 80% of operations on the GPU, it's 
also accounting for less than one percent of the total flops. I see 100% GPU 
flops for the VecMAXPY that accounts for 13% of your flops.
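
To put those two percentages in perspective, here is a small sketch of the arithmetic. It assumes the last column is GPU flops divided by total (CPU + GPU) flops for the event. The flop counts and function names are made up for illustration and are not taken from your log.

```python
def gpu_flop_percent(cpu_flops, gpu_flops):
    """Percent of an event's flops executed on the GPU (assumed definition)."""
    total = cpu_flops + gpu_flops
    return 100.0 * gpu_flops / total if total > 0 else 0.0


def cpu_share_of_run(event_share_pct, event_gpu_pct):
    """Percent of the whole run's flops that this event executed on the CPU."""
    return event_share_pct * (100.0 - event_gpu_pct) / 100.0


# Hypothetical counts for an event that runs 80% of its flops on the GPU.
axpy_gpu_pct = gpu_flop_percent(cpu_flops=2.0e6, gpu_flops=8.0e6)

# If that event accounts for 1% of all flops in the run, its CPU portion
# is only a small fraction of a percent of the total.
axpy_cpu_share = cpu_share_of_run(event_share_pct=1.0, event_gpu_pct=axpy_gpu_pct)

print(f"GPU %F for the event: {axpy_gpu_pct:.1f}")
print(f"CPU flops from the event as % of run: {axpy_cpu_share:.2f}")
```

So even an event that does 20% of its flops on the CPU contributes very little CPU work overall when the event itself is a tiny share of the run.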

Best regards,
Richard
On 3/10/20 3:44 PM, Junchao Zhang via petsc-dev wrote:
Hi, Shri,
  I don't understand either. But there are many invocations of VecAXPY etc. Is 
it possible some are done on CPU? Attach a debugger and set a breakpoint on 
VecAXPY_SeqCUDA to see if it gets a hit. If yes, then see why.

--Junchao Zhang


On Tue, Mar 10, 2020 at 2:44 PM Abhyankar, Shrirang G via petsc-dev 
<[email protected]> wrote:
Hello all,
  I need help understanding the GPU-related columns in the output from 
-log_summary. I am currently just setting -vec_type seqcuda, which I believe 
performs the vector operations on the GPU. With -vec_type seqcuda, I presumed 
that all vector operations are done on the GPU, so only the GPU MFlops would 
be logged and the CPU MFlops would be zero. But -log_summary reports MFlops for 
both CPU and GPU, and I do not understand why.

What is the meaning of the last column – percent flops on the GPU? For 
instance, some operations such as VecDot show 100 %F, while others like VecAXPY 
have less. What is the meaning of this?

Any other general comments on these numbers?

Let me know if you need more information.

Thanks,
Shri
