Re: [petsc-dev] PETSc issue I cannot post combine WaitForCUDA(); inside PetscLogGpuTimeEnd();

Karl Rupp Fri, 28 Aug 2020 22:17:53 -0700

Since we cannot post issues (reported here https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith) here is my issue so I don't forget it.
 I think
err  = WaitForCUDA();CHKERRCUDA(err);
ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);
should be changed to include WaitForCUDA() (actually WaitForDevice()) inside PetscLogGpuTimeEnd(). Currently the WaitForCUDA() is sometimes missing in a few places, resulting in bad timing. Also, some _SeqCUDA() routines don't have the PetscLogGpuTimeEnd() and need to be fixed.
The current model is a maintenance nightmare.
Does anyone see a problem with making this change?
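
The proposed change can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in (the Mock* names, the pending-kernel counter); it is not the PETSc or CUDA API. It only shows the shape of the idea: once the barrier lives inside the End call, a caller cannot forget it.

```c
#include <assert.h>

/* Hypothetical stand-ins: none of these are real PETSc or CUDA symbols. */
static int pending_kernels = 0; /* kernels launched but not yet finished */
static int timer_running   = 0; /* 1 while a timed section is open */
static int timed_kernels   = 0; /* kernels that completed inside a timed section */

/* Mock asynchronous launch: the work is queued, not finished. */
static void MockLaunchKernel(void) { pending_kernels++; }

/* Mock device-wide barrier: drains all outstanding work. */
static int MockWaitForDevice(void)
{
  if (timer_running) timed_kernels += pending_kernels;
  pending_kernels = 0;
  return 0;
}

static int MockLogGpuTimeBegin(void) { timer_running = 1; return 0; }

/* Proposed shape: the barrier is part of the End call itself. */
static int MockLogGpuTimeEnd(void)
{
  int err;
  assert(timer_running);      /* End must follow a Begin */
  err = MockWaitForDevice();  /* wait before stopping the timer */
  if (err) return err;
  timer_running = 0;
  return 0;
}

/* A caller needs only the Begin/End pair, with no separate wait. */
static int MockVecOp(void)
{
  int err;
  err = MockLogGpuTimeBegin(); if (err) return err;
  MockLaunchKernel();
  err = MockLogGpuTimeEnd();   if (err) return err;
  return 0;
}
```

In this sketch, a call to MockVecOp() leaves no pending work and charges its one kernel to the timed section, even though MockVecOp() never calls the barrier directly.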
I'm fine with this change, as the maintenance benefits outweigh the performance cost for typical use cases.
I propose to also add the WaitForDevice(); at PetscLogGpuTimeBegin(). This will ensure that no previous GPU kernel executions spill over into the timed section.
   Karl,
When synchronization is turned on the previous GPU kernels should always have their own WaitForDevice(), so are you concerned about buggy code that does not include WaitForDevice?

I'm primarily thinking of user callback routines here. For example, a FormFunction provided by the user that is running some GPU kernels. We have no guarantee that these user kernels have completed before entering the timed sections inside PETSc, so the logs will be skewed to report an unusually slow kernel in PETSc (the one right after the user form function). Arguably we could add a WaitForDevice() after user callback invocations.
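
The spill-over from a user callback can be made concrete with a small model. This is a sketch under stated assumptions: the Mock* functions, UserFormFunction, LibraryKernel, and the integer "cost" units are all hypothetical and only imitate asynchronous launches; they are not PETSc or CUDA calls.

```c
#include <assert.h>

/* Hypothetical model of an asynchronous device; not real PETSc/CUDA code. */
static int pending_cost  = 0; /* cost of launched-but-unfinished kernels */
static int timer_running = 0;
static int logged_cost   = 0; /* cost charged to the timed library section */

static void MockLaunch(int cost) { pending_cost += cost; }

/* Barrier: any work drained while the timer runs is charged to the log. */
static void MockWaitForDevice(void)
{
  if (timer_running) logged_cost += pending_cost;
  pending_cost = 0;
}

static void MockTimeBegin(void) { timer_running = 1; }

static void MockTimeEnd(void)
{
  assert(timer_running);
  MockWaitForDevice();
  timer_running = 0;
}

/* User callback: launches an expensive kernel and returns without waiting. */
static void UserFormFunction(void) { MockLaunch(100); }

/* Library kernel right after the callback: cheap, and correctly timed. */
static void LibraryKernel(void)
{
  MockTimeBegin();
  MockLaunch(5);
  MockTimeEnd();
}

/* Returns the cost logged for the library kernel. */
static int RunStep(int sync_after_callback)
{
  pending_cost = 0;
  logged_cost  = 0;
  UserFormFunction();
  if (sync_after_callback) MockWaitForDevice(); /* barrier after the callback */
  LibraryKernel();
  return logged_cost;
}
```

In this model RunStep(0) logs 105 for the library kernel, because the user's unfinished work is drained inside the timed section, while RunStep(1) logs 5.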

I didn't think of the WaitForDevice() after each kernel call in PETSc; with that we do get reasonable timings within PETSc (except for the user callbacks mentioned above), so the two-barrier model is not needed.


Best regards,
Karli

Might this incur an extra overhead checking the device? Or will it always be true that if there are no outstanding kernels it will not go to the GPU and the check will return immediately?
If we want to have a two barrier model, I propose we log the timing for waiting at the first barrier separately.
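
One possible reading of the two-barrier model with separate logging is sketched below. The counters prebarrier_cost and section_cost, and all Mock* functions, are hypothetical names introduced for illustration; the thread does not specify how the separate log entry would be recorded.

```c
#include <assert.h>

/* Hypothetical model; not real PETSc/CUDA code. */
static int pending_cost    = 0; /* cost of launched-but-unfinished kernels */
static int timer_running   = 0;
static int prebarrier_cost = 0; /* time spent waiting at the first barrier */
static int section_cost    = 0; /* time charged to the timed section itself */

static void MockLaunch(int cost) { pending_cost += cost; }

/* First barrier: drain earlier work and log that wait on its own. */
static void MockTimeBegin(void)
{
  prebarrier_cost += pending_cost;
  pending_cost = 0;
  timer_running = 1;
}

/* Second barrier: only work launched since Begin is still pending here. */
static void MockTimeEnd(void)
{
  assert(timer_running);
  section_cost += pending_cost;
  pending_cost = 0;
  timer_running = 0;
}
```

If earlier code launches work of cost 100 without waiting, and the timed section then launches work of cost 5, this model reports 100 under prebarrier_cost and 5 under section_cost, so the spill-over is visible in the log instead of being folded into the section.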
Barry
Best regards,
Karli
