Since we cannot currently post issues (reported here:
https://forum.gitlab.com/t/creating-new-issue-gives-cannot-create-issue-getting-whoops-something-went-wrong-on-our-end/41966?u=bsmith),
I am recording my issue here so I don't forget it.
I think the pattern

  err = WaitForCUDA();CHKERRCUDA(err);
  ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);

should be changed so that the WaitForCUDA() (actually a WaitForDevice())
is done inside PetscLogGpuTimeEnd().
Currently the WaitForCUDA() is missing in a few places, which results
in incorrect timings. Also, some _SeqCUDA() routines are missing the
PetscLogGpuTimeEnd() and need to be fixed.
The current model is a maintenance nightmare.
Does anyone see a problem with making this change?
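To make the proposal concrete, here is a small C sketch under stated assumptions. It does not use PETSc or CUDA: the device is simulated by a counter of queued work, and every `_sketch` name (including WaitForDevice_sketch(), standing in for the proposed WaitForDevice()) is hypothetical. It shows the barrier living inside the end-of-timer routine, so call sites no longer need their own WaitForCUDA().

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated device: launches are asynchronous, so the host clock only
   advances when the host waits for the device. */
static double host_clock_us = 0.0;  /* simulated host wall clock, microseconds */
static double pending_us    = 0.0;  /* queued GPU work not yet finished */
static bool   sync_enabled  = true; /* hypothetical "synchronize for logging" switch */
static double gpu_time_us   = 0.0;  /* GPU time accumulated in the log */
static double t_begin_us    = 0.0;  /* host clock at the last timer begin */

/* Hypothetical stand-in for WaitForDevice(): drain all queued work. */
static void WaitForDevice_sketch(void)
{
  if (!sync_enabled) return;
  host_clock_us += pending_us;
  pending_us = 0.0;
}

/* Asynchronous launch: returns immediately and leaves the work queued. */
static void LaunchKernel_sketch(double cost_us) { pending_us += cost_us; }

static void LogGpuTimeBegin_sketch(void) { t_begin_us = host_clock_us; }

/* The proposed change: the end-of-timer routine owns the barrier, so
   individual call sites can no longer forget it. */
static void LogGpuTimeEnd_sketch(void)
{
  WaitForDevice_sketch();
  assert(!sync_enabled || pending_us == 0.0); /* nothing left in flight */
  gpu_time_us += host_clock_us - t_begin_us;
}

/* A vector operation after the change: no explicit wait at the call site. */
static void VecOp_sketch(double cost_us)
{
  LogGpuTimeBegin_sketch();
  LaunchKernel_sketch(cost_us);
  LogGpuTimeEnd_sketch();
}
```

Calling VecOp_sketch(40.0) and then VecOp_sketch(60.0) logs 100 us in total. With sync_enabled set to false, the launched work is still queued when the timer stops and contributes nothing to the log, which mimics the incorrect timings at call sites that forget the wait.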
I'm fine with this change, as the maintenance benefits outweigh the
performance cost for typical use cases.
I propose also adding WaitForDevice() to PetscLogGpuTimeBegin(). This
ensures that no earlier GPU kernel executions spill over into the timed
section.
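A minimal sketch of this two-barrier model, again with a simulated device and hypothetical `_sketch` names rather than PETSc's actual routines. TimeAfterCallback_sketch() runs an unsynchronized (hypothetical) user callback followed by one short timed kernel, either with or without a barrier in the begin routine.

```c
#include <assert.h>
#include <stdbool.h>

static double host_clock_us    = 0.0;   /* simulated host wall clock */
static double pending_us       = 0.0;   /* queued GPU work not yet finished */
static double gpu_time_us      = 0.0;   /* GPU time attributed to PETSc */
static double t_begin_us       = 0.0;
static bool   barrier_at_begin = false; /* true selects the two-barrier model */

static void WaitForDevice_sketch(void)
{
  host_clock_us += pending_us;
  pending_us = 0.0;
}

static void LaunchKernel_sketch(double cost_us) { pending_us += cost_us; }

/* Two-barrier model: optionally drain earlier work before starting the timer. */
static void LogGpuTimeBegin_sketch(void)
{
  if (barrier_at_begin) WaitForDevice_sketch();
  t_begin_us = host_clock_us;
}

static void LogGpuTimeEnd_sketch(void)
{
  WaitForDevice_sketch();
  assert(pending_us == 0.0);
  gpu_time_us += host_clock_us - t_begin_us;
}

/* Hypothetical user FormFunction: queues 500 us of work and returns
   without synchronizing. */
static void UserFormFunction_sketch(void) { LaunchKernel_sketch(500.0); }

/* GPU time logged for a 10 us PETSc kernel that runs right after the
   user callback. */
static double TimeAfterCallback_sketch(bool two_barriers)
{
  host_clock_us = pending_us = gpu_time_us = 0.0;
  barrier_at_begin = two_barriers;
  UserFormFunction_sketch();
  LogGpuTimeBegin_sketch();
  LaunchKernel_sketch(10.0);
  LogGpuTimeEnd_sketch();
  return gpu_time_us;
}
```

With only the end barrier, the 500 us of user work is charged to the 10 us PETSc kernel (510 us logged); the begin barrier brings the logged time back to 10 us.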
Karl,
When synchronization is turned on, the previous GPU kernels should
always have had their own WaitForDevice(), so are you concerned about
buggy code that does not include WaitForDevice()?
I'm primarily thinking of user callback routines here, for example a
user-provided FormFunction that runs some GPU kernels. We have no
guarantee that these user kernels have completed before entering the
timed sections inside PETSc, so the logs will be skewed and report an
unusually slow PETSc kernel (the one right after the user form
function). Arguably, we could add a WaitForDevice() after user callback
invocations.
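The alternative of a barrier right after each user callback returns could look like the following sketch. The callback type, the wrapper CallUserCallback_sketch(), and the simulated device are all hypothetical; only a single barrier at the end of each timed section remains.

```c
#include <assert.h>

static double host_clock_us = 0.0; /* simulated host wall clock */
static double pending_us    = 0.0; /* queued GPU work not yet finished */
static double gpu_time_us   = 0.0; /* GPU time attributed to PETSc */
static double t_begin_us    = 0.0;

static void WaitForDevice_sketch(void)
{
  host_clock_us += pending_us;
  pending_us = 0.0;
}

static void LaunchKernel_sketch(double cost_us) { pending_us += cost_us; }

/* Single-barrier timers: only the end waits for the device. */
static void LogGpuTimeBegin_sketch(void) { t_begin_us = host_clock_us; }

static void LogGpuTimeEnd_sketch(void)
{
  WaitForDevice_sketch();
  gpu_time_us += host_clock_us - t_begin_us;
}

/* Hypothetical callback signature, loosely modeled on a user FormFunction. */
typedef void (*UserCallback_sketch)(void *ctx);

/* Invoke a user callback, then drain any GPU work it left queued so the
   next timed PETSc section starts from an idle device. */
static void CallUserCallback_sketch(UserCallback_sketch fn, void *ctx)
{
  fn(ctx);
  WaitForDevice_sketch();
  assert(pending_us == 0.0);
}

/* Hypothetical user callback: queues the amount of work given in ctx. */
static void UserFormFunction_sketch(void *ctx)
{
  LaunchKernel_sketch(*(double *)ctx);
}
```

Here the user's work still shows up in the overall wall-clock time, but it is no longer charged to the first PETSc kernel that follows the callback.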
I didn't think of the WaitForDevice() after each kernel call in PETSc;
with that we do get reasonable timings within PETSc (except for the user
callbacks mentioned above), so the two-barrier model is not needed.
Best regards,
Karli
Might this incur extra overhead from checking the device? Or is it
always true that, if there are no outstanding kernels, the check does
not go to the GPU and returns immediately?
If we want a two-barrier model, I propose we log the time spent waiting
at the first barrier separately.
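One way to log the first barrier separately, sketched with a simulated device and hypothetical names: the begin routine times its own WaitForDevice() stand-in and charges that wait to a separate counter, so work left over from earlier (for example, an unsynchronized user callback) stays visible without inflating the timed kernel.

```c
#include <assert.h>

static double host_clock_us   = 0.0; /* simulated host wall clock */
static double pending_us      = 0.0; /* queued GPU work not yet finished */
static double gpu_time_us     = 0.0; /* time inside timed sections */
static double barrier_wait_us = 0.0; /* time spent at the first barrier */
static double t_begin_us      = 0.0;

static void WaitForDevice_sketch(void)
{
  host_clock_us += pending_us;
  pending_us = 0.0;
}

static void LaunchKernel_sketch(double cost_us) { pending_us += cost_us; }

static void LogGpuTimeBegin_sketch(void)
{
  double t0 = host_clock_us;
  WaitForDevice_sketch();                /* first barrier */
  barrier_wait_us += host_clock_us - t0; /* logged separately */
  t_begin_us = host_clock_us;
}

static void LogGpuTimeEnd_sketch(void)
{
  WaitForDevice_sketch();                /* second barrier */
  assert(pending_us == 0.0);
  gpu_time_us += host_clock_us - t_begin_us;
}
```

If 500 us of unsynchronized work is queued before a 10 us timed kernel, this logs 500 us of barrier wait and 10 us of GPU time; a later timed section with no leftover work adds nothing to the barrier counter.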
Barry