Hi Thomas,

On 2023/1/12 9:51 PM, Thomas Schwinge wrote:
> In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
> 'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
> When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
> (..., which deadlocks); that's generally problematic: per
> https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g613d97a277d7640f4cb1c03bd51c2483
> "'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".
I remember running into this myself when first creating this async support
(IIRC in my case it was cuFree()-ing something), but you've found another
mistake here! :)

> Given that eventually we must reach a host/device synchronization point
> (latest when the device is shut down at program termination), and the
> non-'CUDA_SUCCESS' will be upheld until then, it does seem safe to
> replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
> "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case"
> attached. OK to push?

I think this patch is fine. Actual approval powers are yours or Tom's :)

> (Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
> error will be caught and reported at the next host/device synchronization
> point? But I've not verified that.)

Actually, the CUDA driver API docs are a bit vague about what exactly this
CUresult argument to the callback means. The 'res != CUDA_SUCCESS' handling
here was basically just generic handling. I am not really sure what the
truly right thing to do here is (is the error still retained by CUDA after
the callback completes?).

Chung-Lin
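P.S. To make the change concrete, here is a minimal sketch of what the
patched 'cuda_callback_wrapper' in libgomp/plugin/plugin-nvptx.c might look
like. This is my paraphrase of the idea, not the attached patch itself, and
the surrounding details (the 'nvptx_callback' struct and the 'cuda_error'
helper) are simply assumed to match the existing code:

    static void
    cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
    {
      if (res != CUDA_SUCCESS)
        /* Report the error, but do not call GOMP_PLUGIN_fatal: that would
           try to shut down the device from inside the callback, which is
           forbidden ("Callbacks must not make any CUDA API calls") and
           deadlocks.  Per the reasoning above, the error should surface
           again at a later host/device synchronization point.  */
        GOMP_PLUGIN_error ("%s error: %s", __FUNCTION__, cuda_error (res));

      struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
      cb->fn (cb->ptr);
      free (ptr);
    }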