On Fri, Aug 26, 2022 at 05:56:09PM +0300, Alexander Monakov via Gcc-patches wrote:
>
> On Fri, 26 Aug 2022, Tobias Burnus wrote:
>
> > @Tom and Alexander: Better suggestions are welcome for the busy loop in
> > libgomp/plugin/plugin-nvptx.c regarding the variable placement and checking
> > its value.
>
> I think to do that without polling you can use PTX 'brkpt' instruction on the
> device and CUDA Debugger API on the host (but you'd have to be careful about
> interactions with the real debugger).
>
> How did the standardization process for this feature look like, how did it pass
> if it's not efficiently implementable for the major offloading targets?

It doesn't have to be implementable on all major offloading targets; it is
enough if it can work on some.  As one needs to request reverse offloading
through a declarative directive, it is always possible in that case to just
pretend that devices which don't support it don't exist.

But it would be really nice to support it even on PTX.

Are there any other implementations of reverse offloading to PTX already?

	Jakub
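
For illustration, here is a minimal sketch of what requesting reverse offloading looks like in user code, assuming the OpenMP 5.0 spelling (`requires reverse_offload` plus `device(ancestor: 1)`). The names `host_only_lookup` and `offload_with_host_callback` are hypothetical, and the exact data-sharing clauses on the inner construct are an assumption. This is not taken from the patch under discussion.

```c
/* Declares that the program needs reverse offloading.  An implementation
   may then treat devices that cannot support it as if they did not exist. */
#pragma omp requires reverse_offload

/* Hypothetical helper that is only meaningful on the host. */
static int host_only_lookup(int key)
{
  return key + 22;
}

/* Offload a region to a device; from inside it, hand one step back to
   the host (the "ancestor" device) and use the value it produced. */
int offload_with_host_callback(int key)
{
  int result = 0;

  #pragma omp target map(to: key) map(tofrom: result)
  {
    /* Reverse offload: this nested region executes on the host. */
    #pragma omp target device(ancestor: 1) firstprivate(key) map(tofrom: result)
    result = host_only_lookup(key);
  }

  return result;
}
```

If the compiler has no OpenMP support enabled, the pragmas are ignored and everything simply runs on the host; the function returns the same value either way, e.g. `offload_with_host_callback(20)` gives 42.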