On Thu, Aug 27, 2015 at 21:44:50 +0800, Chung-Lin Tang wrote:
> We've discovered that, for several of the libgomp plugin interface routines,
> if the target specific routine calls exit() (usually upon a fatal condition),
> deadlock ensues. We found this using nvptx, but it's possible on intelmic as
> well.
>
> This is due to many of the plugin routines are called with the device lock
> held, and when exit() is called inside the plugin code, the
> GOMP_unregister_var() destructor tries to iterate through and acquire all
> device locks to cleanup. Since we already hold one of the device locks, this
> just gets stuck. Also because gomp_mutex_t is a simple futex based lock
> implementation (instead of pthreads), we don't have a trylock mechanism to
> use either.
>
> So this patch tries to alleviate this problem by changing the plugin
> interface; the plugin routines that are called while holding the device lock
> are adjusted to assume to never fatal exit, but return a value back to
> libgomp proper to indicate execution results. The core libgomp code then may
> unlock and call gomp_fatal().
>
> We believe this is the right route to solve the problem, since there's only
> two accel target plugins so far. Besides the nvptx plugin, I have made some
> effort to update the intelmic plugin as well, though it's not as thoroughly
> audited. Intel folks might want to further make sure your plugin code is
> free of this problem as well.
>
> This patch contains the libgomp proper changes. The nvptx and intelmic
> patches follow. I have tested the libgomp testsuite without regressions for
> both accel targets, is this okay for trunk?
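A minimal C sketch of the calling convention proposed in the quoted patch, under stated assumptions: it uses a plain pthread_mutex_t in place of libgomp's futex-based gomp_mutex_t, and every name in it (struct device, plugin_host2dev, device_host2dev, fatal, g_dev) is hypothetical, not libgomp's actual API. The plugin hook reports failure by returning false instead of calling exit(); the core code releases the device lock first and only then calls its fatal routine, so an exit-time destructor that must take the same lock (playing the role of the GOMP_unregister_var() destructor) can still run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device descriptor (an assumption; libgomp's real one uses
   gomp_mutex_t). Re-locking a default pthread mutex from the thread that
   already owns it is undefined behavior and commonly deadlocks. */
struct device {
  pthread_mutex_t lock;
  char mem[64];
};

struct device g_dev = { PTHREAD_MUTEX_INITIALIZER, {0} };

/* Hypothetical plugin hook: never exits; returns false on failure. */
bool plugin_host2dev (struct device *dev, size_t off, const void *src, size_t n)
{
  if (off > sizeof dev->mem || n > sizeof dev->mem - off)
    {
      fprintf (stderr, "plugin: copy out of range\n");
      return false;
    }
  memcpy (dev->mem + off, src, n);
  return true;
}

/* Stand-in for gomp_fatal(): exit() runs destructors, so this must only
   be called while no device lock is held by the calling thread. */
static void fatal (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* Core-library side: call the hook under the lock, unlock, then decide. */
void device_host2dev (struct device *dev, size_t off, const void *src, size_t n)
{
  assert (dev != NULL);
  pthread_mutex_lock (&dev->lock);
  bool ok = plugin_host2dev (dev, off, src, n);
  pthread_mutex_unlock (&dev->lock);  /* release before any fatal exit */
  if (!ok)
    fatal ("host2dev failed");
}

/* Exit-time cleanup that needs the device lock (GCC destructor attribute).
   Had exit() been called with g_dev.lock held, this would block. */
__attribute__ ((destructor)) static void cleanup (void)
{
  pthread_mutex_lock (&g_dev.lock);
  memset (g_dev.mem, 0, sizeof g_dev.mem);
  pthread_mutex_unlock (&g_dev.lock);
}
```

For example, device_host2dev (&g_dev, 0, "hello", 6) copies the string while holding the lock; an out-of-range request makes the hook return false, and fatal() is reached only after the lock has been released.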
(I have no objections.)

However, in the case of intelmic, these exit()s are just the tip of the
iceberg, because the underlying liboffloadmic library contains other exit()s
on fatal errors. And I don't know what to do about such deadlocks.

  -- Ilya