Hi! Ping.
On Wed, 16 Dec 2015 13:30:21 +0100, I wrote: > On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iver...@gmail.com> wrote: > > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote: > > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote: > > > > +/* This function finalizes all initialized devices. */ > > > > + > > > > +static void > > > > +gomp_target_fini (void) > > > > +{ > > > > + [...] > > > > > > The question is what will this do if there are async target tasks still > > > running on some of the devices at this point (forgotten #pragma omp > > > taskwait > > > or similar if target nowait regions are started outside of parallel > > > region, > > > or exit inside of parallel, etc. But perhaps it can be handled > > > incrementally. > > > Also there is the question that the > > > So I think the patch is ok with the above mentioned changes. > > > > Here is what I've committed to trunk. > > > --- a/libgomp/libgomp.h > > +++ b/libgomp/libgomp.h > > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t > > } cuda; > > } acc_dispatch_t; > > > > +/* Various state of the accelerator device. */ > > +enum gomp_device_state > > +{ > > + GOMP_DEVICE_UNINITIALIZED, > > + GOMP_DEVICE_INITIALIZED, > > + GOMP_DEVICE_FINALIZED > > +}; > > + > > /* This structure describes accelerator device. > > It contains name of the corresponding libgomp plugin, function handlers > > for > > interaction with the device, ID-number of the device, and information > > about > > @@ -933,8 +941,10 @@ struct gomp_device_descr > > /* Mutex for the mutable data. */ > > gomp_mutex_t lock; > > > > - /* Set to true when device is initialized. */ > > - bool is_initialized; > > + /* Current state of the device. OpenACC allows to move from INITIALIZED > > state > > + back to UNINITIALIZED state. OpenMP allows only to move from > > INITIALIZED > > + to FINALIZED state (at program shutdown). */ > > + enum gomp_device_state state; > > (ACK, but I assume we'll want to make sure that an OpenACC device is > never re-initialized if we're in/after the libgomp finalization phase.) > > > The issue mentioned above: "exit inside of parallel" is actually a > problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c, > libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90 > test cases now run into annoying "WARNING: program timed out". Here is > what's happening, as I understand it: in > libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call > returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal. > > > --- a/libgomp/target.c > > +++ b/libgomp/target.c > > > +/* This function finalizes all initialized devices. */ > > + > > +static void > > +gomp_target_fini (void) > > +{ > > + int i; > > + for (i = 0; i < num_devices; i++) > > + { > > + struct gomp_device_descr *devicep = &devices[i]; > > + gomp_mutex_lock (&devicep->lock); > > + if (devicep->state == GOMP_DEVICE_INITIALIZED) > > + { > > + devicep->fini_device_func (devicep->target_id); > > + devicep->state = GOMP_DEVICE_FINALIZED; > > + } > > + gomp_mutex_unlock (&devicep->lock); > > + } > > +} > > > @@ -2387,6 +2433,9 @@ gomp_target_init (void) > > if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200) > > goacc_register (&devices[i]); > > } > > + > > + if (atexit (gomp_target_fini) != 0) > > + gomp_fatal ("atexit failed"); > > } > > Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the > atexit handler, gomp_target_fini, which, with the device lock held, will > call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to > clean up. > > Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA > context is now in an inconsistent state, see > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>: > > CUDA_ERROR_LAUNCH_FAILED = 719 > An exception occurred on the device while executing a > kernel. Common causes include dereferencing an invalid device > pointer and accessing out of bounds shared memory. The context > cannot be used, so it must be destroyed (and a new one should be > created). All existing device memory allocations from this > context are invalid and must be reconstructed if the program is > to continue using CUDA. > > Thus, any cuMemFreeHost invocations that are run during clean-up will now > also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call > GOMP_PLUGIN_fatal, which again will trigger the same or another > (GOMP_offload_unregister_ver) atexit handler, which will then deadlock > trying to lock the device again, which is still locked. > > (Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar > errors, should we destroy the context right away, or toggle a boolean > flag to mark it as unusable, and use that as an indication to avoid the > follow-on failures of cuMemFreeHost just described above, for example?) > > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html> > tells us: > > Since the behavior is undefined if the exit() function is called more > than once, portable applications calling atexit() must ensure that the > exit() function is not called at normal process termination when all > functions registered by the atexit() function are called. > > ... which we're violating here, at least in the nvptx plugin. I have not > analyzed the intermic one. > > As it happens, Chung-Lin has been working in that area: > <http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>, > which he recently re-posted: > <http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>, > <http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>, > <http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>. > I have not analyzed whether his changes would completely resolve the > problem just described, but at least conceptually they seem to be a step > into the right direction? (Jakub?) > > Now, to resolve the immediate problem, what is the right thing for us to > do? Is the following simple change OK, or is there a reason to still run > atexit handlers if terminating under error conditions? > > commit b1733e8f9df6ae7d6828e2194df1b314772701c5 > Author: Thomas Schwinge <tho...@codesourcery.com> > Date: Wed Dec 16 13:10:39 2015 +0100 > > Avoid deadlocks in libgomp due to competing atexit handlers > > libgomp/ > * error.c (gomp_vfatal): Call _exit instead of exit. > --- > libgomp/error.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git libgomp/error.c libgomp/error.c > index 094c24a..1ef7491 100644 > --- libgomp/error.c > +++ libgomp/error.c > @@ -34,6 +34,7 @@ > #include <stdarg.h> > #include <stdio.h> > #include <stdlib.h> > +#include <unistd.h> > > > #undef gomp_vdebug > @@ -77,7 +78,7 @@ void > gomp_vfatal (const char *fmt, va_list list) > { > gomp_verror (fmt, list); > - exit (EXIT_FAILURE); > + _exit (EXIT_FAILURE); > } > > void Grüße Thomas
signature.asc
Description: PGP signature