That's a terrible idea. Ignoring API calls would be identical to a freeze. You might as well disable GPU recovery because the result would be the same.
There are 2 scenarios:
- robust contexts: report the GPU reset status and skip API calls; let the app recreate the context to recover
- non-robust contexts: call exit(1) immediately, which is the best way to recover

Marek

On Fri, Jun 30, 2023 at 11:11 AM Michel Dänzer <michel.daen...@mailbox.org> wrote:

> On 6/30/23 16:59, Alex Deucher wrote:
> > On Fri, Jun 30, 2023 at 10:49 AM Sebastian Wick
> > <sebastian.w...@redhat.com> wrote:
> >> On Tue, Jun 27, 2023 at 3:23 PM André Almeida <andrealm...@igalia.com> wrote:
> >>>
> >>> +Robustness
> >>> +----------
> >>> +
> >>> +The only way to try to keep an application working after a reset is if it
> >>> +complies with the robustness aspects of the graphical API that it is using.
> >>> +
> >>> +Graphical APIs provide ways to applications to deal with device resets. However,
> >>> +there is no guarantee that the app will use such features correctly, and the
> >>> +UMD can implement policies to close the app if it is a repeating offender,
> >>> +likely in a broken loop. This is done to ensure that it does not keep blocking
> >>> +the user interface from being correctly displayed. This should be done even if
> >>> +the app is correct but happens to trigger some bug in the hardware/driver.
> >>
> >> I still don't think it's good to let the kernel arbitrarily kill
> >> processes that it thinks are not well-behaved based on some heuristics
> >> and policy.
> >>
> >> Can't this be outsourced to user space? Expose the information about
> >> processes causing a device and let e.g. systemd deal with coming up
> >> with a policy and with killing stuff.
> >
> > I don't think it's the kernel doing the killing, it would be the UMD.
> > E.g., if the app is guilty and doesn't support robustness the UMD can
> > just call exit().
>
> It would be safer to just ignore API calls[0], similarly to what is done
> until the application destroys the context with robustness. Calling exit()
> likely results in losing any unsaved work, whereas at least some
> applications might otherwise allow saving the work by other means.
>
>
> [0] Possibly accompanied by a one-time message to stderr along the lines
> of "GPU reset detected but robustness not enabled in context, ignoring
> OpenGL API calls".
>
> --
> Earthling Michel Dänzer            |                  https://redhat.com
> Libre software enthusiast          |         Mesa and Xwayland developer
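
For illustration only, the two scenarios at the top of this message could be sketched roughly as below. This is not Mesa or any real UMD code: `struct umd_context`, `reset_policy()` and `umd_begin_api_call()` are hypothetical names, and the reset detection itself is assumed to happen elsewhere and simply set a flag.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names throughout; a sketch of the policy, not a real driver. */
enum reset_action {
    ACTION_NONE,            /* no reset seen: execute the call normally */
    ACTION_REPORT_AND_SKIP, /* robust context: app queries the reset status, calls are skipped */
    ACTION_EXIT             /* non-robust context: terminate the process */
};

struct umd_context {
    bool robust;     /* context was created with robustness enabled */
    bool reset_seen; /* a GPU reset affecting this context was detected */
};

/* Pure policy decision, kept free of side effects so it can be tested. */
enum reset_action reset_policy(const struct umd_context *ctx)
{
    if (!ctx->reset_seen)
        return ACTION_NONE;
    return ctx->robust ? ACTION_REPORT_AND_SKIP : ACTION_EXIT;
}

/* Assumed to run at the top of every API entry point.
 * Returns true if the call should be executed, false if it should be skipped. */
bool umd_begin_api_call(const struct umd_context *ctx)
{
    switch (reset_policy(ctx)) {
    case ACTION_NONE:
        return true;
    case ACTION_REPORT_AND_SKIP:
        /* The app is expected to notice the reset status and recreate the context. */
        return false;
    case ACTION_EXIT:
        fprintf(stderr, "GPU reset detected, context is not robust, exiting\n");
        exit(1);
    }
    return true; /* not reached; keeps compilers quiet */
}
```

Keeping the decision in `reset_policy()` separate from the `exit(1)` side effect is only a convenience for testing; a real driver would likely fold both into its existing context-loss handling.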