On 7/25/23 04:55, André Almeida wrote:
> Hi everyone,
> 
> It's not clear what we should do about non-robust OpenGL apps after GPU 
> resets, so I'll try to summarize the topic, show some options and my proposal 
> to move forward on that.
> 
> On 27/06/2023 10:23, André Almeida wrote:
>> +Robustness
>> +----------
>> +
>> +The only way to try to keep an application working after a reset is if it
>> +complies with the robustness aspects of the graphical API that it is using.
>> +
>> +Graphical APIs provide ways for applications to deal with device resets. However,
>> +there is no guarantee that the app will use such features correctly, and the
>> +UMD can implement policies to close the app if it is a repeat offender,
>> +likely in a broken loop. This is done to ensure that it does not keep blocking
>> +the user interface from being correctly displayed. This should be done even if
>> +the app is correct but happens to trigger some bug in the hardware/driver.
>> +
> Depending on the OpenGL version, there are different robustness APIs available:
> 
> - OpenGL ARB extension [0]
> - OpenGL KHR extension [1]
> - OpenGL ES extension  [2]
> 
> Apps written in OpenGL should use whichever of these is available to them to 
> make the app robust against GPU resets. That usually means calling 
> GetGraphicsResetStatusARB() and checking the status: if it returns anything 
> other than NO_ERROR, a reset has happened, and the context is considered lost 
> and should be recreated. An app that follows this will likely recover 
> successfully from a reset.
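
A minimal C sketch of the check described above, assuming the ARB_robustness entry point glGetGraphicsResetStatusARB() has already been loaded (e.g. via glXGetProcAddress() or eglGetProcAddress()) and that the context was created with the LOSE_CONTEXT_ON_RESET_ARB notification strategy; with NO_RESET_NOTIFICATION_ARB the query always returns NO_ERROR. The token values are copied from the extension spec so the sketch builds without GL headers. context_is_lost() and check_for_reset() are hypothetical helper names, not Mesa or Khronos API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumption: token values copied from the GL_ARB_robustness spec so this
 * sketch compiles without GL headers. A real app would include <GL/gl.h>
 * and <GL/glext.h> instead of defining them here. */
typedef unsigned int GLenum;
#define GL_NO_ERROR                   0
#define GL_GUILTY_CONTEXT_RESET_ARB   0x8253
#define GL_INNOCENT_CONTEXT_RESET_ARB 0x8254
#define GL_UNKNOWN_CONTEXT_RESET_ARB  0x8255

/* Signature of glGetGraphicsResetStatusARB(); a real app would obtain the
 * pointer via glXGetProcAddress() or eglGetProcAddress(). */
typedef GLenum (*get_reset_status_fn)(void);

/* Hypothetical helper: returns true if the status reported by
 * glGetGraphicsResetStatusARB() means the context was lost. */
static bool context_is_lost(GLenum status)
{
    switch (status) {
    case GL_NO_ERROR:
        return false;                   /* no reset since the last query */
    case GL_GUILTY_CONTEXT_RESET_ARB:   /* this context caused the reset */
    case GL_INNOCENT_CONTEXT_RESET_ARB: /* another context caused it */
    case GL_UNKNOWN_CONTEXT_RESET_ARB:  /* cause could not be determined */
    default:                            /* treat unexpected values as lost */
        return true;
    }
}

/* Hypothetical per-frame check: returns true if the caller must destroy
 * the current context and create a new one before rendering again. */
bool check_for_reset(get_reset_status_fn get_status)
{
    GLenum status = get_status();
    if (!context_is_lost(status))
        return false;
    fprintf(stderr, "GPU reset detected (status 0x%04x), recreating context\n",
            status);
    return true;
}
```

Calling check_for_reset() once per frame, before issuing new rendering commands, keeps detection latency to roughly one frame. What recreating the context involves (re-uploading textures, buffers and shaders) is application specific and not shown.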
> 
> What about non-robust apps, then? They certainly will not be notified if a 
> reset happens, and thus can't recover if their context is lost. The OpenGL 
> specification does not explicitly define what should be done in such 
> situations [3], and I believe that when the spec mandates closing the app, it 
> usually says so explicitly.
> 
> However, in reality there are different types of device resets, causing 
> different results. A reset can be precise enough to damage only the guilty 
> context, and keep others alive.
> 
> Given that, I believe drivers have the following options:
> 
> a) Kill all non-robust apps after a reset. This may cause innocent 
> applications to lose work.
> 
> b) Ignore all OpenGL calls from non-robust apps. That means that applications 
> would still be alive, but their user interface would be frozen. The user would 
> need to close them manually anyway, but in some corner cases, the app could 
> autosave some work or the user might be able to interact with it using some 
> alternative method (command line?).
> 
> c) Kill just the affected non-robust applications. To do that, the driver 
> needs to be 100% sure about the impact of its resets.
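
To make the difference between the three options concrete, here is a hypothetical C sketch of the per-context decision a UMD would face after a reset. The enums, struct reset_info and choose_action() are invented for illustration and do not correspond to RadeonSI, Iris or any other Mesa code; robust contexts are modelled as simply being told about the loss through the reset status query.

```c
#include <stdbool.h>

/* Hypothetical driver policies, one per option listed above. */
enum reset_policy {
    POLICY_KILL_ALL_NON_ROBUST,      /* option a) */
    POLICY_IGNORE_NON_ROBUST,        /* option b) */
    POLICY_KILL_AFFECTED_NON_ROBUST, /* option c) */
};

/* Hypothetical actions a UMD could take for one context after a reset. */
enum reset_action {
    ACTION_CONTINUE,    /* keep running as if nothing happened */
    ACTION_REPORT_LOST, /* robust context: report loss via the status query */
    ACTION_DROP_CALLS,  /* keep the process alive but ignore further GL calls */
    ACTION_TERMINATE,   /* end the process */
};

/* Hypothetical summary of what the driver knows about one context. */
struct reset_info {
    bool reset_happened;   /* a GPU reset occurred since the last check */
    bool context_affected; /* this context's state was lost (guilty or not) */
    bool robust;           /* created with LOSE_CONTEXT_ON_RESET notification */
};

static enum reset_action choose_action(enum reset_policy policy,
                                       struct reset_info info)
{
    if (!info.reset_happened)
        return ACTION_CONTINUE;

    /* Robust apps are told about the loss and recreate their context. */
    if (info.robust)
        return info.context_affected ? ACTION_REPORT_LOST : ACTION_CONTINUE;

    switch (policy) {
    case POLICY_KILL_ALL_NON_ROBUST:
        /* a) any reset ends every non-robust app, affected or not */
        return ACTION_TERMINATE;
    case POLICY_IGNORE_NON_ROBUST:
        /* b) every non-robust app survives, but its GL calls are dropped */
        return ACTION_DROP_CALLS;
    case POLICY_KILL_AFFECTED_NON_ROBUST:
        /* c) only non-robust apps whose context was actually lost are ended */
        return info.context_affected ? ACTION_TERMINATE : ACTION_CONTINUE;
    }
    return ACTION_TERMINATE; /* not reached for valid policy values */
}
```

Under a) and b) the outcome for a non-robust context does not depend on context_affected; c) is the only policy that needs it, which is why c) requires the driver to know precisely which contexts a reset damaged.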
> 
> RadeonSI currently implements a), as can be seen at [4], while Iris 
> implements what I think is c) [5].
> 
> From the user experience point of view, c) is clearly the best option, but 
> it's the hardest to achieve. There's not much gain in having b) over a); 
> perhaps it could be an optional env var for such corner-case applications.

I disagree with these conclusions.

c) is certainly better than a), but it's not "clearly the best" in all cases. 
The OpenGL UMD is not a privileged/special component and is in no position to 
decide whether the process as a whole (only some thread(s) of which may use 
OpenGL at all) gets to continue running.


> [0] https://registry.khronos.org/OpenGL/extensions/ARB/ARB_robustness.txt
> [1] https://registry.khronos.org/OpenGL/extensions/KHR/KHR_robustness.txt
> [2] https://registry.khronos.org/OpenGL/extensions/EXT/EXT_robustness.txt
> [3] https://registry.khronos.org/OpenGL/specs/gl/glspec46.core.pdf
> [4] https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/gallium/winsys/amdgpu/drm/amdgpu_cs.c#L1657
> [5] https://gitlab.freedesktop.org/mesa/mesa/-/blob/23.1/src/gallium/drivers/iris/iris_batch.c#L842

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
