On Mon, 27 Nov 2023 15:00:30 GMT, mintykat <d...@openjdk.org> wrote:

>> Hi,
>> 
>> I did open the bug report. Some notes to this PR:
>> 
>> My colleagues and I are able to reproduce this bug regularly, even though it 
>> sometimes takes up to 3 or 4 weeks until the D3DERR_DEVICEHUNG error shows 
>> up. We are currently evaluating two versions of the fix, but so far we do 
>> not have any results. I will post them as soon as I have them.
>> 
>> Version 1 (this version): Based on the observation that the 
>> TestCooperativeLevel/CheckDeviceState method returns D3D_OK again after 
>> about 20 - 60 seconds, reinitialize is called the first time the state 
>> returns D3D_OK again. The 'isHung' flag stores the information until 
>> then.
>> 
>> Version 2: calls reinitialize directly after D3DERR_DEVICEHUNG has been 
>> returned. Basically
>> if (hr == D3DERR_DEVICEREMOVED || hr == D3DERR_DEVICEHUNG  ) { .. }
>> 
>> I did not modify the validatePresent method, as it was not necessary for 
>> our workaround (see ticket). At least the native call swapchain->present 
>> does not return that error code 
>> (https://learn.microsoft.com/en-us/windows/win32/api/d3d9/nf-d3d9-idirect3dswapchain9-present).
>> I did not look thoroughly into all the native calls behind 
>> D3DRTTexture#readPixels.
>> 
>> As I said I will post the results (prism.verbose output) for the 2 versions 
>> later as a base for discussions.
>
> I have put this in D3DContext.java (as per customer suggestion). Just 
> wondering whether I should reinitialize directly and skip the wait loop. 
> This is in testLostStateAndReset in D3DContext.java (D3DERR_DEVICEREMOVED 
> is handled further down):
>        if (hr == D3DERR_DEVICEHUNG) {
>             setLost();
> 
>             long retryMillis = TimeUnit.MINUTES.toMillis(5);
>             long sleepMillis = TimeUnit.SECONDS.toMillis(1);
>             //Is this loop necessary?
>             for (int i = 0; i < retryMillis; i += sleepMillis) {
>                 int cooperativeLevel =
>                         D3DResourceFactory.nTestCooperativeLevel(pContext);
>                 System.err.println("Checking Cooperative Level: "
>                         + cooperativeLevel);
> 
>                 if (cooperativeLevel == D3D_OK) {
>                     break;
>                 } else {
>                     try {
>                         Thread.sleep(sleepMillis);
>                     } catch (InterruptedException e) {
>                         e.printStackTrace();
>                     }
>                 }
>             }
> 
>             // Reinitialize after 5 mins anyway, even if result is not OK.
> 
>             // Reinitialize the D3DPipeline. This will dispose and recreate
>             // the resource factory and context for each adapt
>             D3DPipeline.getInstance().reinitialize();
>             LOGGER.warn("Reinit after graphics hang.");
>         }

Hello @mintykat, the loop is not necessary. In fact it is not recommended, 
because the Thread.sleep makes the whole application (window) unresponsive. 
It was only an approach to generate some debug output, in order to see how the 
system and D3D respond after the failure happens. It is best to call 
reinitialize directly after the check for the D3DERR_DEVICEHUNG error code.
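
To illustrate the direct approach, here is a minimal, self-contained sketch. It is not the actual D3DContext code: the `Pipeline` interface and the `handleDeviceState` method are hypothetical stand-ins for D3DPipeline and testLostStateAndReset, and the error-code values are my assumption of what d3d9.h defines, so please check them against the real constants.

```java
// Sketch only: Pipeline and handleDeviceState are hypothetical stand-ins,
// not the real JavaFX Prism classes.
final class DeviceHungSketch {
    static final int D3D_OK = 0;
    // Assumed to mirror the values in d3d9.h; verify before relying on them.
    static final int D3DERR_DEVICEREMOVED = 0x88760870;
    static final int D3DERR_DEVICEHUNG = 0x88760874;

    /** Hypothetical stand-in for D3DPipeline. */
    interface Pipeline {
        void reinitialize();
    }

    /**
     * Treats a hung device like a removed one: reinitialize right away,
     * without sleeping or polling the cooperative level first.
     * Returns true if a reinitialize was triggered.
     */
    static boolean handleDeviceState(int hr, Pipeline pipeline) {
        if (hr == D3DERR_DEVICEREMOVED || hr == D3DERR_DEVICEHUNG) {
            pipeline.reinitialize();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Pipeline pipeline = () -> System.out.println("Reinit after graphics hang.");
        handleDeviceState(D3DERR_DEVICEHUNG, pipeline); // triggers reinitialize
        handleDeviceState(D3D_OK, pipeline);            // does nothing
    }
}
```

Because nothing blocks here, the calling thread stays free, which is the main reason to prefer this over the wait loop.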

I'm wondering a little whether JavaFX may be somehow responsible for the crash 
or whether it's just old drivers. This fix is going to keep the application 
running, but those crashes will still happen. You said that you can reproduce 
this error every couple of hours? If you have the capacity, maybe you can track 
down the root cause? I guess it is not a trivial task, though: I just found 
[here](https://learn.microsoft.com/en-us/windows/win32/direct3d9/troubleshooting#debugging) 
that some debug DLLs are needed. We had this error 'only' every few days 
(sometimes weeks), and I wasn't quite motivated for such a long-winded, 
deep-dive bug hunt :)

-------------

PR Comment: https://git.openjdk.org/jfx/pull/1199#issuecomment-1828503142
