Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-29 Thread Felix Kuehling
On 2024-11-28 21:51, Mika Laitio wrote: Thanks for the feedback. The problem is real regardless: it breaks userspace apps when my patch is not applied. I actually spent today investigating and testing another GPU hang bug that was originally reported by others on gfx1010/AMD RX 5700.

Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-28 Thread Mika Laitio
Thanks for the feedback. The problem is real regardless: it breaks userspace apps when my patch is not applied. I actually spent today investigating and testing another GPU hang bug that was originally reported by others on gfx1010/AMD RX 5700. I originally thought that the bug is different

Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Felix Kuehling
On 2024-11-27 06:51, Christian König wrote: On 27.11.24 at 12:46, Mika Laitio wrote: The AMD gfx1103 / M780 iGPU eventually crashes when used for PyTorch ML/AI operations on the ROCm SDK stack. After the kernel error, the application exits with an error, and the Linux desktop itself can sometimes either freeze o

Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Christian König
On 27.11.24 at 12:46, Mika Laitio wrote: The AMD gfx1103 / M780 iGPU eventually crashes when used for PyTorch ML/AI operations on the ROCm SDK stack. After the kernel error, the application exits with an error, and the Linux desktop itself can sometimes either freeze or reset back to the login screen. Error will happe