On 14/04/2025 13:47, Boris Brezillon wrote:
> On Fri, 11 Apr 2025 16:39:02 +0200
> Boris Brezillon <boris.brezil...@collabora.com> wrote:
>
>> On Fri, 11 Apr 2025 15:13:26 +0200
>> Christian König <christian.koe...@amd.com> wrote:
>>
>>>>
>>>>> Background is that you don't get a crash, nor error message, nor
>>>>> anything indicating what is happening.
>>>> The job times out at some point, but we might get stuck in the fault
>>>> handler waiting for memory, which is pretty close to a deadlock, I
>>>> suspect.
>>>
>>> I don't know those drivers that well, but at least for amdgpu the
>>> problem would be that the timeout handling would need to grab some of
>>> the locks the memory management is holding waiting for the timeout
>>> handling to do something....
>>>
>>> So that basically perfectly closes the circle. With a bit of luck you
>>> get a message after some time that the kernel is stuck, but since
>>> those are all sleeping locks I strongly doubt it.
>>>
>>> As an immediate action please provide patches which change those
>>> GFP_KERNEL into GFP_NOWAIT.
>>
>> Sure, I can do that.
>
> Hm, I might have been too prompt at claiming this was doable. In
> practice, doing that might regress Lima and Panfrost in situations
> where trying harder than GFP_NOWAIT would free up some memory. Not
> saying it was right to use GFP_KERNEL in the first place, but some
> expectations were set by this original mistake, so I'll probably need
> the Lima developers to vouch for this change after they've done some
> testing on a system under high memory pressure, and I'd need to do the
> same kind of testing for Panfrost and ask Steve if he's okay with that
> too.
It's a tricky one. The ideal would be to teach user space how to recover
from a memory allocation failure (even on older GPUs it's still in theory
possible to break up the work and do incremental rendering). But that
ship sailed long ago, so this would be a regression. I did consider
GFP_ATOMIC as an option, but really we shouldn't be dipping into the
memory reserves in a situation like this.

For background: in the "kbase" driver (the Linux kernel driver for the
closed-source user-space 'DDK' driver for Midgard/Bifrost GPUs) we had a
"JIT memory allocator". The idea was that if you have multiple renderings
in flight you can attempt to share the tiler heap memory between them. So
when we can't allocate more memory, but we know there's another tiler
heap which is going to be freed by a fragment job that's already running,
we can block, knowing the memory is going to become available. It was
designed to do the same thing as CSF's incremental rendering - allow us
to opportunistically allocate memory but not fail the rendering if it
wasn't available. But it was a nightmare to gain any confidence that it
was deadlock-free, and the implementation was frankly terrible - which is
ultimately why CSF GPUs have the ability in hardware to perform
incremental rendering without failing the job. Effectively the approach
requires guaranteeing just enough memory for one complete tiler heap to
ensure forward progress, while opportunistically allowing extra memory to
give a performance boost.

TL;DR: I think we should try switching to GFP_NOWAIT in Panfrost and do
some testing under memory pressure (a rough, untested sketch of the sort
of change I mean is below, after my sign-off). It might be acceptable
(and an occasional job failure is better than an occasional lockup). If
it turns out it's too easy to trigger job failures like this then we'll
need to rethink.

> For Panthor, I'm less worried, because we have the incremental rendering
> fallback, and assuming GFP_NOWAIT tries hard enough to reclaim
> low-hanging fruits, the perfs shouldn't suffer much more than they
> would today with GFP_KERNEL allocations potentially delaying tiling
> operations longer than would have been with a primitive flush.

Yes, for Panthor I think the approach is obvious: I believe GFP_NOWAIT
should trigger background reclaim, so over the course of a few frames it
should make the memory available (assuming there is sufficient 'free'
memory). Indeed it might even give a performance boost if it stops us
getting blocked in direct reclaim.

Thanks,
Steve
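
PS: Here's the rough sketch I mentioned above. It's untested and purely
illustrative: grow_tiler_heap() and struct tiler_heap are made-up names,
not the actual Panfrost code paths. The only points it's meant to show
are the GFP_NOWAIT | __GFP_NOWARN flags (kick background reclaim, never
sleep in direct reclaim, don't spam dmesg on failure) and failing the job
with -ENOMEM instead of blocking in the fault handler.

  #include <linux/gfp.h>
  #include <linux/mm.h>

  /* Hypothetical stand-in for the driver's heap object, not a real
   * Panfrost structure.
   */
  struct tiler_heap {
          struct page **pages;
          unsigned int nr_pages;
  };

  static int grow_tiler_heap(struct tiler_heap *heap)
  {
          /*
           * GFP_NOWAIT wakes kswapd (background reclaim) but never
           * enters direct reclaim, so the fault path can't end up
           * sleeping on memory while reclaim is waiting on the GPU.
           */
          struct page *p = alloc_page(GFP_NOWAIT | __GFP_NOWARN);

          if (!p) {
                  /*
                   * Nothing available right now: return an error so the
                   * job is failed (or, on CSF hardware, the incremental
                   * rendering fallback kicks in) rather than blocking.
                   */
                  return -ENOMEM;
          }

          heap->pages[heap->nr_pages++] = p;
          return 0;
  }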