On 11.04.25 at 10:38, Boris Brezillon wrote:
> On Fri, 11 Apr 2025 10:04:07 +0200
> Christian König <christian.koe...@amd.com> wrote:
>
>> On 10.04.25 at 20:41, Boris Brezillon wrote:
>>> On Thu, 10 Apr 2025 14:01:03 -0400
>>> Alyssa Rosenzweig <aly...@rosenzweig.io> wrote:
>>>
>>>>>>> In Panfrost and Lima, we don't have this concept of "incremental
>>>>>>> rendering", so when we fail the allocation, we just fail the
>>>>>>> GPU job with an unhandled GPU fault.
>>>>>>
>>>>>> To be honest, I think that this is enough to mark those two
>>>>>> drivers as broken. It's documented that this approach is a
>>>>>> no-go for upstream drivers.
>>>>>>
>>>>>> How widely is that used?
>>>>>
>>>>> It exists in lima and panfrost, and I wouldn't be surprised if a
>>>>> similar mechanism was used in other drivers for tiler-based GPUs
>>>>> (etnaviv, freedreno, powervr, ...), because ultimately that's how
>>>>> tilers work: the amount of memory needed to store per-tile
>>>>> primitives (and metadata) depends on what the geometry pipeline
>>>>> feeds the tiler with, and that can't be predicted. If you
>>>>> over-provision, that's memory the system won't be able to use
>>>>> while rendering takes place, even though only a small portion
>>>>> might actually be used by the GPU. If your allocation is too
>>>>> small, it will either trigger a GPU fault (for HW not supporting
>>>>> an "incremental rendering" mode) or under-perform (because
>>>>> flushing primitives has a huge cost on tilers).
>>>>
>>>> Yes and no.
>>>>
>>>> Although we can't allocate more memory for /this/ frame, we know
>>>> the required size is probably constant across its lifetime. That
>>>> gives a simple heuristic to manage the tiler heap efficiently
>>>> without allocations - even fallible ones - in the fence signal
>>>> path:
>>>>
>>>> * Start with a small fixed-size tiler heap.
>>>> * Try to render, letting incremental rendering kick in when the
>>>>   heap is too small.
>>>> * When cleaning up the job, check whether we used incremental
>>>>   rendering.
>>>> * If we did, double the size of the heap the next time we submit
>>>>   work.
>>>>
>>>> The tiler heap still grows dynamically - it just does so over the
>>>> span of a couple of frames. In practice, that means a tiny hit to
>>>> startup time as we dynamically figure out the right size,
>>>> incurring extra flushing at the start, without needing any
>>>> "grow-on-page-fault" heroics.
>>>>
>>>> This should solve the problem completely for CSF/panthor. So it's
>>>> only hardware that architecturally cannot do incremental rendering
>>>> (older Mali: panfrost/lima) where we need this mess.
>>>
>>> OTOH, if we need something for
>>> Utgard (lima)/Midgard/Bifrost/Valhall (panfrost), why not use the
>>> same thing for CSF, since CSF is arguably the sanest of all the HW
>>> architectures listed above: allocation can fail/be non-blocking,
>>> because there's a fallback to incremental rendering when it fails.
>>
>> Yeah, that is a rather interesting point Alyssa noted here.
>>
>> So basically you could just as well implement it like this:
>> 1. Userspace makes a submission.
>> 2. HW finds the buffer is not large enough, sets an error code and
>>    completes the submission.
>> 3. Userspace detects the error and re-allocates the buffer with an
>>    increased size.
>> 4. Userspace re-submits to incrementally complete the submission.
>> 5. Repeat until fully completed.
>>
>> That would work, but it's likely not the most performant solution.
>> So faulting in memory on demand is basically just an optimization,
>> and that is OK as far as I can see.
>
> Yeah, Alyssa's suggestion got me thinking too, and I think I can come
> up with a plan where we try a non-blocking allocation first and, if
> it fails, trigger incremental rendering and queue a blocking
> heap-chunk allocation on a separate workqueue, such that the next
> time the tiler heap hits an OOM, it has a chunk (or multiple chunks)
> readily available, provided the blocking allocation completed in the
> meantime. That's basically what Alyssa suggested, with an
> optimization for when the system is not under memory pressure, and
> without userspace being involved (so no uAPI changes).
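[Editor's note: Alyssa's size-doubling heuristic could be sketched roughly as below. This is a minimal, illustrative simulation, not code from panthor or any other driver; all names (tiler_ctx, TILER_HEAP_MIN/MAX, tiler_job_cleanup) and the exact sizes are assumptions.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: keep a per-context tiler heap size and double
 * it on the next submit whenever the previous job had to fall back to
 * incremental rendering. No allocation happens in the fence-signal
 * path; the heap is (re)sized at submit time. */

#define TILER_HEAP_MIN	(512 * 1024)		/* small fixed start */
#define TILER_HEAP_MAX	(64 * 1024 * 1024)	/* sanity cap */

struct tiler_ctx {
	size_t heap_size;	/* heap size used for the next submit */
};

static void tiler_ctx_init(struct tiler_ctx *ctx)
{
	ctx->heap_size = TILER_HEAP_MIN;
}

/* Called when a job completes; @used_incremental would come from
 * HW/FW status reporting that the tiler ran out of heap and had to
 * flush primitives mid-frame. */
static void tiler_job_cleanup(struct tiler_ctx *ctx, bool used_incremental)
{
	if (used_incremental && ctx->heap_size < TILER_HEAP_MAX)
		ctx->heap_size *= 2;
}
```

After a few frames of extra flushing, the heap converges on a size large enough for the workload and stops growing, matching the "tiny hit to startup time" trade-off described above.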
That sounds like it most likely won't work. In an OOM situation, the blocking allocation would just add more memory pressure, while completing your rendering is what would free memory up.

> I guess this leaves older GPUs that don't support incremental
> rendering in a bad place, though.

Well, what's the handling there currently? Just crash when you hit OOM?

Regards,
Christian.

>
>> That is then a rather good justification for your work, Boris.
>> Because a common component makes it possible to implement a common
>> fault-injection functionality to make sure that the fallback path
>> is properly exercised in testing.
>
> I can also add a fault-injection mechanism to validate that, yep.
>
> Thanks,
>
> Boris
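[Editor's note: the non-blocking-first scheme Boris describes could look roughly like the simplified, userspace-testable simulation below. All names, the chunk-pool structure, and the workqueue stand-in are illustrative assumptions, not panthor code.]

```c
#include <assert.h>
#include <stdbool.h>

enum oom_action { OOM_GREW_HEAP, OOM_INCREMENTAL };

struct heap_state {
	int spare_chunks;	/* chunks pre-allocated by deferred work */
	bool refill_pending;	/* blocking allocation queued on a wq */
	bool mem_pressure;	/* simulated: non-blocking allocs fail */
};

/* Non-blocking allocation attempt; fails under memory pressure. */
static bool alloc_chunk_nonblock(struct heap_state *h)
{
	return !h->mem_pressure;
}

/* Called from the (fence-signalling-safe) tiler OOM path: never block. */
static enum oom_action tiler_handle_oom(struct heap_state *h)
{
	if (h->spare_chunks > 0) {
		h->spare_chunks--;	/* deferred work delivered a chunk */
		return OOM_GREW_HEAP;
	}
	if (alloc_chunk_nonblock(h))
		return OOM_GREW_HEAP;
	/* Fall back to incremental rendering, and queue a blocking
	 * allocation on a separate workqueue for next time. */
	h->refill_pending = true;
	return OOM_INCREMENTAL;
}

/* Stands in for the workqueue item doing the blocking allocation. */
static void tiler_refill_work(struct heap_state *h)
{
	if (h->refill_pending) {
		h->spare_chunks++;
		h->refill_pending = false;
	}
}
```

Note that, as Christian points out above, the blocking refill may itself stall or worsen an OOM situation, so this sketch only illustrates the control flow, not a verdict on whether the approach is viable.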