Hi,

Pitching in to describe the situation for v3d:
On Fri, 2025-04-18 at 14:25 +0200, Boris Brezillon wrote:
(...)
> +For all these reasons, the tiler usually allocates memory dynamically, but
> +DRM has not been designed with this use case in mind. Drivers will address
> +these problems differently based on the functionality provided by their
> +hardware, but all of them almost certainly have to deal with this somehow.
> +
> +The easy solution is to statically allocate a huge buffer to pick from when
> +tiler memory is needed, and fail the rendering when this buffer is depleted.
> +Some drivers try to be smarter to avoid reserving a lot of memory upfront.
> +Instead, they start with an almost empty buffer and progressively populate it
> +when the GPU faults on an address sitting in the tiler buffer range. This
> +works okay most of the time but it falls short when the system is under
> +memory pressure, because the memory request is not guaranteed to be satisfied.
> +In that case, the driver either fails the rendering, or, if the hardware
> +allows it, it tries to flush the primitives that have been processed and
> +triggers a fragment job that will consume those primitives and free up some
> +memory to be recycled and make further progress on the tiling step. This is
> +usually referred as partial/incremental rendering (it might have other names).

In our case, user space allocates some memory up front hoping to avoid
running out of memory during tiling, but if the tiler does run out of
memory we get an interrupt, and the tiler HW will stop and wait for the
kernel driver to write back (via a register write) an address where more
memory is made available, which we will try to allocate at that point.
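The refill loop I'm describing could be sketched roughly like this, in
plain userspace C standing in for the real driver (tiler_sim,
TILER_BLOCK_SIZE and friends are invented names for illustration, not
the actual v3d code):

```c
#include <assert.h>
#include <stdlib.h>

#define TILER_BLOCK_SIZE 4096

struct tiler_sim {
	void *cur_block;	/* block the "hardware" writes primitives into */
	size_t used;		/* bytes consumed in the current block */
	unsigned int oom_irqs;	/* number of "out of memory" interrupts */
};

/* "Interrupt handler": hand the hardware a fresh block, or give up. */
static int tiler_oom_irq(struct tiler_sim *t)
{
	/* The real driver would have to use a non-blocking allocation here. */
	void *block = malloc(TILER_BLOCK_SIZE);

	if (!block)
		return -1;	/* this is where the job should be cancelled */
	free(t->cur_block);
	t->cur_block = block;	/* stands in for the register write */
	t->used = 0;
	t->oom_irqs++;
	return 0;
}

/*
 * "Hardware" consuming tiler memory; assumes bytes <= TILER_BLOCK_SIZE.
 * On overflow it stalls, raises the OOM interrupt, then continues in
 * the freshly provided block.
 */
static int tiler_emit(struct tiler_sim *t, size_t bytes)
{
	if (t->used + bytes > TILER_BLOCK_SIZE && tiler_oom_irq(t))
		return -1;
	t->used += bytes;
	return 0;
}
```

With a 4 KiB block, emitting three 3000-byte chunks takes the refill
path twice, which matches the "this can happen any number of times"
behaviour below.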
This can happen any number of times until the tiler job completes.

I am not sure that we are handling allocation failure on this path
nicely at the moment, since we don't try to fail and cancel the job;
that's maybe something we should fix, although I don't personally
recall any reports of us running into this situation either.

> +
> +Compute based emulation of geometry stages
> +------------------------------------------
> +
> +More and more hardware vendors don't bother providing hardware support for
> +geometry/tesselation/mesh stages, since those can be emulated with compute
> +shaders. But the same problem we have with tiler memory exists with those
> +intermediate compute-emulated stages, because transient data shared between
> +stages need to be stored in memory for the next stage to consume, and this
> +bubbles up until the tiling stage is reached, because ultimately, what the
> +tiling stage will need to process is a set of vertices it can turn into
> +primitives, like would happen if the application had emulated the geometry,
> +tesselation or mesh stages with compute.
> +
> +Unlike tiling, where the hardware can provide a fallback to recycle memory,
> +there is no way the intermediate primitives can be flushed up to the
> +framebuffer, because it's a purely software emulation here. This being said,
> +the same "start small, grow on-demand" can be applied to avoid
> +over-allocating memory upfront.

FWIW, v3d has geometry and tessellation hardware.

> +
> +On-demand memory allocation
> +---------------------------
> +
> +As explained in previous sections, on-demand allocation is a central piece
> +of tile-based renderer if we don't want to over-allocate, which is bad for
> +integrated GPUs who share their memory with the rest of the system.
> +
> +The problem with on-demand allocation is that suddenly, GPU accesses can
> +fail on OOM, and the DRM components (drm_gpu_scheduler and drm_gem mostly)
> +were not designed for that.
> +Those are assuming that buffers memory is
> +populated at job submission time, and will stay around for the job lifetime.
> +If a GPU fault happens, it's the user fault, and the context can be flagged
> +unusable. On-demand allocation is usually implemented as allocation-on-fault,
> +and the dma_fence contract prevents us from blocking on allocations in that
> +path (GPU fault handlers are in the dma-fence signalling path).

As I described above, v3d does not quite use an allocation-on-fault
mechanism; rather, we get a dedicated interrupt from the HW when it
needs more memory, which I believe actually happens a bit before it
completely runs out of memory. Maybe that changes the picture, since we
don't exactly use a fault handler?

Iago
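P.S.: a toy model of the decision that has to be made in that path,
whether it is entered from a fault or from a dedicated interrupt. All
names (alloc_nowait, handle_tiler_oom, ...) are made up for
illustration; the real code would use a GFP_NOWAIT-style kernel
allocation, since blocking or recursing into reclaim is not allowed in
the dma-fence signalling path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a non-blocking allocation that may fail under pressure. */
static void *alloc_nowait(size_t size, bool simulate_pressure)
{
	if (simulate_pressure)
		return NULL;	/* must return failure, never sleep and retry */
	return malloc(size);
}

enum oom_action { OOM_GREW, OOM_PARTIAL_RENDER, OOM_JOB_FAILED };

/*
 * OOM-path policy: try to grow; on failure, flush the primitives
 * processed so far (partial/incremental rendering) so their memory can
 * be recycled; if the hardware can't do that either, fail the job.
 */
static enum oom_action handle_tiler_oom(bool mem_pressure, bool can_partial_render)
{
	void *block = alloc_nowait(4096, mem_pressure);

	if (block) {
		free(block);	/* would be handed to the hardware instead */
		return OOM_GREW;
	}
	if (can_partial_render)
		return OOM_PARTIAL_RENDER;
	return OOM_JOB_FAILED;
}
```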