Steven wanted non-Mali eyes, so with my Imaginapple hat on...

> +All lot of embedded GPUs are using tile-based rendering instead of immediate

s/All lot of/Many/

> +- Complex geometry pipelines: if you throw geometry/tesselation/mesh shaders
> +  it gets even trickier to guess the number of primitives from the number
> +  of vertices passed to the vertex shader.

Tessellation, yes. Geometry shaders, no. Geometry shaders must declare
the maximum number of vertices they output, so by themselves they don't
make the problem much harder: unless you do an indirect draw with a GS,
the emulated GS draw can still be direct.

But I guess "even trickier" is still accurate...
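
Concretely, for a direct draw with list topologies the worst case is
computable on the CPU before anything is submitted; roughly this
(sketch only, the helper name and signature are mine, not from the doc):

/*
 * Sketch: upper bound on emulated-GS output for a *direct* draw.
 * gs_max_vertices comes straight from the shader's declaration and the
 * input primitive count is known on the CPU, so there is no guessing.
 */
static unsigned int emulated_gs_worst_case_vertices(unsigned int draw_vertex_count,
                                                    unsigned int verts_per_input_prim,
                                                    unsigned int gs_max_vertices)
{
        unsigned int input_prims = draw_vertex_count / verts_per_input_prim;

        return input_prims * gs_max_vertices;
}

Indirect draws with a GS are the case where this bound isn't known at
submit time, which is the "even trickier" part.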

> +For all these reasons, the tiler usually allocates memory dynamically, but
> +DRM has not been designed with this use case in mind. Drivers will address
> +these problems differently based on the functionality provided by their
> +hardware, but all of them almost certainly have to deal with this somehow.
> +
> +The easy solution is to statically allocate a huge buffer to pick from when
> +tiler memory is needed, and fail the rendering when this buffer is depleted.
> +Some drivers try to be smarter to avoid reserving a lot of memory upfront.
> +Instead, they start with an almost empty buffer and progressively populate it
> +when the GPU faults on an address sitting in the tiler buffer range.

This all seems very Mali-centric. Imaginapple has had partial renders
since forever.

> +More and more hardware vendors don't bother providing hardware support for
> +geometry/tesselation/mesh stages

I wouldn't say that... Mali is the only relevant hardware that has *no*
hardware support for any of geom/tess/mesh. All the desktop vendors +
Qualcomm have full hardware support, Apple has hardware mesh on M3+,
Broadcom has geom/tess, and I think Imagination has geom/tess on certain
parts.

And I don't know of any vendors (except possibly Imagination) that
removed hardware support, because it turns out having hardware support
for core API features is a good thing actually. It doesn't need to look
like "put the API in hardware" but some sort of hardware acceleration
(like AMD's NGG) solves the problems in this doc and more.

So... just "Some hardware vendors omit hardware support for
geometry/tessellation/mesh stages".

> +This being said, the same "start small, grow on-demand" can be
> +applied to avoid over-allocating memory upfront.

[citation needed]: if we overflow that buffer we're screwed and hit
device_loss, and that's unacceptable in normal usage.

> +The problem with on-demand allocation is that suddenly, GPU accesses can
> +fail on OOM, and the DRM components (drm_gpu_scheduler and drm_gem mostly)
> +were not designed for that.

It's not that the common DRM scheduler merely wasn't designed for this:
allocating memory in the fence-signalling path fundamentally violates
the kernel-wide dma_fence contract. Signalling a dma-fence must not
block on a fallible memory allocation, full stop. Nothing we do in DRM
will change that contract (and it's not obvious to me that kbase is
actually correct in all the corner cases).
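
To make the constraint concrete: the GPU fault handler sits between the
job's fence being published and that fence getting signalled, so the
most it can legitimately do looks something like this (sketch only, the
my_heap_* types and helpers are hypothetical):

#include <linux/errno.h>
#include <linux/gfp.h>          /* alloc_page(), GFP_* */

static int grow_heap_in_fault_path(struct my_heap *heap)
{
        struct page *page;

        /*
         * NOT OK here:
         *
         *      page = alloc_page(GFP_KERNEL);
         *
         * GFP_KERNEL may enter direct reclaim, and reclaim may itself
         * end up waiting on dma_fences (shrinkers and friends), so the
         * fence this job must signal can deadlock on itself.
         *
         * At most, an opportunistic non-blocking attempt is tolerable:
         */
        page = alloc_page(GFP_NOWAIT | __GFP_NOWARN);
        if (!page) {
                /*
                 * When that fails, the only correct fallbacks are a
                 * pool filled *before* the fence was published, or
                 * failing the job so its fence still gets signalled
                 * (with an error).
                 */
                page = my_heap_take_preallocated(heap);
                if (!page)
                        return -ENOMEM;
        }

        return my_heap_map_page(heap, page);
}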

> +The second trick to try to avoid over-allocation, even with this
> +sub-optimistic estimate, is to have a shared pool of memory that can be
> +used by all GPU contexts when they need tiler/geometry memory. This
> +implies returning chunks to this pool at some point, so other contexts
> +can re-use those. Details about what this global memory pool implementation
> +would look like is currently undefined, but it needs to be filled to
> +guarantee that pre-allocation requests for on-demand buffers used by a
> +GPU job can be satisfied in the fault handler path.

How do we clean memory between contexts? This is a security issue.
Either we need to pin physical pages to single processes, or we need to
zero pages when returning pages to the shared pool. Zeroing on the
CPU side is an option but the performance hit may be unacceptable
depending on how it's implemented. Alternatively we can require userspace
to clean up after itself on the GPU (with a compute shader), but that's
going to burn memory b/w in the happy path where we have lots of memory
free.
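
If we go the CPU-zeroing route, I'd expect the return path to look
roughly like this (sketch only, the heap_pool/heap_chunk types are
hypothetical):

#include <linux/highmem.h>      /* clear_highpage() */
#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical pool/chunk types, just enough for the sketch. */
struct heap_chunk {
        struct list_head node;
        unsigned int nr_pages;
        struct page **pages;
};

struct heap_pool {
        spinlock_t lock;
        struct list_head free_chunks;
};

/*
 * Scrub a chunk before it becomes visible to other contexts through the
 * shared pool, so nothing leaks across processes.
 */
static void heap_pool_return_chunk(struct heap_pool *pool,
                                   struct heap_chunk *chunk)
{
        unsigned int i;

        /* CPU-side zeroing: this loop is where the performance hit is. */
        for (i = 0; i < chunk->nr_pages; i++)
                clear_highpage(chunk->pages[i]);

        spin_lock(&pool->lock);
        list_add_tail(&chunk->node, &pool->free_chunks);
        spin_unlock(&pool->lock);
}

Pinning chunks to a single process would avoid the scrub, at the cost of
less sharing across contexts.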

> +For GL drivers, the UMD is in control of the context recreation, so
> +it can easily record the next buffer size to use.

I'm /really/ skeptical of this. Once we hit a device loss in GL, it's
game over, and I'm skeptical of any plan that expects userspace to
magically recover, especially as soon as side effects are introduced
(including transform feedback, which is already required in GLES 3.0).
