> BTW, is there a piece of doc explaining the rationale behind this
> dma_fence contract, or is it just the usual informal knowledge shared
> among DRM devs over IRC/email threads :-) ?
>
> To be honest, I'm a bit unhappy with this "it's part of the dma_fence
> contract" explanation, because I have a hard time remembering all the
> details that led to this set of rules myself, so I suspect it's even
> harder for newcomers to reason about this. To me, it's one of the
> reasons people fail to understand/tend to forget what the
> problems/limitations are, and end up ignoring them (intentionally or
> not).
>
> FWIW, this is what I remember, but I'm sure there's more:
>
> 1. dma_fence must signal in finite time, so unbounded waits in the
>    fence signalling path are not good, and that's what happens with
>    GFP_KERNEL allocations
> 2. if you're blocked in your GPU fault handler, that means you can't
>    process further faults happening on other contexts
> 3. GPU drivers are actively participating in the memory reclaim
>    process, which leads to deadlocks if the memory allocation in the
>    fault handler is waiting on the very same GPU job fence that's
>    waiting for its memory allocation to be satisfied
>
> I'd really love it if someone (Sima, Alyssa and/or Christian?) could sum
> it up, so I can put the outcome of this discussion in some kernel doc
> entry (or maybe it'd be better if this was one of you submitting a
> patch for that ;-)). If it's already documented somewhere, I'll just
> have to eat my hat and accept your RTFM answer :-).

https://www.kernel.org/doc/html/next/driver-api/dma-buf.html#dma-fence-cross-driver-contract

Specifically:

    Drivers are allowed to call dma_fence_wait() from their shrinker
    callbacks. This means any code required for fence completion cannot
    allocate memory with GFP_KERNEL.

Concretely:

* Job requires memory allocation to signal a fence
* We're in a low memory situation, so the shrinker is invoked
* The shrinker can't free memory until the job finishes
* Deadlock!

Possibly we could relax the contract to let us reclaim non-graphics
memory, but that's not my department.
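
To make the cycle in the bullets above concrete, here is a toy userspace
model of the wait-for graph. It is only a sketch, not kernel code: the
names `wait_graph`, `model()` and `has_deadlock()` are made up for
illustration, and the only thing it assumes about the allocator is that a
GFP_KERNEL-style allocation may block on reclaim while a
GFP_NOWAIT-style one fails instead of blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical actors in the wait-for graph; not kernel identifiers. */
enum node { JOB, ALLOC, SHRINKER, FENCE, NODES };

struct wait_graph {
	/* waits_on[a][b] is true when a cannot make progress until b does */
	bool waits_on[NODES][NODES];
};

static void add_wait(struct wait_graph *g, enum node from, enum node to)
{
	assert(from < NODES && to < NODES);
	g->waits_on[from][to] = true;
}

/* Depth-first search; returns true if a cycle is reachable from n. */
static bool visit(const struct wait_graph *g, enum node n,
		  bool *on_stack, bool *done)
{
	if (on_stack[n])
		return true;	/* back edge: n is waiting on itself */
	if (done[n])
		return false;	/* already explored, no cycle through n */

	on_stack[n] = true;
	for (int next = 0; next < NODES; next++)
		if (g->waits_on[n][next] && visit(g, next, on_stack, done))
			return true;
	on_stack[n] = false;
	done[n] = true;
	return false;
}

/* A cycle in the wait-for graph means nobody can ever make progress. */
bool has_deadlock(const struct wait_graph *g)
{
	bool on_stack[NODES] = { false };
	bool done[NODES] = { false };

	for (int n = 0; n < NODES; n++)
		if (visit(g, n, on_stack, done))
			return true;
	return false;
}

/* Graph for a job whose fence signalling path has to allocate memory. */
struct wait_graph model(bool alloc_may_block_on_reclaim)
{
	struct wait_graph g;

	memset(&g, 0, sizeof(g));
	add_wait(&g, FENCE, JOB);	/* fence signals once the job finishes */
	add_wait(&g, JOB, ALLOC);	/* job needs memory to finish */
	add_wait(&g, SHRINKER, FENCE);	/* shrinker calls dma_fence_wait() */
	if (alloc_may_block_on_reclaim)
		add_wait(&g, ALLOC, SHRINKER);	/* GFP_KERNEL-style allocation */
	return g;
}
```

With `model(true)` the graph contains the loop job -> allocation ->
shrinker -> fence -> job, so `has_deadlock()` reports a cycle. With
`model(false)` the allocation never waits on the shrinker and the loop is
broken, at the price that the allocation can now fail and the signalling
path has to cope with that.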