On 03/23/2016 12:42 PM, Chris Wilson wrote: > On Wed, Mar 23, 2016 at 04:32:59PM +0100, David Herrmann wrote: >> Hi >> >> On Wed, Mar 23, 2016 at 12:56 PM, Chris Wilson <chris at chris-wilson.co.uk> >> wrote: >>> On Wed, Mar 23, 2016 at 12:30:42PM +0100, David Herrmann wrote: >>>> My question was rather about why we do this? Semantics for EINTR are >>>> well defined, and with SA_RESTART (default on linux) user-space can >>>> ignore it. However, looping on EAGAIN is very uncommon, and it is not >>>> at all clear why it is needed? >>>> >>>> Returning an error to user-space makes sense if user-space has a >>>> reason to react to it. I fail to see how EAGAIN on a cache-flush/sync >>>> operation helps user-space at all? As someone without insight into the >>>> driver implementation, it is hard to tell why.. Any hints? >>> >>> The reason we return EAGAIN is to workaround a deadlock we face when >>> blocking on the GPU holding the struct_mutex (inside the client's >>> process), but the GPU is dead. As our locking is very, very coarse we >>> cannot restart the GPU without acquiring the struct_mutex being held by >>> the client so we wake the client up and tell them the resource they are >>> waiting on (the flush of the object from the GPU into the CPU domain) is >>> temporarily unavailable. If they try to immediately wait upon the ioctl >>> again, they are blocked waiting for the reset to occur before they may >>> complete their flush. There are a few other possible deadlocks that are >>> also avoided with EAGAIN (again, the issue is more or less the lack of >>> fine grained locking). >> >> ...so you hijacked EAGAIN for all DRM ioctls just for a driver >> workaround? > > No, we utilized the fact that EAGAIN was already enshrined by libdrm as > the defacto mechanism for repeating the ioctl in order to repeat the > ioctl for a driver workaround.
Do we have an agreement here after all? David? I need to know whether this fixup is okay to go cause I'll need to submit to Chrome OS then. Best Regards, Tiago