On Thu, 18 Mar 2021 at 17:04, Tvrtko Ursulin
<tvrtko.ursu...@linux.intel.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
>
> A new Kconfig option CONFIG_DRM_I915_REQUEST_TIMEOUT is added, defaulting
> to 20s, and this timeout is applied to all users contexts using the
> previously added watchdog facility.
>
> Result of this is that any user submission will simply fail after this
> timeout, either causing a reset (for non-preemptable), or incomplete
> results.
>
> This can have an effect that workloads which used to work fine will
> suddenly start failing. Even workloads comprised of short batches but in
> long dependency chains can be terminated.
>
> And becuase of lack of agreement on usefulness and safety of fence error

   because

> propagation this partial execution can be invisible to userspace even if
> it is "listening" to returned fence status.
>
> Another interaction is with hangcheck where care needs to be taken timeout
> is not set lower or close to three times the heartbeat interval. Otherwise
> a hang in any application can cause complete termination of all
> submissions from unrelated clients. Any users modifying the per engine
> heartbeat intervals therefore need to be aware of this potential denial of
> service to avoid inadvertently enabling it.
>
> Given all this I am personally not convinced the scheme is a good idea.
> Intuitively it feels object importers would be better positioned to
> enforce the time they are willing to wait for something to complete.
>
> v2:
>  * Improved commit message and Kconfig text.
>  * Pull in some helper code from patch which got dropped.
>
> v3:
>  * Bump timeout to 20s to see if it helps Tigerlake.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
Acked-by: Matthew Auld <matthew.a...@intel.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to