Ah, yes. That makes a bit more sense, in this case the patch is indeed necessary.
But why is XE hard coding MAX_SCHEDULE_TIMEOUT here while i915 isn't? Regards, Christian On 04.07.25 15:05, Aakash Deep Sarkar wrote: >> And as far as I can see i915 is using the config option >> DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very >> large value this should work as expected. > > Sorry I forgot to mention this in the commit message; this issue is happening > in the xe driver. DRM_I915_FENCE_TIMEOUT config is used only in the case of > i915 driver. For xe it's hardcoded to MAX_SCHEDULE_TIMEOUT in the > compat-i915-headers layer. > > Regards, > Aakash > > -----Original Message----- > From: Christian König <christian.koe...@amd.com> > Sent: 04 July 2025 17:57 > To: Jani Nikula <jani.nik...@linux.intel.com>; Aakash Deep Sarkar > <aakash.deep.sar...@intel.com>; intel-gfx@lists.freedesktop.org > Cc: Badrappan, Jeevaka <jeevaka.badrap...@intel.com>; Ville Syrjälä > <ville.syrj...@linux.intel.com>; Maarten Lankhorst > <maarten.lankho...@linux.intel.com>; Hogander, Jouni > <jouni.hogan...@intel.com>; Alex Deucher <alexander.deuc...@amd.com>; Lucas > Stach <l.st...@pengutronix.de>; Rob Clark <robin.cl...@oss.qualcomm.com>; > Thierry Reding <thierry.red...@gmail.com>; Julia Lawall > <julia.law...@inria.fr>; Dan Carpenter <dan.carpen...@linaro.org> > Subject: Re: [PATCH] drm/i915/display: Change ret value type from int to long > > On 04.07.25 14:00, Jani Nikula wrote: >> On Fri, 04 Jul 2025, Aakash Deep Sarkar <aakash.deep.sar...@intel.com> wrote: >>> dma_fence_wait_timeout returns a long type but the driver is only >>> using the lower 32 bits of the retval and discarding the upper 32 >>> bits. >>> >>> This is particularly problematic if there are already signalled or >>> stub fences on some of the hw planes. In this case the >>> dma_fence_wait_timeout function will immediately return with timeout >>> value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since the fence is >>> already signalled. > > That is not correct. If the fence is signaled dma_fence_wait_timeout() > returns the remaining timeout or at least 1 if the input timeout was 0. > > Could be that i915 has a separately implemented fence_ops->wait callback > which incorrectly returns MAX_SCHEDULE_TIMEOUT, but i strongly doubt that > because that would break tons of stuff. > >> If the driver only uses the lower >>> 32 bits of this return value then it'll interpret it as an error code >>> (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences. >>> >>> This issue was first observed with the Android compositor where the >>> GPU composited layer was not properly waited on when there were stub >>> fences in other overlay planes resulting in significant visual >>> artifacts. >> >> Thanks for the patch, good catch! > > If I'm not completely mistaken this patch is not necessary. > > dma_fence_wait_timeout() does *not* return MAX_SCHEDULE_TIMEOUT when the > fence is signaled, but rather the remaining timeout which is the input value > at maximum. > > So when the input timeout fits into 32bits the returned timeout also fits > into 32bits as well. > > And as far as I can see i915 is using the config option > DRM_I915_FENCE_TIMEOUT here. So as long as you don't set this to very very > large value this should work as expected. > >>> Test: No graphical artifacts with shadertoy >> >> We've never used this commit trailer before, please let's not start now. >> >> The subject should plainly state the "what", and the commit message >> should answer the "why". You do have that here, but I think the >> subject is still a bit too much nuts and bolts. >> >> For example, >> >> drm/i915/display: Fix dma_fence_wait_timeout() return value handling >> >> would state the "what" in *much* more helpful terms for anyone looking >> at git logs. >> >> Presumably this has been an error for some time. You should do a >> little bit of git blame on the offending lines. It'll find commit >> d59cf7bb73f3 >> ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence"). >> >> Based on that, we should add: >> >> Fixes: d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces >> instead of i915_sw_fence") >> Cc: Ville Syrjälä <ville.syrj...@linux.intel.com> >> Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> >> Cc: Jouni Högander <jouni.hogan...@intel.com> >> Cc: <sta...@vger.kernel.org> # v6.8+ >> >> Then it occurs to me this looks like a common mistake to make. A >> little bit of git grep on dma_fence_wait_timeout() quickly finds >> multiple similar mistakes in drm, at least amdgpu, etnaviv, msm, and >> tegra. Cc some maintainers FYI. This class of bugs could cause issues >> elsewhere. > > I can only guess but I think all those use cases use a fixed small timeout as > well. IIRC amdgpu uses a timeout in the millisecond range. > >> Let's also Cc Julia and Dan in case they have ideas to improve static >> analysis to catch this class of bugs. Or maybe one exists already, but >> we're just not running them! > > Yeah, agree. A script which checks if the input timeout could be >32bit and > the return value is assigned to something <=32bits is probably a really good > idea. > > Regards, > Christian. > >> >> Finally, for the actual change, >> >> Reviewed-by: Jani Nikula <jani.nik...@intel.com> >> >> >>> Signed-off-by: Aakash Deep Sarkar <aakash.deep.sar...@intel.com> >>> --- >>> drivers/gpu/drm/i915/display/intel_display.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/i915/display/intel_display.c >>> b/drivers/gpu/drm/i915/display/intel_display.c >>> index 456fc4b04cda..7035c1fc9033 100644 >>> --- a/drivers/gpu/drm/i915/display/intel_display.c >>> +++ b/drivers/gpu/drm/i915/display/intel_display.c >>> @@ -7092,7 +7092,8 @@ static void intel_atomic_commit_fence_wait(struct >>> intel_atomic_state *intel_stat >>> struct drm_i915_private *i915 = to_i915(intel_state->base.dev); >>> struct drm_plane *plane; >>> struct drm_plane_state *new_plane_state; >>> - int ret, i; >>> + long ret; >>> + int i; >>> >>> for_each_new_plane_in_state(&intel_state->base, plane, new_plane_state, >>> i) { >>> if (new_plane_state->fence) { >> >