On Fri, Jul 04, 2025 at 03:00:55PM +0300, Jani Nikula wrote: > On Fri, 04 Jul 2025, Aakash Deep Sarkar <aakash.deep.sar...@intel.com> wrote: > > dma_fence_wait_timeout returns a long type but the driver is > > only using the lower 32 bits of the retval and discarding the > > upper 32 bits. > > > > This is particularly problematic if there are already signalled > > or stub fences on some of the hw planes. In this case the > > dma_fence_wait_timeout function will immediately return with > > timeout value MAX_SCHEDULE_TIMEOUT (0x7fffffffffffffff) since > > the fence is already signalled. If the driver only uses the lower > > 32 bits of this return value then it'll interpret it as an error > > code (0xFFFFFFFF or (-1)) and skip the wait on the remaining fences. > > > > This issue was first observed with the Android compositor where > > the GPU composited layer was not properly waited on when there > > were stub fences in other overlay planes resulting in significant > > visual artifacts. > > Thanks for the patch, good catch! > > > Test: No graphical artifacts with shadertoy > > We've never used this commit trailer before, please let's not start now. > > The subject should plainly state the "what", and the commit message > should answer the "why". You do have that here, but I think the subject > is still a bit too much nuts and bolts. > > For example, > > drm/i915/display: Fix dma_fence_wait_timeout() return value handling > > would state the "what" in *much* more helpful terms for anyone looking > at git logs. > > Presumably this has been an error for some time. You should do a little > bit of git blame on the offending lines. It'll find commit d59cf7bb73f3 > ("drm/i915/display: Use dma_fence interfaces instead of i915_sw_fence"). > > Based on that, we should add: > > Fixes: d59cf7bb73f3 ("drm/i915/display: Use dma_fence interfaces instead of > i915_sw_fence") > Cc: Ville Syrjälä <ville.syrj...@linux.intel.com> > Cc: Maarten Lankhorst <maarten.lankho...@linux.intel.com> > Cc: Jouni Högander <jouni.hogan...@intel.com> > Cc: <sta...@vger.kernel.org> # v6.8+ > > Then it occurs to me this looks like a common mistake to make. A little > bit of git grep on dma_fence_wait_timeout() quickly finds multiple > similar mistakes in drm, at least amdgpu, etnaviv, msm, and tegra. Cc > some maintainers FYI. This class of bugs could cause issues elsewhere. > > Let's also Cc Julia and Dan in case they have ideas to improve static > analysis to catch this class of bugs. Or maybe one exists already, but > we're just not running them!
It's easy enough to warn about when we have: ret = dma_fence_wait_timeout(); and ret is an int. In Smatch I actually had hardcoded dma_fence_wait_timeout() as only returning up to INT_MAX because there were enough places which saved it as an int and it triggered false positives in callers where we knew the timeout was reasonable. regards, dan carpenter