On 11/25/2015 08:36 PM, Ville Syrjälä wrote: > On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote: >> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:
... >> Attached is my current patch i wanted to submit for the drm core's >> drm_update_vblank_count(). I think it's good to make the core somewhat >> robust against potential kms driver bugs or glitches. But if you >> wouldn't like that patch, there wouldn't be much of a point sending it >> out at all. >> >> thanks, >> -mario >> > >> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001 >> From: Mario Kleiner <mario.kleiner.de at gmail.com> >> Date: Wed, 25 Nov 2015 18:48:31 +0100 >> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust. >> >> The changes to drm_update_vblank_count() for Linux 4.4-rc >> made the function more fragile wrt. some hw quirks. E.g., >> at dev->driver->enable_vblank(), AMD gpu's fire a spurious >> redundant vblank irq shortly after enabling vblank irqs, not >> locked to vblank. This causes a redundant call which needs >> to be suppressed to avoid miscounting. >> >> To increase robustness, shuffle things around a bit: >> >> On drivers with high precision vblank timestamping always >> evaluate the timestamp difference between current timestamp >> and previously recorded timestamp to detect such redundant >> invocations and no-op in that case. >> >> Also detect and warn about timestamps going backwards to >> catch potential kms driver bugs. >> >> This patch is meant for Linux 4.4-rc and later. >> >> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com> >> --- >> drivers/gpu/drm/drm_irq.c | 53 >> ++++++++++++++++++++++++++++++++++++----------- >> 1 file changed, 41 insertions(+), 12 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c >> index 819b8c1..8728c3c 100644 >> --- a/drivers/gpu/drm/drm_irq.c >> +++ b/drivers/gpu/drm/drm_irq.c >> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device >> *dev, unsigned int pipe, >> unsigned long flags) >> { >> struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; >> - u32 cur_vblank, diff; >> + u32 cur_vblank, diff = 0; >> bool rc; >> struct timeval t_vblank; >> + const struct timeval *t_old; >> + u64 diff_ns; >> int count = DRM_TIMESTAMP_MAXRETRIES; >> int framedur_ns = vblank->framedur_ns; >> >> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device >> *dev, unsigned int pipe, >> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags); >> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && >> --count > 0); >> >> - if (dev->max_vblank_count != 0) { >> - /* trust the hw counter when it's around */ >> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> - } else if (rc && framedur_ns) { >> - const struct timeval *t_old; >> - u64 diff_ns; >> - >> + /* >> + * Always use vblank timestamping based method if supported to reject >> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up >> + * due to some irq handling quirk. >> + * >> + * This also sets the diff value for use as fallback below in case the >> + * hw does not support a suitable hw vblank counter. >> + */ >> + if (rc && framedur_ns) { > > If you fudged everything properly why do you still need this? With > working hw counter there should be no need to do this stuff. > As far as testing on one DCE4 card goes, i don't need it anymore with my fudged hw counters and timestamps. The fudging so far seems to work nicely. I just wanted to have a bit of extra robustness and a bit of extra available debug output against future or other broken drivers, or mistakes in fudging on the current driver, e.g., against things like timestamps going backwards. Especially since i can only test on two AMD cards atm., quite a limited sample. There are 3 display engine generations before and 5 generations after my test sample. -mario >> t_old = &vblanktimestamp(dev, pipe, vblank->count); >> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old); >> >> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device >> *dev, unsigned int pipe, >> */ >> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns); >> >> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) >> + /* Catch driver timestamping bugs and prevent worse. */ >> + if ((s32) diff < 0) { >> + DRM_DEBUG_VBL("crtc %u: Timestamp going backward!" >> + " diff_ns = %lld, framedur_ns = %d)\n", >> + pipe, (long long) diff_ns, framedur_ns); >> + return; >> + } >> + >> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) { >> DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored." >> - " diff_ns = %lld, framedur_ns = %d)\n", >> - pipe, (long long) diff_ns, framedur_ns); >> - } else { >> + " diff_ns = %lld, framedur_ns = %d)\n", >> + pipe, (long long) diff_ns, framedur_ns); >> + >> + /* Treat this redundant invocation as no-op. */ >> + WARN_ON_ONCE(cur_vblank != vblank->last); >> + return; >> + } >> + } >> + >> + /* >> + * hw counters, if marked as supported via max_vblank_count != 0, >> + * *must* increment at leading edge of vblank or at least before >> + * the firing of the hw vblank irq, otherwise we have a race here >> + * between gpu and us, where small variations in our execution >> + * timing can cause off-by-one miscounting of vblanks. >> + */ >> + if (dev->max_vblank_count != 0) { >> + /* trust the hw counter when it's around */ >> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count; >> + } else if (!(rc && framedur_ns)) { >> /* some kind of default for drivers w/o accurate vbl >> timestamping */ >> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0; >> } >> -- >> 1.9.1 >> > >