On Wed, May 02, 2018 at 03:31:15PM -0700, Tarun Vyas wrote:
> On Wed, May 02, 2018 at 01:04:06PM -0700, Vivi, Rodrigo wrote:
> > On Wed, May 02, 2018 at 09:51:43PM +0300, Ville Syrjälä wrote:
> > > On Wed, May 02, 2018 at 11:19:14AM -0700, Tarun Vyas wrote:
> > > > On Mon, Apr 30, 2018 at 10:19:33AM -0700, Rodrigo Vivi wrote:
> > > > > On Sun, Apr 29, 2018 at 09:00:18PM -0700, Tarun Vyas wrote:
> > > > > > From: Tarun <tarun.v...@intel.com>
> > > > > >
> > > > > > The PIPEDSL freezes on PSR entry and if PSR hasn't fully exited,
> > > > > > then the pipe_update_start call schedules itself out to check back
> > > > > > later.
> > > > > >
> > > > > > On the ChromeOS-4.4 kernel, which is fairly up-to-date w.r.t.
> > > > > > drm/i915 but lags w.r.t. core kernel code, hot plugging an external
> > > > > > display triggers tons of "potential atomic update errors" in the
> > > > > > dmesg, on *pipe A*. A closer analysis reveals that we try to read
> > > > > > the scanline 3 times and eventually time out, b/c PSR hasn't exited
> > > > > > fully, leading to a PIPEDSL stuck @ 1599. This issue is not seen on
> > > > > > upstream kernels, b/c for *some* reason we loop inside
> > > > > > intel_pipe_update_start for ~2+ msec, which in this case is more
> > > > > > than enough to exit PSR fully, hence an *unstuck* PIPEDSL counter,
> > > > > > hence no error. On the other hand, the ChromeOS kernel spends
> > > > > > ~1.1 msec looping inside intel_pipe_update_start and hence errors
> > > > > > out b/c the source is still in PSR.
> > > > > >
> > > > > > Regardless, we should wait for PSR exit (if PSR is supported and
> > > > > > active on the current pipe) before reading the PIPEDSL, b/c if we
> > > > > > haven't fully exited PSR, then checking for vblank evasion isn't
> > > > > > actually applicable.
> > > > > >
> > > > > > This scenario applies to a configuration with an additional pipe,
> > > > > > as of now.
> > > > >
> > > > > I honestly believe you're picking the wrong culprit here, by
> > > > > "coincidence". PSR will allow DC states with the screen on, and DC
> > > > > states will mess up all register reads...
> > > > >
> > > > > Probably what you are missing in your kernel is some power domain
> > > > > grab that would keep DC_OFF and consequently a sane read of these
> > > > > registers.
> > > > >
> > > > > Maybe Imre has a quick idea of what you could be missing on your
> > > > > kernel that we already have on the upstream one.
> > > > >
> > > > > Thanks,
> > > > > Rodrigo.
> > > > >
> > > > Thanks for the quick response, Rodrigo!
> > > > Some key observations based on my experiments so far:
> > > >        for (;;) {
> > > >                 /*
> > > >                  * prepare_to_wait() has a memory barrier, which
> > > >                  * guarantees other CPUs can see the task state update
> > > >                  * by the time we read the scanline.
> > > >                  */
> > > >                 prepare_to_wait(wq, &wait, TASK_UNINTERRUPTIBLE);
> > > >
> > > >                 scanline = intel_get_crtc_scanline(crtc);
> > > >                 if (scanline < min || scanline > max)
> > > >                         break;
> > > >
> > > >                 if (timeout <= 0) {
> > > >                         DRM_ERROR("Potential atomic update failure on pipe %c\n",
> > > >                                   pipe_name(crtc->pipe));
> > > >                         break;
> > > >                 }
> > > >
> > > >                 local_irq_enable();
> > > >
> > > >                 timeout = schedule_timeout(timeout);
> > > >
> > > >                 local_irq_disable();
> > > >         }
> > > > 1. In the above loop inside pipe_update_start, the *first time* we
> > > > read the PIPEDSL with PSR1 and an external display connected, it
> > > > always reads 1599, for *both* kernels (upstream and ChromeOS-4.4).
> > > > The PSR_STATUS also reads the exact same for *both* kernels and shows
> > > > that we haven't *fully* exited PSR.
> > > >
> > > > 2. The difference between the two kernels comes after this first read
> > > > of the PIPEDSL. ChromeOS-4.4 spends ~1 msec inside that loop and
> > > > upstream spends ~2 msec. I suspect that it is because of scheduling
> > > > changes between the two kernels, b/c I can't find any i915-specific
> > > > code running in that loop, except for vblank processing.
> > > >
> > > > 3. So to summarize, both kernels are in the same state w.r.t. PSR and
> > > > the PIPEDSL value when they read the PIPEDSL for the first time inside
> > > > the loop. *When* the kernels *transition* to a *full PSR exit* is what
> > > > differs.
> > 
> > Oh! So you really are getting reliable counters....
> > 
> > > >
> > > > My rationale for this patch is that the pipe_update_start function is
> > > > meant to evade 100 usec before a vblank, but *if* we haven't *fully*
> > > > exited PSR (which is true for both kernels for the first PIPEDSL
> > > > read), then vblank evasion is *not applicable* b/c the PIPEDSL will be
> > > > messed up. So we shouldn't bother evading vblank until we have fully
> > > > exited PSR.
> > >
> > > Yeah, I think this is the right direction. The problem really is the
> > > extra vblank pulse that the hardware generates (or at least can
> > > generate depending on a chicken bit) when it exits PSR. We have no
> > > control over when that happens and hence we have no control over when
> > > the registers get latched. And yet we still have to somehow prevent
> > > the register latching from occurring while we're in middle of
> > > reprogramming them.
> > 
> > I see the problem now. Thanks for the explanation.
> > 
> > >
> > > There are a couple of ways to avoid this:
> > > 1) Set the chicken bit so that we don't get the vblank pulse. The
> > >    pipe should restart from the vblank start, so we would have one
> > >    full frame to reprogram the registers. However, IIRC DK told me
> > >    there is no way to fully eliminate it in all cases, so this
> > >    option is probably out. There was also some implication for FBC
> > >    which I already forgot.
> > > 2) Make sure we've exited PSR before reprogramming the registers
> > >    (ie. what you do).
> > > 3) Use the DOUBLE_BUFFER_CTL to prevent the extra vblank pulse from
> > >    latching the registers while we're still reprogramming them.
> > >    This feature only exists on SKL+ so is not a solution for
> > >    HSW/BDW. But maybe HSW/BDW didn't even have the extra vblank
> > >    pulse?
> > >
> > > Option 2) does provide consistent behaviour on all platforms, so I
> > > do kinda like it. It also avoids a bigger rework on account of the
> > > DOUBLE_BUFFER_CTL. I do think we'll have to start using
> > > DOUBLE_BUFFER_CTL anyway due to other issues, but at least this way
> > > we don't block PSR progress on that work.
> > 
> > My vote is for option 2. Seems more straightforward and more broad.
> > 
> > DK?
> > 
> > My only request on the patch itself would be to create a function
> > on intel_psr.c intel_psr_wait_for_idle... or something like this
> > and put the register wait logic inside it instead of spreading
> > the psr code around.
> > 
> > Thanks,
> > Rodrigo.
> > 
> > >
> > > --
> > > Ville Syrjälä
> > > Intel
> Thanks for the comments, Ville and Rodrigo. I'll rework this to move the
> wait to intel_psr.c. There is a psr_wait_for_idle() in there, but there
> are some PSR locks being passed around inside it (eventually released by
> the caller). Also, the max timeout specified there is 50 msec, which
> might be way too much?

Ouch! That function is ugly... unlock, then lock back again...
(especially unlocking without any assert that the lock is held... :/)

If you can improve that or split it in a way that we reuse some code, it
would be nice...

>
> Best,
> Tarun
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
