[Intel-gfx] [PATCH 3/4] drm/dp: Clarify clock recovery and channel equalization failures

2016-08-04 Thread ch...@chris-wilson.co.uk
On Thu, Aug 04, 2016 at 04:50:35PM +, Pandiyan, Dhinakaran wrote:
> On Thu, 2016-08-04 at 04:12 +0100, Chris Wilson wrote:
> > On Wed, Aug 03, 2016 at 08:07:40PM -0700, Dhinakaran Pandiyan wrote:
> > > The causes of clock recovery and channel equalization failures are not
> > > explicitly printed in debug messages. Help debugging link training
> > > failures by printing why it failed.
> > > 
> > > Doing this in the driver would mean re-implementing some of the drm static
> > > functions that decode link status. Let's avoid that with these debug
> > > messages in drm.
> > > 
> > > Signed-off-by: Dhinakaran Pandiyan 
> > > ---
> > >  drivers/gpu/drm/drm_dp_helper.c | 12 +---
> > >  1 file changed, 9 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_dp_helper.c 
> > > b/drivers/gpu/drm/drm_dp_helper.c
> > > index 091053e..d763b57 100644
> > > --- a/drivers/gpu/drm/drm_dp_helper.c
> > > +++ b/drivers/gpu/drm/drm_dp_helper.c
> > > @@ -64,12 +64,16 @@ bool drm_dp_channel_eq_ok(const u8 
> > > link_status[DP_LINK_STATUS_SIZE],
> > >  
> > >   lane_align = dp_link_status(link_status,
> > >   DP_LANE_ALIGN_STATUS_UPDATED);
> > > - if ((lane_align & DP_INTERLANE_ALIGN_DONE) == 0)
> > > + if ((lane_align & DP_INTERLANE_ALIGN_DONE) == 0) {
> > > + DRM_DEBUG_KMS("Inter-lane alignment not done\n");
> > >   return false;
> > > + }
> > >   for (lane = 0; lane < lane_count; lane++) {
> > >   lane_status = dp_get_lane_status(link_status, lane);
> > > - if ((lane_status & DP_CHANNEL_EQ_BITS) != DP_CHANNEL_EQ_BITS)
> > > + if ((lane_status & DP_CHANNEL_EQ_BITS) != DP_CHANNEL_EQ_BITS) {
> > > + DRM_DEBUG_KMS("Channel equalization not done for lane 
> > > %d\n", lane);
> > >   return false;
> > > + }
> > >   }
> > >   return true;
> > >  }
> > > @@ -83,8 +87,10 @@ bool drm_dp_clock_recovery_ok(const u8 
> > > link_status[DP_LINK_STATUS_SIZE],
> > >  
> > >   for (lane = 0; lane < lane_count; lane++) {
> > >   lane_status = dp_get_lane_status(link_status, lane);
> > > - if ((lane_status & DP_LANE_CR_DONE) == 0)
> > > + if ((lane_status & DP_LANE_CR_DONE) == 0) {
> > > + DRM_DEBUG_KMS("Clock recovery not done for lane %d\n", 
> > > lane);
> > 
> > Please petition, with patch, for DRM_DEBUG_DP.
> > 
> Is that because DRM_DEBUG_KMS in not appropriate or that we have plenty
> of DP related debug messages that it warrants it's own debug macro?

In the case of link failure, we will have plenty of messages from DP.
One debug category just for DP may be overkill, and we may want
something more like DRM_DEBUG_KMS_LOWLEVEL (or DRM_DEBUG_KMS_DRIVER, or
DRM_DEBUG_KMS_HW etc) that suits all similar link training without
cluttering up either the high levels telling what the user selected, and
what the driver picked and the control flow through the modesetting.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH i-g-t] tests/drm_hw_lock: Tests for hw_lock fixes.

2015-04-28 Thread ch...@chris-wilson.co.uk
On Tue, Apr 28, 2015 at 09:28:51AM +, Antoine, Peter wrote:
> Also, the PARAM should give legitimate code the opportunity to avoid the
> problems but avoiding these features completely.

There are no legitimate users.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


[Intel-gfx] [PATCH 1/5] drm: Kernel Crash in drm_unlock

2015-04-28 Thread ch...@chris-wilson.co.uk
On Tue, Apr 28, 2015 at 10:21:49AM +0100, Dave Gordon wrote:
> On 24/04/15 06:52, Antoine, Peter wrote:
> > I picked up this work due to the following Jira ticket created by the
> > security team (on Android) and was asked to give it a second look and
> > found a few more issues with the hw lock code.
> > 
> > https://jira01.devtools.intel.com/browse/GMINL-5388
> > I/O control on /dev/dri/card0 crashes the kernel (0x4008642b)
> > 
> > It also stops Linux as it kills the driver, I guess it might be possible
> > to reload the gfx driver. On a unpatched system the test that is
> > included in the issue or the igt test that has been posted for the issue
> > will show the problem.
> > 
> > I ran the test on an unpatched system here and the gui stopped and the
> > keyboard stopped responding, so I rebooted. With the patched system I
> > did not need to reboot.
> > 
> > Should I change the SIGTERM to SIGSEGV, not quite the same thing but
> > tooling is better at handling a segfault than a SIGTERM and the
> > application that calls this IOCTL is using an uninitialised hw lock so
> > it is kind of the same as differencing an uninitialised pointer (kind
> > of). Or, I could just remove it, but the bug has been in the code for at
> > least two years (and known about), and I would guess that any code that
> > is calling this is fuzzing the IOCTLs (as this is how the security team
> > found it) and we should reward them with a application exit.
> > 
> > Peter. 
> 
> SIGSEGV would be a better choice.
> 
> SIGTERM is normally sent by a user -- it's the default signal sent by
> kill(1). It's also commonly used to tell a long-running daemon process
> to tidy up and exit cleanly.
> 
> SIGSEGV commonly means "you accessed something that doesn't exist/isn't
> mapped/you don't have permissions for". There are specific subcases that
> can be indicated via the siginfo data; this is from the sigaction(1)
> manpage:
> 
> The following values can be placed in si_code for a SIGSEGV signal:
> 
> SEGV_MAPERRaddress not mapped to object
> 
> SEGV_ACCERRinvalid permissions for mapped object
> 
> SIGBUS would also be a possibility but that's generally taken to mean
> that an access got all the way to some physical bus and then faulted,
> whereas SIGSEGV suggests the access was rejected during the
> virtual-to-physical mapping process.

None of the above. Just return -EINVAL, -EPERM, -EACCESS as appropriate.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre