[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()

2014-06-19 Thread Konrad Zapalowicz
This commit add check for return value of init_ring_common() in the
init_render_ring(). Now, when failure is detected the error code is
propagated to the caller layer instead of being ignored.

I believe that this fix will have a positive impact on the oops that
I hit recently and which starts when init_ring_common() fails:

[drm:init_ring_common] *ERROR* render ring initialization failed
 ctl 0001f001 head 000c tail  start 003eb000
BUG: unable to handle kernel NULL pointer dereference at 006c
IP: [] i915_gem_obj_to_ggtt+0x9/0x40 [i915]

Signed-off-by: Konrad Zapalowicz 
---
 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..d205b0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -604,6 +604,8 @@ static int init_render_ring(struct intel_engine_cs *ring)
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
int ret = init_ring_common(ring);
+   if (ret)
+   return ret;

/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
-- 
1.8.1.2



[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()

2014-06-19 Thread Konrad Zapalowicz
On 06/19, Daniel Vetter wrote:
> On Thu, Jun 19, 2014 at 12:38 AM, Konrad Zapalowicz
>  wrote:
> > This commit add check for return value of init_ring_common() in the
> > init_render_ring(). Now, when failure is detected the error code is
> > propagated to the caller layer instead of being ignored.
> >
> > I believe that this fix will have a positive impact on the oops that
> > I hit recently and which starts when init_ring_common() fails:
> >
> > [drm:init_ring_common] *ERROR* render ring initialization failed
> >  ctl 0001f001 head 000c tail  start 003eb000
> > BUG: unable to handle kernel NULL pointer dereference at 006c
> > IP: [] i915_gem_obj_to_ggtt+0x9/0x40 [i915]
> >
> > Signed-off-by: Konrad Zapalowicz 
> 
> Do you have the full Oops somewhere?

Here you go, the Oops plus some usefull data:
  1. Oops
  2. lspci -vv
  3. uname -a
  4. Oops analysis

1. The Oops:

Jun 17 21:06:11 t400 kernel: [   12.136049] [drm:init_ring_common] *ERROR* 
render ring initialization failed ctl 0001f001 head 000c tail  
start 003eb000
Jun 17 21:06:11 t400 kernel: [   12.136081] BUG: unable to handle kernel NULL 
pointer dereference at 006c
Jun 17 21:06:11 t400 kernel: [   12.136086] IP: [] 
i915_gem_obj_to_ggtt+0x9/0x40 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136118] *pdpt = 33158001 *pde = 
 
Jun 17 21:06:11 t400 kernel: [   12.136123] Oops:  [#1] SMP 
Jun 17 21:06:11 t400 kernel: [   12.136127] Modules linked in: mac80211(E) 
i915(E+) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_intel(E) 
snd_hda_controller(E) intel_gtt(E) snd_hda_codec(E) iwlwifi(E) i2c_algo_bit(E) 
snd_hwdep(E) uvcvideo(E) thinkpad_acpi(E) drm_kms_helper(E) snd_pcm(E) 
pcmcia(E) nvram(E) videobuf2_vmalloc(E) videobuf2_memops(E) videobuf2_core(E) 
drm(E) psmouse(E) videodev(E) snd_seq_midi(E) snd_seq_midi_event(E) 
snd_rawmidi(E) snd_seq(E) cfg80211(E) yenta_socket(E) snd_timer(E) 
pcmcia_rsrc(E) serio_raw(E) snd_seq_device(E) pcmcia_core(E) snd(E) pl2303(E) 
lpc_ich(E) ppdev(E) usb_storage(E) soundcore(E) usbserial(E) wmi(E) video(E) 
tpm_tis(E) parport_pc(E) lp(E) parport(E) firewire_ohci(E) firewire_core(E) 
crc_itu_t(E) ahci(E) libahci(E) e1000e(E) ptp(E) pps_core(E)
Jun 17 21:06:11 t400 kernel: [   12.136187] CPU: 1 PID: 570 Comm: modprobe 
Tainted: GE 3.15.0 #6
Jun 17 21:06:11 t400 kernel: [   12.136191] Hardware name: LENOVO 
6475FA4/6475FA4, BIOS 7UET79WW (3.09 ) 10/13/2009
Jun 17 21:06:11 t400 kernel: [   12.136195] task: f3141b60 ti: f316a000 
task.ti: f316a000
Jun 17 21:06:11 t400 kernel: [   12.136199] EIP: 0060:[] EFLAGS: 
00010282 CPU: 1
Jun 17 21:06:11 t400 kernel: [   12.136223] EIP is at 
i915_gem_obj_to_ggtt+0x9/0x40 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136227] EAX:  EBX: 0004 ECX: 
f2e6c000 EDX: fffb
Jun 17 21:06:11 t400 kernel: [   12.136230] ESI: f2e6d174 EDI:  EBP: 
f316bb50 ESP: f316bb4c
Jun 17 21:06:11 t400 kernel: [   12.136234]  DS: 007b ES: 007b FS: 00d8 GS: 
0033 SS: 0068
Jun 17 21:06:11 t400 kernel: [   12.136239] CR0: 8005003b CR2: 006c CR3: 
33157000 CR4: 000407f0
Jun 17 21:06:11 t400 kernel: [   12.136243] Stack:
Jun 17 21:06:11 t400 kernel: [   12.136245]  0004 f316bb68 f8ca16c5 
f316bb80 0004 f2e6d174 f2e794c0 f316bb80
Jun 17 21:06:11 t400 kernel: [   12.136254]  f8c9473e f2e6c000 f2e6c000 
f32e5800 fffb f316bba0 f8c9ea49 
Jun 17 21:06:11 t400 kernel: [   12.136263]   f32e5838 f32e5800 
 f2e6c000 f316bca4 f8cf63a4 f8cf3c70
Jun 17 21:06:11 t400 kernel: [   12.136271] Call Trace:
Jun 17 21:06:11 t400 kernel: [   12.136297]  [] 
i915_gem_object_ggtt_unpin+0x15/0x90 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136323]  [] 
i915_gem_context_fini+0x7e/0x130 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136349]  [] 
i915_gem_init+0x69/0x1a0 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136381]  [] 
i915_driver_load+0xa54/0xe50 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136411]  [] ? 
i915_dma_init+0x2e0/0x2e0 [i915]
Jun 17 21:06:11 t400 kernel: [   12.136419]  [] ? 
kobject_uevent_env+0xfa/0x510
Jun 17 21:06:11 t400 kernel: [   12.136424]  [] ? 
kobject_uevent_env+0xfa/0x510
Jun 17 21:06:11 t400 kernel: [   12.136429]  [] ? 
add_uevent_var+0xd0/0xd0
Jun 17 21:06:11 t400 kernel: [   12.136435]  [] ? get_device+0x14/0x30
Jun 17 21:06:11 t400 kernel: [   12.136440]  [] ? 
klist_class_dev_get+0x12/0x20
Jun 17 21:06:11 t400 kernel: [   12.136447]  [] ? 
klist_node_init+0x35/0x50
Jun 17 21:06:11 t400 kernel: [   12.136452]  [] ? 
klist_add_tail+0x20/0x50
Jun 17 21:06:11 t400 kernel: [   12.136457]  [] ? put_device+0x14/0x20
Jun 17 21:06:11 t400 kernel: [   12.136462]  [] ? 
device_add+0x167/0x530
Jun 17 21:06:11 t400 kernel: [   12.136468]  [] ? 
kobject_set_name_vargs+0x42/0x60
Jun 17 21:06:11 t400 kernel: [   12.136485]  [] 
drm_dev_register+0x9e/0xf0 [drm]
Jun 17 

[PATCH v2] drivers/i915: Fix unnoticed failure of init_ring_common()

2014-06-19 Thread Konrad Zapalowicz
This commit add check for return value of init_ring_common() in the
init_render_ring(). Now, when failure is detected the error code is
propagated to the caller instead of being ignored.

Signed-off-by: Konrad Zapalowicz 
---

 v2:
   - remove from commit message references to the Oops.

 drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 279488a..d205b0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -604,6 +604,8 @@ static int init_render_ring(struct intel_engine_cs *ring)
struct drm_device *dev = ring->dev;
struct drm_i915_private *dev_priv = dev->dev_private;
int ret = init_ring_common(ring);
+   if (ret)
+   return ret;

/* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */
if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7)
-- 
1.8.1.2



[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()

2014-06-19 Thread Konrad Zapalowicz
On 06/19, Daniel Vetter wrote:
> On Thu, Jun 19, 2014 at 4:35 PM, Daniel Vetter  
> wrote:
> > The actual bug we seem to have is blowing up on the ggtt_unpin in
> > context_fini. Which is doubly-impossible: Gen4 doesn't have hw
> > contexts, so should have dctx->obj == NULL. And ring init failures
> > fail earlier so shouldn't even hit the context_fini code below the
> > cleanup_gem: label in driver_load. Seriously confused here.
> 
> Also please retest with latest upstream, we've made the ring init
> failure non-letal for the driver. That should help you, too.
> -Daniel

Thanks for comments and I will resubmit the patch, still it is better
to have it.

/Konrad

> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch