[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()
This commit add check for return value of init_ring_common() in the init_render_ring(). Now, when failure is detected the error code is propagated to the caller layer instead of being ignored. I believe that this fix will have a positive impact on the oops that I hit recently and which starts when init_ring_common() fails: [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 000c tail start 003eb000 BUG: unable to handle kernel NULL pointer dereference at 006c IP: [] i915_gem_obj_to_ggtt+0x9/0x40 [i915] Signed-off-by: Konrad Zapalowicz --- drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 279488a..d205b0d 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -604,6 +604,8 @@ static int init_render_ring(struct intel_engine_cs *ring) struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); + if (ret) + return ret; /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) -- 1.8.1.2
[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()
On 06/19, Daniel Vetter wrote: > On Thu, Jun 19, 2014 at 12:38 AM, Konrad Zapalowicz > wrote: > > This commit add check for return value of init_ring_common() in the > > init_render_ring(). Now, when failure is detected the error code is > > propagated to the caller layer instead of being ignored. > > > > I believe that this fix will have a positive impact on the oops that > > I hit recently and which starts when init_ring_common() fails: > > > > [drm:init_ring_common] *ERROR* render ring initialization failed > > ctl 0001f001 head 000c tail start 003eb000 > > BUG: unable to handle kernel NULL pointer dereference at 006c > > IP: [] i915_gem_obj_to_ggtt+0x9/0x40 [i915] > > > > Signed-off-by: Konrad Zapalowicz > > Do you have the full Oops somewhere? Here you go, the Oops plus some usefull data: 1. Oops 2. lspci -vv 3. uname -a 4. Oops analysis 1. The Oops: Jun 17 21:06:11 t400 kernel: [ 12.136049] [drm:init_ring_common] *ERROR* render ring initialization failed ctl 0001f001 head 000c tail start 003eb000 Jun 17 21:06:11 t400 kernel: [ 12.136081] BUG: unable to handle kernel NULL pointer dereference at 006c Jun 17 21:06:11 t400 kernel: [ 12.136086] IP: [] i915_gem_obj_to_ggtt+0x9/0x40 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136118] *pdpt = 33158001 *pde = Jun 17 21:06:11 t400 kernel: [ 12.136123] Oops: [#1] SMP Jun 17 21:06:11 t400 kernel: [ 12.136127] Modules linked in: mac80211(E) i915(E+) snd_hda_codec_conexant(E) snd_hda_codec_generic(E) snd_hda_intel(E) snd_hda_controller(E) intel_gtt(E) snd_hda_codec(E) iwlwifi(E) i2c_algo_bit(E) snd_hwdep(E) uvcvideo(E) thinkpad_acpi(E) drm_kms_helper(E) snd_pcm(E) pcmcia(E) nvram(E) videobuf2_vmalloc(E) videobuf2_memops(E) videobuf2_core(E) drm(E) psmouse(E) videodev(E) snd_seq_midi(E) snd_seq_midi_event(E) snd_rawmidi(E) snd_seq(E) cfg80211(E) yenta_socket(E) snd_timer(E) pcmcia_rsrc(E) serio_raw(E) snd_seq_device(E) pcmcia_core(E) snd(E) pl2303(E) lpc_ich(E) ppdev(E) usb_storage(E) soundcore(E) usbserial(E) wmi(E) video(E) tpm_tis(E) parport_pc(E) lp(E) parport(E) firewire_ohci(E) firewire_core(E) crc_itu_t(E) ahci(E) libahci(E) e1000e(E) ptp(E) pps_core(E) Jun 17 21:06:11 t400 kernel: [ 12.136187] CPU: 1 PID: 570 Comm: modprobe Tainted: GE 3.15.0 #6 Jun 17 21:06:11 t400 kernel: [ 12.136191] Hardware name: LENOVO 6475FA4/6475FA4, BIOS 7UET79WW (3.09 ) 10/13/2009 Jun 17 21:06:11 t400 kernel: [ 12.136195] task: f3141b60 ti: f316a000 task.ti: f316a000 Jun 17 21:06:11 t400 kernel: [ 12.136199] EIP: 0060:[] EFLAGS: 00010282 CPU: 1 Jun 17 21:06:11 t400 kernel: [ 12.136223] EIP is at i915_gem_obj_to_ggtt+0x9/0x40 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136227] EAX: EBX: 0004 ECX: f2e6c000 EDX: fffb Jun 17 21:06:11 t400 kernel: [ 12.136230] ESI: f2e6d174 EDI: EBP: f316bb50 ESP: f316bb4c Jun 17 21:06:11 t400 kernel: [ 12.136234] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Jun 17 21:06:11 t400 kernel: [ 12.136239] CR0: 8005003b CR2: 006c CR3: 33157000 CR4: 000407f0 Jun 17 21:06:11 t400 kernel: [ 12.136243] Stack: Jun 17 21:06:11 t400 kernel: [ 12.136245] 0004 f316bb68 f8ca16c5 f316bb80 0004 f2e6d174 f2e794c0 f316bb80 Jun 17 21:06:11 t400 kernel: [ 12.136254] f8c9473e f2e6c000 f2e6c000 f32e5800 fffb f316bba0 f8c9ea49 Jun 17 21:06:11 t400 kernel: [ 12.136263] f32e5838 f32e5800 f2e6c000 f316bca4 f8cf63a4 f8cf3c70 Jun 17 21:06:11 t400 kernel: [ 12.136271] Call Trace: Jun 17 21:06:11 t400 kernel: [ 12.136297] [] i915_gem_object_ggtt_unpin+0x15/0x90 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136323] [] i915_gem_context_fini+0x7e/0x130 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136349] [] i915_gem_init+0x69/0x1a0 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136381] [] i915_driver_load+0xa54/0xe50 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136411] [] ? i915_dma_init+0x2e0/0x2e0 [i915] Jun 17 21:06:11 t400 kernel: [ 12.136419] [] ? kobject_uevent_env+0xfa/0x510 Jun 17 21:06:11 t400 kernel: [ 12.136424] [] ? kobject_uevent_env+0xfa/0x510 Jun 17 21:06:11 t400 kernel: [ 12.136429] [] ? add_uevent_var+0xd0/0xd0 Jun 17 21:06:11 t400 kernel: [ 12.136435] [] ? get_device+0x14/0x30 Jun 17 21:06:11 t400 kernel: [ 12.136440] [] ? klist_class_dev_get+0x12/0x20 Jun 17 21:06:11 t400 kernel: [ 12.136447] [] ? klist_node_init+0x35/0x50 Jun 17 21:06:11 t400 kernel: [ 12.136452] [] ? klist_add_tail+0x20/0x50 Jun 17 21:06:11 t400 kernel: [ 12.136457] [] ? put_device+0x14/0x20 Jun 17 21:06:11 t400 kernel: [ 12.136462] [] ? device_add+0x167/0x530 Jun 17 21:06:11 t400 kernel: [ 12.136468] [] ? kobject_set_name_vargs+0x42/0x60 Jun 17 21:06:11 t400 kernel: [ 12.136485] [] drm_dev_register+0x9e/0xf0 [drm] Jun 17
[PATCH v2] drivers/i915: Fix unnoticed failure of init_ring_common()
This commit add check for return value of init_ring_common() in the init_render_ring(). Now, when failure is detected the error code is propagated to the caller instead of being ignored. Signed-off-by: Konrad Zapalowicz --- v2: - remove from commit message references to the Oops. drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 279488a..d205b0d 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -604,6 +604,8 @@ static int init_render_ring(struct intel_engine_cs *ring) struct drm_device *dev = ring->dev; struct drm_i915_private *dev_priv = dev->dev_private; int ret = init_ring_common(ring); + if (ret) + return ret; /* WaTimedSingleVertexDispatch:cl,bw,ctg,elk,ilk,snb */ if (INTEL_INFO(dev)->gen >= 4 && INTEL_INFO(dev)->gen < 7) -- 1.8.1.2
[PATCH] drivers/i915: Fix unnoticed failure of init_ring_common()
On 06/19, Daniel Vetter wrote: > On Thu, Jun 19, 2014 at 4:35 PM, Daniel Vetter > wrote: > > The actual bug we seem to have is blowing up on the ggtt_unpin in > > context_fini. Which is doubly-impossible: Gen4 doesn't have hw > > contexts, so should have dctx->obj == NULL. And ring init failures > > fail earlier so shouldn't even hit the context_fini code below the > > cleanup_gem: label in driver_load. Seriously confused here. > > Also please retest with latest upstream, we've made the ring init > failure non-letal for the driver. That should help you, too. > -Daniel Thanks for comments and I will resubmit the patch, still it is better to have it. /Konrad > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch