Hi Michel,

>>>> [..]
>>>> The most striking problem of kernel 3.18.9-rt4 affects all systems that
>>>> are equipped with Radeon graphics (irrespective whether PCIe cards or
>>>> APUs with on-chip graphics). They suffer from a hanging radeon driver.
>>>> The block occurs when accelerated graphics load is created by x11perf or
>>>> gltestperf. Sometimes only the graphics are frozen while ssh login still
>>>> is possible, sometimes the entire box is no longer accessible at all. In
>>>> any case, a reboot is needed to recover from this situation.
>>>>
>>>> Here is a selection of kernel messages:
>>> [...]
>>> The commits from
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=f957063fee6392bb9365370db6db74dc0b2dce0a
>>>
>>> to
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=cffefd9bb31cd35ab745d3b49005d10616d25bdc
>>>
>>> and
>>> http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-fixes&id=b6610101718d4ab90d793c482625e98eb1262cad
>>>
>>> might help for this.
>>
>> Thanks a lot. I have applied these patches to a number of systems:
>> # quilt applied | tail -7
>> patches/drm-radeon-do-a-posting-read-in-r100_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-rs600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-r600_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-evergreen_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-si_set_irq.patch
>> patches/drm-radeon-do-a-posting-read-in-cik_set_irq.patch
>> patches/drm-radeon-fix-wait-to-actually-occur-after-the-signaling-callback.patch
>>
>>
>> The graphic boards still crash and freeze the screen, but in contrast
>> to the earlier situation the systems remain accessible, and the X
>> Window server can be restarted after the offensive programs are
>> removed.
>> The crashes were reliably triggered by
>> - gltestperf
>> or
>> - x11perf -repeat 3 -subs 25 -time 2 -rect10

This is not entirely correct, since gltestperf does not reliably crash
the graphics controller. However,
"x11perf -repeat 3 -subs 25 -time 2 -rect10" triggers the crash every
time.

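For unattended runs on several machines, the x11perf invocation above could be wrapped roughly as follows. This is only a sketch: the helper name `run_with_timeout`, the three result strings, and the idea of treating a timeout as a suspected freeze are assumptions made for illustration and are not part of the test procedure used for the results in this mail.

```python
import subprocess

# The x11perf invocation quoted in this thread.
X11PERF_CMD = ["x11perf", "-repeat", "3", "-subs", "25", "-time", "2", "-rect10"]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome.

    Returns "completed" if the process exits within timeout_s seconds,
    "timeout" if it had to be killed, and "missing" if the binary is
    not installed. Interpreting "timeout" as a suspected graphics
    freeze is an assumption of this sketch, not a proven diagnosis.
    """
    try:
        subprocess.run(cmd, timeout=timeout_s, check=False,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return "timeout"
    except FileNotFoundError:
        return "missing"
    return "completed"

# Example call (the 300 s limit is arbitrary):
#   run_with_timeout(X11PERF_CMD, timeout_s=300)
```

A "timeout" result only says that x11perf did not finish in time; the kernel log still has to be checked for the hung-task report to confirm that the radeon driver is blocked.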
>> but the crashes also occur several times per day during normal work
>> such as browsing the Internet or writing a text document. If you wish
>> me to provide additional diagnostic information such as running test
>> programs while the graphic boards are unresponsive, I certainly can do
>> that.
>
> Does it also happen with a kernel built from a current drm-fixes tree?
> http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-fixes

No. Apparently, you need full preemption to expose the problem.

The following list shows whether the command
"x11perf -repeat 3 -subs 25 -time 2 -rect10" freezes the Radeon board
under test (Radeon HD 7970 XFS / R9 280X):

linux-3.12.33-rt47                 no
linux-3.14.34-rt32                 no
linux-3.14.34-drm-3.16.7-rt32*     no
linux-3.18.7-rt1                   YES
linux-3.18.9-rt4                   YES
linux-3.18.9-rt5                   YES
linux-3.18.9-drm-3.16.7-rt5**      no
linux-4.0.0-rc4                    no
linux-drm-fixes                    no

 *DRM subsystem backported from linux-3.16.7 to linux-3.14.34-rt32.
**DRM subsystem ported from linux-3.16.7 to linux-3.18.9-rt5.

More observations:
If full function tracing is enabled (which makes the system about five
times slower), the graphics controller no longer freezes. With partial
function tracing such as "echo *drm* >set_ftrace_filter", the controller
still freezes. The trace then contains only vblank interrupt processing;
ioctls are no longer executed.

This is the location where the driver hangs:
[25104.509258] INFO: task Xorg.bin:16591 blocked for more than 120 seconds.
[25104.516322] Not tainted 3.18.9-rt5 #2
[25104.520715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25104.528853] Xorg.bin D ffffffff8171ed90 0 16591 16239 0x10400080
[25104.536102] ffff8800ba0bb8d8 0000000000000002 ffff8800ba0bbfd8 0000000000000006
[25104.536103] 000000000000dc08 ffff880626d0dc08 ffff8800ba0bbfd8 000000000000dc08
[25104.536104] ffff88061b2cdcd0 ffff880616d3a940 ffff880035c10000 ffff880616d3a940
[25104.559274] Call Trace:
[25104.561844] [<ffffffff8171bb54>] schedule+0x34/0xa0
[25104.561846] [<ffffffff8171e2ac>] schedule_timeout+0x23c/0x2a0
[25104.561870] [<ffffffffa00e3ab6>] ? radeon_fence_process+0x16/0x40 [radeon]
[25104.561879] [<ffffffffa00e3b24>] ? radeon_fence_any_seq_signaled+0x44/0x90 [radeon]
[25104.561887] [<ffffffffa00e3e97>] radeon_fence_wait_seq_timeout.constprop.8+0x327/0x380 [radeon]
[25104.561889] [<ffffffff810d19c0>] ? __wake_up_sync+0x20/0x20
[25104.561898] [<ffffffffa00e4287>] radeon_fence_wait_any+0x57/0x70 [radeon]
[25104.561914] [<ffffffffa015a36f>] radeon_sa_bo_new+0x2af/0x4b0 [radeon]
[25104.561916] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
[25104.561918] [<ffffffff811d0b4a>] ? __kmalloc+0x8a/0x300
[25104.561932] [<ffffffffa01b2197>] radeon_ib_get+0x37/0xe0 [radeon]
[25104.561943] [<ffffffffa01003ee>] radeon_cs_ioctl+0x22e/0x860 [radeon]
[25104.561952] [<ffffffffa0005bc7>] drm_ioctl+0x197/0x670 [drm]
[25104.561954] [<ffffffff81379b07>] ? debug_smp_processor_id+0x17/0x20
[25104.561956] [<ffffffff810901ba>] ? unpin_current_cpu+0x1a/0x80
[25104.561959] [<ffffffff810ba200>] ? migrate_enable+0x90/0x1a0
[25104.561966] [<ffffffffa00c604c>] radeon_drm_ioctl+0x4c/0x80 [radeon]
[25104.561967] [<ffffffff811fdb88>] do_vfs_ioctl+0x2c8/0x4c0
[25104.561969] [<ffffffff81208a92>] ? __fget+0x72/0xb0
[25104.561970] [<ffffffff811fde01>] SyS_ioctl+0x81/0xa0
[25104.561971] [<ffffffff8171f99e>] tracesys_phase2+0xd4/0xd9

Conclusion:
A change in the DRM subsystem between 3.16.7 and 3.18.9 introduced a
race condition that freezes Radeon graphics. Full preemption is required
to expose it reliably.

Thanks,
-Carsten.
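
To compare hung-task reports from several machines, the frames of a call trace like the one above could be extracted mechanically. The following is a minimal sketch, assuming the dmesg format shown in this mail; the names `FRAME_RE` and `reliable_frames` are hypothetical. Frames prefixed with "?" are skipped, since the kernel uses that marker for addresses found on the stack that are not confirmed to be part of the call chain.

```python
import re

# Matches one frame of a kernel call trace, for example
#   [<ffffffffa01003ee>] radeon_cs_ioctl+0x22e/0x860 [radeon]
# Group 1: optional "?" marker, group 2: symbol, group 3: optional module.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\][ \t]+(\?[ \t]+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(\w+)\])?"
)


def reliable_frames(trace_text, module=None):
    """Return the symbols of all frames not marked '?', in trace order.

    If module is given, only frames belonging to that kernel module
    (the name in square brackets at the end of the line) are returned.
    """
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        unreliable, symbol, mod = match.groups()
        if unreliable:
            continue  # skip frames the unwinder could not confirm
        if module is not None and mod != module:
            continue
        frames.append(symbol)
    return frames
```

Applied to the trace above with `module="radeon"`, the remaining frames are radeon_fence_wait_seq_timeout.constprop.8, radeon_fence_wait_any, radeon_sa_bo_new, radeon_ib_get, radeon_cs_ioctl and radeon_drm_ioctl, that is, Xorg is blocked waiting for a fence while allocating an indirect buffer in the command submission ioctl.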