https://bugs.freedesktop.org/show_bug.cgi?id=93101
Bug ID: 93101
Summary: GPU Fault almost burned the CPU
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/radeonsi
Assignee: dri-devel at lists.freedesktop.org
Reporter: dev at illwieckz.net
QA Contact: dri-devel at lists.freedesktop.org

Created attachment 120103
  --> https://bugs.freedesktop.org/attachment.cgi?id=120103&action=edit
syslog (short)

Hi,

this issue is about the fact that some GPU lockups can cause the CPU to overheat (for real).

A few hours ago I got a GPU lockup while I was trying to play a DVD with VLC. The video rendering wasn't functional (no picture), then the GPU started to display weird things (see attached photo), then locked up.

I've attached some logs: one very long syslog, and a shortened excerpt of it (easier to read, but I am giving you the original one too in case I missed something).

To summarize, you can read lines like these in the syslog:

```
Nov 24 22:58:18 gollum gnome-session[3720]: [00007f134c173c20] avcodec decoder: Using G3DVL VDPAU Driver Shared Library version 1.0 for hardware decoding
Nov 24 22:58:18 gollum kernel: [97035.599456] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00002126
Nov 24 22:58:18 gollum kernel: [97035.599460] radeon 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0408800C
Nov 24 22:58:18 gollum kernel: [97035.599465] VM fault (0x0c, vmid 2) at page 8486, read from 'TC4' (0x54433400) (136)
Nov 24 22:58:55 gollum kernel: [97072.747472] radeon 0000:01:00.0: ring 0 stalled for more than 10088msec
Nov 24 22:58:55 gollum kernel: [97072.747483] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000059fcff last fence id 0x000000000059fd12 on ring 0)
Nov 24 22:59:04 gollum kernel: [97081.259933] WARNING: CPU: 4 PID: 23502 at /home/kernel/COD/linux/drivers/gpu/drm/radeon/radeon_object.c:83 radeon_ttm_bo_destroy+0xe7/0xf0 [radeon]()
```

My system is running:

vlc 3.0.0~~git20151123+r62463+34~ubuntu15.10.1
linux-image-4.3.0-040300-generic 4.3.0-040300.201511020949
libdrm-radeon1 2.4.65+git1511161830.8913cd~gd~w
xserver-xorg-video-radeon 7.6.99+git1511170732.10b7c3~gd~w
libgl1-mesa-dri 11.2~git1511231930.e4c122~gd~w
mesa-vdpau-drivers 11.2~git1511231930.e4c122~gd~w

That is a real issue, but it is not the topic of this ticket. The really big problem is that this bug almost burned my CPU. Let me explain.

When the bug occurred, I tried to track it down. Instead of rebooting my computer, I started a laptop in order to connect to the computer over ssh and diagnose the live system. While the laptop was booting, I took some photos of my screen. But suddenly my computer shut itself down: the CPU critical temperature had been reached.

The normal operating temperature is between 30°C and 40°C on my system. In case of emergency, I have two regulators running on my computer. The first one raises the fan speed from 128 RPM to 1400 RPM when the temperature reaches 50°C, and the second one downclocks all 8 cores from 4.7 GHz to 1.4 GHz when the temperature reaches 70°C. Both are userspace regulators. The first is the well-known fancontrol, and the other one is my own. Both work well (if I run cpuburn, for example).

The fact is, when the GPU lockup occurred, something in the driver went wrong on the CPU side. It looks like some infinite loop started on my cores, doing intensive work, probably without having to wait on external components (like main memory) that would have slowed the CPU down. In fact, the computer behaved exactly as if I were running one cpuburn process per core with the performance CPU governor at noon in summer. But there was one difference: the fan never sped up (it was still running at 128 RPM when the CPU reached 90°C), and the CPU was never downclocked either. That's why I opened this issue.
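
For illustration, the thresholds of the two regulators described above can be sketched as a pure decision function. This is a hypothetical sketch, not the reporter's actual script nor fancontrol's implementation: the names `Action` and `decide` are invented here, the fan is modeled as a simple two-level switch (an assumption), and nothing is written to hardware.

```python
from dataclasses import dataclass

# Thresholds and set points taken from the report.
FAN_IDLE_RPM = 128
FAN_BOOST_RPM = 1400
FAN_TRIGGER_C = 50.0
CPU_NORMAL_GHZ = 4.7
CPU_THROTTLED_GHZ = 1.4
THROTTLE_TRIGGER_C = 70.0


@dataclass(frozen=True)
class Action:
    """Hypothetical target state chosen by the two regulators."""
    fan_rpm: int
    cpu_ghz: float


def decide(temp_c: float) -> Action:
    """Return the fan speed and CPU clock the regulators would aim for.

    Assumption: the fan switches between two levels at the trigger
    temperature; a real fan regulator may ramp gradually instead.
    """
    fan_rpm = FAN_BOOST_RPM if temp_c >= FAN_TRIGGER_C else FAN_IDLE_RPM
    cpu_ghz = CPU_THROTTLED_GHZ if temp_c >= THROTTLE_TRIGGER_C else CPU_NORMAL_GHZ
    return Action(fan_rpm, cpu_ghz)


if __name__ == "__main__":
    for temp in (35.0, 55.0, 90.0):
        print(temp, decide(temp))
```

A policy like this only takes effect if the regulating process gets scheduled and can apply its decision. During the lockup, neither response was observed: the fan stayed at 128 RPM and the cores stayed at full clock even at 90°C.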

When this bug occurred, the system went so wrong that the CPU was brought to its knees and no regulator was able to control the CPU fan, so the CPU kept heating up. Fortunately, the CPU's internal temperature protection automatically shut down my computer to save itself. But if someone uses a CPU with a faulty thermal safety mechanism, this GPU lockup can really burn a CPU!