Hello all,

I have a Dell Latitude E7440 running Ubuntu 15.04 which seems to be suffering from the intel_pstate driver getting stuck in a throttled state while under load. The issue typically occurs on warm days when the machine is under load for an extended period of time (e.g. while compiling).
Under these conditions performance gradually deteriorates as the CPU frequency creeps lower and lower. In this dmesg log [1] from a recent incident, we see that there were a couple of core and package throttling events. This in itself isn't problematic; what is troubling is that, despite the temperature quickly returning to normal, the CPU frequency remained just below 400 MHz for the next hour or so while I gathered data on the issue with the system under load. The temperature was stable in the low 60s (degrees Celsius) for this duration. After I finished gathering data I killed the CPU-intensive process, and it took over ten minutes for frequency scaling to behave normally again, eventually scaling up to 3.3 GHz when necessary.

I experience these sorts of events fairly regularly when placing the machine under load. It seems to make no difference whether I use the powersave or performance governor. This is strange, as most accounts I have seen claim that the performance governor unconditionally sets the CPU frequency to its maximum. Even if there were a thermal limit, the system temperature in this case isn't terribly unreasonable (60 to 65 degrees Celsius).

I've attached some further information gathered during the incident, which occurred with a 4.2-rc5 kernel, although I have been experiencing issues of this nature ever since I bought the machine (mostly in the summer).

How would one further trace down this issue? The kernel tree seems to be rather lacking in documentation describing what factors enter into intel_pstate's scaling decisions. Is there any way to get better visibility into this process? Any ideas on what might be going wrong here?
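One low-tech way to get a little more visibility might be to log the reported frequency alongside the kernel's thermal throttle counters while the machine is under load, to see whether the counters keep rising during the period when the frequency stays low. The following is only a minimal sketch under stated assumptions: it assumes the usual sysfs files (cpufreq/scaling_cur_freq, thermal_throttle/core_throttle_count, thermal_throttle/package_throttle_count and the intel_pstate directory) are present on this kernel, and the function names (read_int, snapshot, log_samples) are made up for illustration. Files that are missing or unreadable are reported as None rather than raising.

```python
import time
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs-style file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def snapshot(cpu=0, root="/sys/devices/system/cpu"):
    """Collect one sample of frequency, throttle counters and pstate limits.

    The paths below are assumptions about the sysfs layout; any file that
    does not exist simply yields None for that field.
    """
    base = Path(root)
    cpu_dir = base / f"cpu{cpu}"
    throttle = cpu_dir / "thermal_throttle"
    pstate = base / "intel_pstate"
    return {
        "cur_freq_khz": read_int(cpu_dir / "cpufreq" / "scaling_cur_freq"),
        "core_throttle_count": read_int(throttle / "core_throttle_count"),
        "package_throttle_count": read_int(throttle / "package_throttle_count"),
        "max_perf_pct": read_int(pstate / "max_perf_pct"),
        "min_perf_pct": read_int(pstate / "min_perf_pct"),
        "no_turbo": read_int(pstate / "no_turbo"),
    }


def log_samples(count, interval_s=5.0, cpu=0, root="/sys/devices/system/cpu"):
    """Print and return `count` timestamped snapshots, `interval_s` apart."""
    rows = []
    for i in range(count):
        row = snapshot(cpu, root)
        print(time.strftime("%H:%M:%S"), row)
        rows.append(row)
        if i + 1 < count:
            time.sleep(interval_s)
    return rows
```

For example, log_samples(720, interval_s=5.0) would cover roughly an hour. If the throttle counters stop increasing while cur_freq_khz stays low, that would at least suggest the low frequency is not being driven by fresh thermal events.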
Cheers,

- Ben

[1] https://gist.github.com/bgamari/ae032532a13fa52a8a69


$ cpupower monitor
    |Nehalem                    || SandyBridge        || HaswellExtended    || Mperf              || Idle_Stats
CPU | C3   | C6   | PC3  | PC6  || C7   | PC2  | PC7  || PC8  | PC9  | PC10 || C0   | Cx   | Freq || POLL | C1-H | C1E- | C3-H | C6-H | C7s- | C8-H | C9-H | C10-
   0|  7.04|  5.22|  0.00|  0.00|| 31.01| 18.16|  0.00||  0.00|  0.00|  0.00|| 40.21| 59.79|   388||  0.00|  0.04|  0.57|  5.08|  3.23| 13.00|  8.80| 29.25|  0.00
   2|  7.04|  5.22|  0.00|  0.00|| 31.01| 18.16|  0.00||  0.00|  0.00|  0.00|| 27.59| 72.41|   379||  0.00|  0.01|  0.20|  7.76|  5.16| 17.02| 19.51| 21.57|  1.15
   1|  3.59|  2.92|  0.00|  0.00|| 41.40| 18.16|  0.00||  0.00|  0.00|  0.00|| 32.14| 67.86|   394||  0.00|  0.01|  0.26|  5.21|  4.30| 24.69|  6.45| 24.22|  2.83
   3|  3.59|  2.92|  0.00|  0.00|| 41.40| 18.16|  0.00||  0.00|  0.00|  0.00|| 26.58| 73.42|   367||  0.00|  0.00|  0.11|  1.87|  1.14| 30.36|  5.54| 32.62|  1.95

$ cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: 0.97 ms.
  hardware limits: 800 MHz - 3.30 GHz
  available cpufreq governors: performance, powersave
  current policy: frequency should be within 800 MHz and 3.30 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency is 380 MHz (asserted by call to hardware).
  boost state support:
    Supported: yes
    Active: yes

$ sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:          +25.0C  (crit = +107.0C)

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +63.0C  (high = +100.0C, crit = +100.0C)
Core 0:         +62.0C  (high = +100.0C, crit = +100.0C)
Core 1:         +63.0C  (high = +100.0C, crit = +100.0C)

dell_smm-virtual-0
Adapter: Virtual device
Processor Fan:  6710 RPM
CPU:            +62.0C
Ambient:        +49.0C
SODIMM:         +52.0C

$ cd /sys/devices/system/cpu/intel_pstate
$ cat {max,min}_perf_pct
100
100
$ cat no_turbo num_pstates turbo_pct
0
26
24

$ cd /sys/kernel/debug/pstate_snb
$ cat pgain_pct
20
$ cat igain_pct
0
$ cat dgain_pct
0

$ cd ../pkg_temp_thermal
$ cat pkg_thres_*
0
0

$ cd ../intel_powerclamp
$ cat powerclamp_calib
controlling cpu: 0
pct confidence steady dynamic (compensation)
0       0       0       0
1       0       0       0
2       0       0       0
... (remaining lines also all zeros)

$ sudo turbostat
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     175   47.37     369    2694
       0     210   55.23     380    2698
       2     219   61.70     354    2693
       1     139   36.09     385    2692
       3     131   36.45     360    2694
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     167   45.63     365    2696
       0     108   28.06     385    2695
       2     314   89.24     352    2698
       1     130   33.43     388    2698
       3     115   31.75     364    2694
     CPU Avg_MHz   %Busy Bzy_MHz TSC_MHz
       -     174   46.53     373    2694
       0     176   45.48     386    2696
       2     200   55.75     360    2694
       1     179   46.42     385    2694
       3     139   38.47     362    2694

$ cpupower idle-info
CPUidle driver: intel_idle
CPUidle governor: menu

Analyzing CPU 0:
Number of idle states: 9
Available idle states: POLL C1-HSW C1E-HSW C3-HSW C6-HSW C7s-HSW C8-HSW C9-HSW C10-HSW
POLL:
Flags/Description: CPUIDLE CORE POLL IDLE
Latency: 0
Usage: 19629
Duration: 4903415
C1-HSW:
Flags/Description: MWAIT 0x00
Latency: 2
Usage: 12066075
Duration: 2316078427
C1E-HSW:
Flags/Description: MWAIT 0x01
Latency: 10
Usage: 1437624
Duration: 497058866
C3-HSW:
Flags/Description: MWAIT 0x10
Latency: 33
Usage: 1664168
Duration: 916288273
C6-HSW:
Flags/Description: MWAIT 0x20
Latency: 133
Usage: 456853
Duration: 353643717
C7s-HSW:
Flags/Description: MWAIT 0x32
Latency: 166
Usage: 1714991
Duration: 1671456695
C8-HSW:
Flags/Description: MWAIT 0x40
Latency: 300
Usage: 1435877
Duration: 1966505031
C9-HSW:
Flags/Description: MWAIT 0x50
Latency: 600
Usage: 1565954
Duration: 3739218646
C10-HSW:
Flags/Description: MWAIT 0x60
Latency: 2600
Usage: 118301
Duration: 955949684
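
As a sanity check on the turbostat figures above: Avg_MHz should be roughly Bzy_MHz scaled by %Busy, so the low Avg_MHz values are not just an effect of the cores idling part of the time. The cores really are clocked at about 350 to 390 MHz while busy, which is below the 800 MHz lower hardware limit that cpupower frequency-info reports. The sketch below checks this relation on the first turbostat sample. The helper names (parse_turbostat, expected_avg_mhz, check) and the 1 MHz tolerance are arbitrary choices for illustration, and the parser assumes the simple five-column layout shown in this mail.

```python
# First turbostat sample from this report, whitespace-normalised.
SAMPLE = """\
CPU Avg_MHz %Busy Bzy_MHz TSC_MHz
- 175 47.37 369 2694
0 210 55.23 380 2698
2 219 61.70 354 2693
1 139 36.09 385 2692
3 131 36.45 360 2694
"""


def parse_turbostat(text):
    """Parse one whitespace-separated turbostat sample into a list of dicts."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def expected_avg_mhz(busy_pct, bzy_mhz):
    """Average MHz implied by the busy frequency and the busy percentage."""
    return bzy_mhz * busy_pct / 100.0


def check(text, tolerance_mhz=1.0):
    """Return (cpu, reported, estimated, consistent) for each row."""
    results = []
    for row in parse_turbostat(text):
        reported = float(row["Avg_MHz"])
        estimated = expected_avg_mhz(float(row["%Busy"]), float(row["Bzy_MHz"]))
        consistent = abs(reported - estimated) <= tolerance_mhz
        results.append((row["CPU"], reported, estimated, consistent))
    return results


for cpu, reported, estimated, consistent in check(SAMPLE):
    print(f"cpu {cpu}: reported {reported:.0f} MHz, "
          f"estimated {estimated:.1f} MHz, consistent={consistent}")
```

Every row of this sample agrees to within the tolerance, so the columns are at least internally consistent with one another.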