I just checked the old and new behavior again.
First of all, it is not clear to me where this value of 53 comes from.
According to the data in /sys/class/thermal/thermal_zone5 (THP):
trip_point_0_hyst:4000
trip_point_0_temp:-274000
trip_point_0_type:passive
trip_point_1_hyst:4000
trip_point_1_temp:-274000
trip_point_1_type:passive
trip_point_2_hyst:4000
trip_point_2_temp:80050
trip_point_2_type:critical
trip_point_3_hyst:4000
trip_point_3_temp:75050
trip_point_3_type:hot
trip_point_4_hyst:4000
trip_point_4_temp:65050
trip_point_4_type:passive
and the old thermald shows the same values for THP (via ThermalMonitor):
THP:
65 Passive 
75 Max
80 Critical
58 Polling
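
For anyone who wants to repeat the comparison, here is a rough sketch (not
taken from thermald) of how the trip points above could be read from sysfs
and converted from millidegrees to degrees Celsius. The function name
read_trip_points is hypothetical, and treating -274000 as "trip point not
set" is my assumption, based on the value being below absolute zero.

```python
from pathlib import Path

# Assumption: -274000 millidegrees Celsius (below absolute zero) marks a
# trip point that is not set.
INVALID_MILLIDEGREES = -274000


def read_trip_points(zone_dir):
    """Return a list of (index, type, temp_c, hyst_c) for one thermal zone.

    temp_c is None when the trip point holds the invalid marker.
    """
    zone = Path(zone_dir)
    trips = []
    for temp_file in zone.glob("trip_point_*_temp"):
        # File names look like trip_point_<index>_temp.
        index = int(temp_file.name.split("_")[2])
        temp_md = int(temp_file.read_text().strip())
        type_file = zone / f"trip_point_{index}_type"
        hyst_file = zone / f"trip_point_{index}_hyst"
        trip_type = type_file.read_text().strip() if type_file.exists() else "unknown"
        hyst_c = int(hyst_file.read_text().strip()) / 1000 if hyst_file.exists() else None
        temp_c = None if temp_md <= INVALID_MILLIDEGREES else temp_md / 1000
        trips.append((index, trip_type, temp_c, hyst_c))
    return sorted(trips, key=lambda trip: trip[0])
```

Calling read_trip_points("/sys/class/thermal/thermal_zone5") on the laptop
should then list only trip points 2, 3 and 4 with a real temperature.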

With the new thermald run with --adaptive, the THP values (via
ThermalMonitor) are the following:
52 Passive
46 Polling
and they do not match the data from /sys/class/thermal/thermal_zone5/

I would be grateful if you could explain what I missed, but I can believe
that this is the case -- at least it looks like thermald tries to hold the
temperature at 53 degrees.

But regarding my issue, here are some measurements:
With the old thermald, the zone5 (THP) temperature stays at 53-56 under load,
the CPU temperature at 60-64 and the CPU frequency at 1.3-1.4 GHz.
With the new thermald, the zone5 (THP) temperature also stays at 53-56 under
load, but the CPU temperature is 70-74, the CPU frequency is 2.4 GHz (up to
2.7) and the laptop is noisier.

So it looks like thermald tries to keep the THP temperature within limits by
decreasing the GPU frequency, but since the GPU is on the same die as the
CPU and shares the cooling fan, thermald does not succeed -- because the CPU
heats the GPU.

So my biggest concern and suspicion here is that the new thermald (with
--adaptive) sacrifices the GPU for the CPU -- it makes the GPU run at lower
frequencies in order to let the CPU run at higher ones. Maybe that is
reasonable for some loads (I can hardly imagine which ones), but not for an
end-user laptop, and especially not for gaming/3D or 4K accelerated video
from YouTube -- when there is high demand for the GPU, throttling it is what
hurts performance, latency and the overall end-user experience most.
Tigerlake laptops have more than enough CPU performance for end users but a
very weak GPU, and even this GPU is throttled under serious load.

I understand that it would be better to have an option to choose whether to
prioritize the GPU or the CPU, but for the majority of end users (and this
Tigerlake is not a tool for those who want to run heavy computing tasks) GPU
prioritization would be more desirable and expected.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1944389

Title:
  Thermald 1.9.1-1ubuntu0.6 keeps Tigerlake GPU frequency on 400 MHz

Status in thermald package in Ubuntu:
  New

Bug description:
  After updating to 1.9.1-1ubuntu0.6 from 1.9.1-1ubuntu0.4, thermald keeps the
  Tigerlake Iris Xe GPU frequency at 400 MHz once some high temperature value
  has been reached. It became impossible to play video games on the laptop.
  System: Ubuntu 20.04.3 LTS
  Kernel: 5.10.0-1045-oem
  Laptop: Dell XPS 9310, CPU: Intel Core i7-1165G7 (Tigerlake), Integrated GPU 
Iris Xe.
  BIOS 2.1.1 03/25/2021
  Display: 2560x1440 144Hz HDMI USB-C connection
  Room temperature is 23.5-25.4 degrees
  Game: Stalker Clear Sky (Wine/Proton Steam)
  Note: The game itself is very old and loads one CPU core to 100% regardless
  of frequency.

  GPU frequency is monitored by intel-gpu-top
  GPU frequencies (according to /sys/class/drm/card0/),
  min/max/boost/efficiency: 100/1300/1300/400 MHz
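
  A minimal sketch of how these values could be collected, and of a check
  for the state this bug is about (GPU sitting at the efficiency frequency),
  follows. The function names are hypothetical, and the mapping of the
  min/max/boost/efficiency labels to i915 file names is an assumption; only
  gt_RP1_freq_mhz is named elsewhere in this report.

```python
from pathlib import Path

# Assumed mapping of the labels used in this report to i915 sysfs files.
FREQ_FILES = {
    "min": "gt_min_freq_mhz",
    "max": "gt_max_freq_mhz",
    "boost": "gt_boost_freq_mhz",
    "efficiency": "gt_RP1_freq_mhz",
    "actual": "gt_act_freq_mhz",
}


def read_gpu_freqs(card_dir="/sys/class/drm/card0"):
    """Return {label: MHz} for every frequency file that exists."""
    card = Path(card_dir)
    freqs = {}
    for label, name in FREQ_FILES.items():
        path = card / name
        if path.exists():
            freqs[label] = int(path.read_text().strip())
    return freqs


def is_pinned_at_efficiency(freqs):
    """True when the actual frequency is at or below the efficiency one."""
    if "actual" not in freqs or "efficiency" not in freqs:
        return False
    return freqs["actual"] <= freqs["efficiency"]
```

  Sampling read_gpu_freqs() once a second while the game runs would show
  when the GPU drops to the efficiency frequency and whether it ever leaves
  it.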

  Previous behavior (1.9.1-1ubuntu0.4)
  After starting the game, the GPU at first reaches the boost value of 1300 MHz
  and CPU/package temperatures rise continuously. At this point the game
  renders at ~80 FPS.
  After some time, when the threshold temperature (~78 degrees) is reached, the
  GPU frequency drops to ~660 MHz and the frame rate to 40-48 FPS. The package
  temperature drops to 66-68 degrees. It is possible to play indefinitely.

  New behavior (1.9.1-1ubuntu0.6)
  After starting the game, the GPU at first reaches the boost value of 1300 MHz
  and CPU/package temperatures rise continuously. At this point the game
  renders at ~80 FPS.
  But after reaching the threshold temperature (about 81 degrees), the GPU
  frequency drops to 400 MHz (gt_RP1_freq_mhz -- the "efficiency" frequency of
  the GPU) and stays there indefinitely. The temperature is held at 70-74
  degrees. The frame rate is about 25-30 FPS, so it is not possible to play the
  game anymore. The only way to get good FPS and frequency back is to minimize
  the game window, wait some time and open it again.

  
  There is also a workaround -- limit the CPU frequency to 2001 MHz and disable
  Intel turbo boost. With this approach the package temperature never reaches
  80 degrees and it is possible to play the game with the GPU at 500 MHz and
  35-40 FPS. Better than nothing.
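
  One possible way to apply such a cap is sketched below. This assumes the
  intel_pstate driver and the standard cpufreq sysfs files, and it has to
  run as root; the report does not say which tool was actually used. The
  function name apply_cpu_cap and the sysfs_root parameter are hypothetical
  (the parameter only exists so the sketch can be tried against a copy of
  the tree).

```python
from pathlib import Path


def apply_cpu_cap(max_mhz, sysfs_root="/sys"):
    """Disable turbo and cap every CPU's maximum frequency.

    Returns the list of scaling_max_freq files that were written.
    """
    root = Path(sysfs_root)

    # Assumption: intel_pstate is in use; writing 1 here disables turbo.
    no_turbo = root / "devices/system/cpu/intel_pstate/no_turbo"
    if no_turbo.exists():
        no_turbo.write_text("1\n")

    changed = []
    pattern = "devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"
    for freq_file in sorted(root.glob(pattern)):
        # cpufreq expects the limit in kHz.
        freq_file.write_text(f"{max_mhz * 1000}\n")
        changed.append(freq_file)
    return changed
```

  For the values in this report the call would be apply_cpu_cap(2001).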

  
  If needed, I can perform additional checks, provide CPU frequencies and so
  on. Most probably the regression happened in 1.9.1-1ubuntu0.5, but I only
  tried versions 0.4 and 0.6.
