Ok. So, sorry about all the back and forth. Partially this is because
I'm more familiar with thermald than others on the SRU team, and so
don't necessarily make things explicit that should be.

At a high level, what the SRU team (in general, so that *I* don't have
to be the single point of failure) is looking for is:

*) What is the scope of potential regressions - what hardware *could* this
affect, and what possible effects could it have
  - The answer to “what hardware” is: the list of CPUIDs in
thd_engine.cpp:id_table. This is SandyBridge onwards
  - The answer to “what effects” is: temperature throttling problems - either 
reduced performance of CPU (and GPU?) due to unnecessary throttling, or 
instability due to not controlling temperatures <FEEL FREE TO EXPAND HERE>

*) What is the scope of *upstream* support - what systems do *they* test on, 
and expect to continue to work.
  - Relatedly: what testing does upstream do
  - What do we do if upstream doesn't test on hardware that we support (i.e.
*we* care about all the hardware)?

*) What is the process we are going to use to verify that upstream doesn't drop 
support for systems?
  - Upstream doesn't seem to make it very easy to identify this
  - eg: the current SRU includes dropping the MSR poking support. What process 
do we/will we have to catch such cases?

*) What is the process for testing that an upload does not regress
  - The [Test Case] above is good for systems with KBL or newer processors
  - thermald also supports systems from SandyBridge onwards - how are we 
testing these? These are still supported by Ubuntu; we need a testing system 
more than “maybe users will report regressions”, particularly since it's not 
necessarily going to be clear to users that “my system got slower” is related 
to the thermald update.
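
To make the “what hardware” question concrete for a given test machine, here
is a minimal Python sketch that reads the CPU family/model pairs from
/proc/cpuinfo text and checks them against a list of supported IDs. The
function names are hypothetical, and the supported list is deliberately left
to the caller: it would have to be transcribed from id_table by hand, which I
have not done here.

```python
def parse_cpu_ids(cpuinfo_text):
    """Return the set of (family, model) pairs found in /proc/cpuinfo text.

    Assumes the x86 layout, where each processor block lists
    "cpu family" before "model".
    """
    ids = set()
    family = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor blocks
        key = key.strip()
        value = value.strip()
        if key == "cpu family":
            family = int(value)
        elif key == "model" and family is not None:
            # Exact match on "model", so "model name" is ignored.
            ids.add((family, int(value)))
    return ids


def in_scope(cpu_ids, supported):
    """True if any detected (family, model) pair is in the supported list.

    `supported` is a caller-provided iterable of (family, model) pairs;
    nothing here is copied from thermald itself.
    """
    return bool(set(cpu_ids) & set(supported))
```

On a real machine this would be called as
in_scope(parse_cpu_ids(open("/proc/cpuinfo").read()), supported), where
`supported` might contain, purely as an illustration, (6, 0x2A) for a
SandyBridge client part.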

Most of those questions are covered above, I think, but some could do
with your input.

Particularly the SandyBridge+ question is an important one to answer.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to thermald in Ubuntu.
https://bugs.launchpad.net/bugs/1995606

Title:
  Upgrade thermald to 2.5.1

Status in thermald package in Ubuntu:
  Fix Released
Status in thermald source package in Jammy:
  Incomplete

Bug description:
  [Justification]
  The purpose of this bug is to prevent regressions in the future.
  Automated test scripts would be better for future SRUs and are still being
planned.

  [Test case]
  For each supported CPU series (RPL/ADL/TGL/CML/CFL/KBL) the following tests 
will be run on machines in the CI lab:

  1. Run stress-ng, and observe the temperature/frequency/power with s-tui
    - Temperatures should stay just below trip values
    - Power/performance profiles should stay roughly the same between old 
thermald and new thermald (unless specifically expected eg: to fix 
premature/insufficient throttling)
  2. Check whether thermald can read rules from /dev/acpi_thermal_rel and
generate the XML file in /etc/thermald/ correctly.
    - This depends on whether acpi_thermal_rel exists.
    - If the machine supports acpi_thermal_rel, "thermal-conf.xml.auto"
   should be generated in /etc/thermald/.
    - If not, a user-defined XML file can be created; then jump to (3).
    - Run thermald with --loglevel=debug and compare the log with the xml.auto
file. Check that the configuration is parsed correctly.
  3. Check whether thermal-conf.xml and thermal-cpu-cdev-order.xml can be loaded
correctly.
    - Run thermald with --loglevel=debug and compare the log with the XML files.
    - If they are parsed correctly, the configuration from the XML files will
appear in the log.

  4. Run the unit tests (the scripts are under the test folder), which use
emul_temp to simulate high temperature and check that thermald throttles the
CPU through the related cooling device.
    - rapl.sh
    - intel_pstate.sh
    - powerclamp.sh
    - processor.sh
  5. Check whether power/frequency is throttled once the temperature reaches
the trip points of the thermal zone.
  6. Check whether the system is throttled even when the temperature is below
the trip points.
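
Steps 1, 5 and 6 all depend on knowing how close each thermal zone is to its
trip points. As a rough sketch (not part of the thermald test scripts), the
following Python reads the standard sysfs thermal files `temp`,
`trip_point_N_temp` and `trip_point_N_type` and reports the margin for each
trip point. The function names and the `root` parameter are hypothetical;
`root` exists only so the code can be pointed at a fake directory for testing.

```python
import glob
import os


def read_int(path):
    """Read a single integer from a sysfs-style file."""
    with open(path) as f:
        return int(f.read().strip())


def trip_margins(root="/sys/class/thermal"):
    """Return (zone, trip_type, margin) tuples for every readable trip point.

    Values are in millidegrees Celsius, as exposed by sysfs. A positive
    margin means the zone is currently below that trip point.
    """
    results = []
    for zone in sorted(glob.glob(os.path.join(root, "thermal_zone*"))):
        try:
            temp = read_int(os.path.join(zone, "temp"))
        except (OSError, ValueError):
            continue  # zone without a readable temperature
        for trip in sorted(glob.glob(os.path.join(zone, "trip_point_*_temp"))):
            type_path = trip[: -len("_temp")] + "_type"
            try:
                trip_temp = read_int(trip)
                with open(type_path) as f:
                    trip_type = f.read().strip()
            except (OSError, ValueError):
                continue  # skip trip points that cannot be read
            results.append((os.path.basename(zone), trip_type, trip_temp - temp))
    return results
```

Sampling trip_margins() periodically during a stress-ng run, alongside the
frequency and power readings, would give a record of whether throttling
started before or after a trip point was reached.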

  [ Where problems could occur ]
  Since PL1 min/max is newly introduced, we may hit edge cases in the future.
