On 3/4/2026 8:19 AM, Reinette Chatre wrote:
Changes since v1:
- The new perf interface that resctrl selftests can utilize has been accepted and
merged into v7.0-rc2. This series can thus now be considered for inclusion.
For reference,
commit 6a8a48644c4b ("perf/x86/intel/uncore: Add per-scheduler IMC CAS count events")
The resctrl selftest changes making use of the new perf interface are backward
compatible. The selftests do not require a v7.0-rc2 kernel to run but the
tests can only pass on recent Intel platforms running v7.0-rc2 or later.
- Combine the two outstanding resctrl selftest submissions into one series
for easier tracking:
https://lore.kernel.org/lkml/084e82b5c29d75f16f24af8768d50d39ba0118a5.1769101788.git.reinette.cha...@intel.com/
https://lore.kernel.org/lkml/[email protected]/
- Fix typo in changelog of "selftests/resctrl: Improve accuracy of cache
occupancy test": "the data my be in L2" -> "the data may be in L2"
- Add Zide Chen's RB tags.
Cover letter updated to be accurate wrt perf changes:
The resctrl selftests fail on recent Intel platforms: the CAT test fails
intermittently, and the MBM and MBA tests fail permanently on new platforms
like Sierra Forest and Granite Rapids.
The MBM and MBA resctrl selftests both generate memory traffic and compare the
memory bandwidth measurements between the iMC PMUs and MBM to determine pass or
fail. Both these tests are failing on recent platforms like Sierra Forest and
Granite Rapids that have two events that need to be read and combined
for a total memory bandwidth count instead of the single event available on
earlier platforms.
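
The comparison described above can be sketched as follows. This is a
minimal illustration, not the selftest code: the function names are
hypothetical, the 15% limit is the one reported in the test output later in
this mail, and truncating the difference to a whole percent is an
assumption chosen because it matches the logged avg_diff_per values.

```python
def bandwidth_diff_percent(avg_bw_imc, avg_bw_resc):
    """Relative difference between the iMC and resctrl averages.

    The result is truncated to a whole percent (assumption; this matches
    the avg_diff_per values shown in the log).
    """
    return abs(avg_bw_resc - avg_bw_imc) * 100 // avg_bw_imc


def within_limit(avg_bw_imc, avg_bw_resc, max_diff_percent=15):
    """Pass when the difference does not exceed the limit (hypothetical helper)."""
    return bandwidth_diff_percent(avg_bw_imc, avg_bw_resc) <= max_diff_percent


# Values taken from the MBM result in the log below.
print(bandwidth_diff_percent(8815, 8570))  # 2
print(within_limit(8815, 8570))            # True
```

On platforms with two events per iMC, avg_bw_imc would be derived from the
combined counts of both events before this comparison is made.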
resctrl selftests prefer to obtain event details via sysfs instead of adding
model specific details on which events to read. Enhancements to perf to expose
the new event details are available since:
commit 6a8a48644c4b ("perf/x86/intel/uncore: Add per-scheduler IMC CAS count events")
This series demonstrates use of the new sysfs interface to perf to
obtain accurate iMC read memory bandwidth measurements.
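
As a rough sketch of what obtaining event details via sysfs can look like:
perf describes each PMU event as a file under
/sys/bus/event_source/devices/<pmu>/events/ containing comma-separated
terms such as "event=0x04,umask=0x03". The helper names below are
hypothetical, and the PMU name, event name, and sample values are
illustrative assumptions, not taken from this series.

```python
from pathlib import Path

# Standard location of perf PMU devices in sysfs.
SYSFS_PMU_ROOT = Path("/sys/bus/event_source/devices")


def parse_event_config(text):
    """Parse a perf sysfs event description into {term: value}.

    A term without "=value" is treated as 1, following perf's event syntax.
    """
    fields = {}
    for term in text.strip().split(","):
        name, _, value = term.partition("=")
        fields[name.strip()] = int(value, 0) if value else 1
    return fields


def read_event_config(pmu, event):
    """Read and parse one event file, e.g. pmu="uncore_imc_0" (illustrative)."""
    return parse_event_config((SYSFS_PMU_ROOT / pmu / "events" / event).read_text())


# Illustrative sample string; real values depend on the platform.
print(parse_event_config("event=0x04,umask=0x03\n"))  # {'event': 4, 'umask': 3}
```

Reading the encoding from sysfs keeps model-specific event numbers out of
the test itself, which is the preference stated above.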
An additional issue with all the tests is that these selftests are partly
performance tests and determine pass/fail on performance heuristics selected
after running the tests on a variety of platforms. When new platforms
arrive the previous heuristics may cause the tests to fail. These failures are
not because of an issue with the resctrl subsystem the tests intend to test
but because of the architectural changes in the new platforms.
Adapt the resctrl tests to not be as sensitive to architectural changes
while adjusting the remaining heuristics to ensure tests pass on a variety
of platforms. More details in individual patches.
Tested by running 100 iterations of all tests on Emerald Rapids, Granite
Rapids, Sapphire Rapids, Ice Lake, Sierra Forest, and Broadwell.
Tested on a GNR with SNC3. Without this patch I saw MBM/MBA failures.
With this patch applied on v7.0-rc2, I did not see any errors:
sudo ./resctrl_tests
TAP version 13
# Pass: Check kernel supports resctrl filesystem
# Pass: Check resctrl mountpoint "/sys/fs/resctrl" exists
# resctrl filesystem not mounted
# dmesg: [ 16.192737] resctrl: Sub-NUMA Cluster mode detected with 3 nodes per L3 cache
# dmesg: [ 16.287785] resctrl: L3 allocation detected
# dmesg: [ 16.287961] resctrl: L2 allocation detected
# dmesg: [ 16.288093] resctrl: MB allocation detected
# dmesg: [ 16.288186] resctrl: L3 monitoring detected
1..6
# SNC-3 mode discovered.
# Starting MBM test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Writing benchmark parameters to resctrl FS
# Benchmark PID: 5503
# Write schema "MB:0=100" to resctrl FS
# Checking for pass/fail
# Pass: Check MBM diff within 15%
# avg_diff_per: 2%
# Span (MB): 640
# avg_bw_imc: 8815
# avg_bw_resc: 8570
ok 1 MBM: test
# Starting MBA test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Writing benchmark parameters to resctrl FS
# Benchmark PID: 5506
# Write schema "MB:0=10" to resctrl FS
# Write schema "MB:0=20" to resctrl FS
# Write schema "MB:0=30" to resctrl FS
# Write schema "MB:0=40" to resctrl FS
# Write schema "MB:0=50" to resctrl FS
# Write schema "MB:0=60" to resctrl FS
# Write schema "MB:0=70" to resctrl FS
# Write schema "MB:0=80" to resctrl FS
# Write schema "MB:0=90" to resctrl FS
# Write schema "MB:0=100" to resctrl FS
# Results are displayed in (MB)
# Bandwidth below threshold (2500 MiB). Dropping results from MBA schemata 10.
# Bandwidth below threshold (2500 MiB). Dropping results from MBA schemata 20.
# Bandwidth below threshold (2500 MiB). Dropping results from MBA schemata 30.
# Bandwidth below threshold (2500 MiB). Dropping results from MBA schemata 40.
# Bandwidth below threshold (2500 MiB). Dropping results from MBA schemata 50.
# Pass: Check MBA diff within 15% for schemata 60
# avg_diff_per: 6%
# avg_bw_imc: 4669
# avg_bw_resc: 4355
# Pass: Check MBA diff within 15% for schemata 70
# avg_diff_per: 5%
# avg_bw_imc: 5556
# avg_bw_resc: 5231
# Pass: Check MBA diff within 15% for schemata 80
# avg_diff_per: 5%
# avg_bw_imc: 6257
# avg_bw_resc: 5942
# Pass: Check MBA diff within 15% for schemata 90
# avg_diff_per: 4%
# avg_bw_imc: 7126
# avg_bw_resc: 6804
# Pass: Check MBA diff within 15% for schemata 100
# avg_diff_per: 3%
# avg_bw_imc: 9246
# avg_bw_resc: 8901
# Pass: Check schemata change using MBA
ok 2 MBA: test
# Starting CMT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :167772160
# Writing benchmark parameters to resctrl FS
# Write schema "L3:0=ffe0" to resctrl FS
# Write schema "L3:0=1f" to resctrl FS
# Write schema "L2:1=0x1" to resctrl FS
# Benchmark PID: 5508
# Checking for pass/fail
# Pass: Check cache miss rate within 15%
# Percent diff=0
# Number of bits: 5
# Average LLC val: 52264960
# Cache span (bytes): 52428800
ok 3 CMT: test
# Starting L3_CAT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :167772160
# Writing benchmark parameters to resctrl FS
# Write schema "L2:1=0x1" to resctrl FS
# Write schema "L3:0=3f80" to resctrl FS
# Write schema "L3:0=7f" to resctrl FS
# Write schema "L3:0=3fe0" to resctrl FS
# Write schema "L3:0=1f" to resctrl FS
# Write schema "L3:0=3ff8" to resctrl FS
# Write schema "L3:0=7" to resctrl FS
# Write schema "L3:0=3ffe" to resctrl FS
# Write schema "L3:0=1" to resctrl FS
# Checking for pass/fail
# Number of bits: 7
# Average LLC val: 620490
# Cache span (lines): 1146880
# Pass: Check cache miss rate increased
# Number of bits: 5
# Average LLC val: 1149986
# Cache span (lines): 819200
# Pass: Check cache miss rate increased
# Number of bits: 3
# Average LLC val: 1604363
# Cache span (lines): 491520
# Pass: Check cache miss rate increased
# Number of bits: 1
# Average LLC val: 2285082
# Cache span (lines): 163840
ok 4 L3_CAT: test
# Starting L3_NONCONT_CAT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Write schema "L3:0=ff" to resctrl FS
# Write schema "L3:0=fc3f" to resctrl FS
ok 5 L3_NONCONT_CAT: test
# Starting L2_NONCONT_CAT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Write schema "L2:1=ff" to resctrl FS
# Write schema "L2:1=fc3f" to resctrl FS
ok 6 L2_NONCONT_CAT: test
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
Tested-by: Chen Yu <[email protected]>
thanks,
Chenyu