Hello Ben,
Thank you for your reply.
I've made the fixes and re-run the tests on Grace, as you advised.
I appreciate your feedback.
> This is only guaranteed to clean and invalidate to the point of
> coherence, PoC. On Grace I expect this is L3/slc and so the cache line
> there in L3/slc is likely not invalidated or pushed to DRAM.
> The dsb() for synchronization is missing for aarch64 in sb().
I added dsb() for synchronization for aarch64 as shown below.
@@ -27,6 +30,8 @@ static void sb(void)
 #if defined(__i386) || defined(__x86_64)
 	asm volatile("sfence\n\t"
 		     : : : "memory");
+#elif defined(__aarch64__)
+	__asm__ __volatile__("dsb sy\n\t" ::: "memory");
 #endif
 }
> IIUC the L3 cache is in the nvidia interconnect and so changing the
> cache portion bitmap would correlate with events from the nvidia
> interconnect pmu. However, I don't think you are using events from the
> interconnect.
I used the NVIDIA event "nvidia_scf_pmu/scf_cache_refill/".
With the above fixes applied, the test results are as follows:
$ sudo ./resctrl_tests -t cat
TAP version 13
# Pass: Check kernel supports resctrl filesystem
# Pass: Check resctrl mountpoint "/sys/fs/resctrl" exists
# resctrl filesystem not mounted
1..3
# Starting L3_CAT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :119537664
# Writing benchmark parameters to resctrl FS
# Write schema "L3:1=fc0" to resctrl FS
# Write schema "L3:1=3f" to resctrl FS
# Write schema "L3:1=fe0" to resctrl FS
# Write schema "L3:1=1f" to resctrl FS
# Write schema "L3:1=ff0" to resctrl FS
# Write schema "L3:1=f" to resctrl FS
# Write schema "L3:1=ff8" to resctrl FS
# Write schema "L3:1=7" to resctrl FS
# Write schema "L3:1=ffc" to resctrl FS
# Write schema "L3:1=3" to resctrl FS
# Write schema "L3:1=ffe" to resctrl FS
# Write schema "L3:1=1" to resctrl FS
# Checking for pass/fail
# Number of bits: 6
# Average LLC val: 0
# Cache span (lines): 933888
# Number of bits: 5
# Average LLC val: 0
# Cache span (lines): 778240
# Number of bits: 4
# Average LLC val: 0
# Cache span (lines): 622592
# Number of bits: 3
# Average LLC val: 0
# Cache span (lines): 466944
# Number of bits: 2
# Average LLC val: 0
# Cache span (lines): 311296
# Number of bits: 1
# Average LLC val: 0
# Cache span (lines): 155648
ok 1 L3_CAT: test
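For reference, the "Cache span (lines)" values above are consistent with the arithmetic sketched below. This is only an illustration, not the selftest's actual code: the helper name cache_span_lines() is hypothetical, the 12-bit full mask is inferred from the schemata in the log (for example ffc | 3 = fff), and the 64-byte cache line size is an assumption.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of cache lines covered by `bits` bits of a
 * `total_bits`-wide capacity bitmask. The mask width and line size are
 * assumptions passed in by the caller, not values read from the system.
 */
static unsigned long cache_span_lines(unsigned long cache_size,
				      unsigned int bits,
				      unsigned int total_bits,
				      unsigned long line_size)
{
	assert(total_bits > 0 && line_size > 0 && bits <= total_bits);

	/* Bytes covered by the selected mask portion, then bytes -> lines. */
	return cache_size * bits / total_bits / line_size;
}
```

With cache_size = 119537664, total_bits = 12 and line_size = 64, a 6-bit mask gives 933888 lines and a 1-bit mask gives 155648 lines, matching the log.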
The count of the nvidia_scf_pmu/scf_cache_refill event is 0 in every case.
I have tried changing various perf_event_open() parameters, such as type,
read_format, and pid.
Some parameter combinations gave non-zero counts, but none of them gave the
expected results.
Do the perf_event_open() parameters need any special settings on Grace, or
on the Arm architecture in general?
The perf_event_open() parameters used when collecting the above results are as
follows:
perf_event_open({type=PERF_TYPE_RAW, size=0x88 /* PERF_ATTR_SIZE_??? */,
config=0xf1, sample_period=0, sample_type=PERF_SAMPLE_IDENTIFIER,
read_format=PERF_FORMAT_GROUP, disabled=1, inherit=1, exclude_kernel=1,
exclude_hv=1, precise_ip=0 /* arbitrary skid */, exclude_guest=1,
exclude_callchain_kernel=1, ...}, 68508, 1, -1, PERF_FLAG_FD_CLOEXEC) = 3
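In the trace above, the event is opened with type=PERF_TYPE_RAW and a target pid. As far as I understand, uncore PMUs usually register a dynamic type under /sys/bus/event_source/devices/<pmu>/type and are counted per CPU (pid = -1) rather than per task, so I may need to open the event differently. A minimal sketch of that approach, under stated assumptions: the helper names read_pmu_type() and open_uncore_event() are hypothetical, the sysfs path is supplied by the caller (something like /sys/bus/event_source/devices/nvidia_scf_pmu_0/type is my guess for Grace and is not verified), and config 0xf1 is simply the value from the trace.

```c
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Read a PMU's dynamic type id from its sysfs "type" file; -1 on failure. */
static int read_pmu_type(const char *type_path)
{
	FILE *fp = fopen(type_path, "r");
	int type = -1;

	if (!fp)
		return -1;
	if (fscanf(fp, "%d", &type) != 1)
		type = -1;
	fclose(fp);
	return type;
}

/*
 * Open a per-CPU (not per-task) counter on the PMU described by type_path.
 * Returns the event fd, or -1 on failure.
 */
static int open_uncore_event(const char *type_path,
			     unsigned long long config, int cpu)
{
	struct perf_event_attr attr;
	int type = read_pmu_type(type_path);

	if (type < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;	/* dynamic PMU type, not PERF_TYPE_RAW */
	attr.config = config;	/* event id, e.g. 0xf1 as in the trace */
	attr.disabled = 1;	/* enable later with PERF_EVENT_IOC_ENABLE */

	/* pid = -1 with cpu >= 0: count on that CPU, not on one task. */
	return (int)syscall(__NR_perf_event_open, &attr, -1, cpu, -1,
			    PERF_FLAG_FD_CLOEXEC);
}
```

The sketch leaves the exclude_* bits unset because I am not sure the SCF PMU accepts them; that is an assumption and not something I have confirmed on Grace.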
Could you please give us your opinion?
Also, since this kselftest is meant to run on all Arm chips, we need an event
that is common to all of them.
Do you have any suggestions for which event we should collect?
Best regards,
Shaopeng TAN