resctrl: Adjust SNC support messages

Reinette Chatre Wed, 03 Jul 2024 13:51:18 -0700

Hi Maciej,

On 7/3/24 12:43 AM, Maciej Wieczór-Retman wrote:

On 3.07.2024 00:21, Reinette Chatre wrote:

On 7/1/24 7:18 AM, Maciej Wieczor-Retman wrote:

diff --git a/tools/testing/selftests/resctrl/cache.c 
b/tools/testing/selftests/resctrl/cache.c
index 1ff1104e6575..9885d64b8a21 100644
--- a/tools/testing/selftests/resctrl/cache.c
+++ b/tools/testing/selftests/resctrl/cache.c
@@ -186,4 +186,7 @@ void show_cache_info(int no_of_bits, __u64 avg_llc_val, 
size_t cache_span, bool
       ksft_print_msg("Average LLC val: %llu\n", avg_llc_val);
       ksft_print_msg("Cache span (%s): %zu\n", lines ? "lines" : "bytes",
                  cache_span);
+    if (snc_unreliable)
+        ksft_print_msg("SNC detection unreliable due to offline CPUs!\n");


The message abour SNC detection being unreliable is already printed at 
beginning of every
test so I do not think it is necessary to print it again at this point.


The "SNC detection was unreliable" only gets printed on the first execution of 
run_single_test().


There is more about this later, but this can be printed at start of each test.

That's what the global snc_mode was for mostly, it starts initialized to 0, and 
then on the first
run of run_single_test() it is set so other tests don't call get_snc_mode(). 
And then the local static
variable inside get_snc_mode() prevents the detection from running more than 
once when other places
call get_snc_mode() (like when the cache size is adjusted).


The shadowing of variables can get confusing. I think the global snc_mode is 
not necessary, having the
local static variable within snc_nodes_per_l3_cache() should be sufficient and 
run_single_test()
can just do a:

        int snc_mode; /* new name welcome */

        snc_mode = snc_nodes_per_l3_cache();
        if (snc_mode > 1)
                ksft_print_msg("SNC-%d mode discovered\n", snc_mode);
        else if (snc_unreliable)
                ksft_print_msg("SNC detection unreliable due to offline CPUs. Test 
results may not be accurate if SNC enabled.\n");


And as we discussed last time it's beneficial to put error messages at the end 
of the test in case the
user misses the initial warning at the very beginning.


Right. What I found unexpected was that it is done "at the end" but from two 
places, from the show*info()
as well as from run*test(). I expect "the end" to be a single place.

   }
diff --git a/tools/testing/selftests/resctrl/cmt_test.c 
b/tools/testing/selftests/resctrl/cmt_test.c
index 0c045080d808..588543ae2654 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -175,8 +175,8 @@ static int cmt_run_test(const struct resctrl_test *test, 
const struct user_param
           goto out;
         ret = check_results(&param, span, n);
-    if (ret && (get_vendor() == ARCH_INTEL))
-        ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is 
enabled. Check BIOS configuration.\n");


This message does seem to still be applicable if snc_unreliable == 1.


I was going for this one error message to specifically catch the kernel
not having snc support for resctrl while snc is enabled. While the
above message could be true when snc_unreliable == 1, it doesn't have to.


If a test fails when snc_unreliable == 1 then nothing is certain and some 
generic message
is needed.

SNC might not be enabled at all so there would be no reason to send the user
to check their BIOS - instead they can learn they have offline CPUs and they can
work on fixing that. In my opinion it could be beneficial to have more 
specialized
messages in the selftests to help users diagnose problems quicker.


My goal is indeed to have specialized messages. There cannot be a specialized 
message
if snc_reliable == 1. In this case it needs to be generic since SNC may or may 
not be
enabled and it is up to the user to investigate further.


Having only this one message wihtout the "if snc unreliable" messages would
mean nothing would get printed at the end on success with unreliable SNC 
detection.


Having a pass/fail is what user will focus on. If the test passes then SNC 
detection
should not matter. The messages are just there to help user root cause where a 
failure
may be.

...

   volatile int *value_sink = &sink_target;
     static struct resctrl_test *resctrl_tests[] = {
@@ -123,6 +124,12 @@ static void run_single_test(const struct resctrl_test 
*test, const struct user_p
       if (test->disabled)
           return;
   +    if (!snc_mode) {
+        snc_mode = get_snc_mode();
+        if (snc_mode > 1)



 From what I can tell this is the only place the global is used and this can 
just be:
         if (get_snc_mode() > 1)


I wanted to print the message below only on the first call to run_single_test() 
and then
print relevant warnings at the very end of each test. I thought that was your 
intention
when we discussed what messages are supposed to be printed and when in v2 of 
this series.

Do you think it would be better to just print this message at the start of each 
test?


Yes. If there is a problem with a test the user could be expected to start 
tracing back
messages printed from beginning of failing test.

Or should I make "snc_mode" into local static inside run_single_test()? Or 
maybe add
a second local static variable into get_snc_mode() that would control whether 
or not
the message should be printed?


I do not see where more local static variables may be needed.

Reinette

Re: [PATCH v3 2/2] selftests/resctrl: Adjust SNC support messages

Reply via email to