On Tue, Apr 15, 2025 at 09:14:36PM -0400, Joel Fernandes wrote:
>
>
> On 4/15/2025 5:15 PM, Paul E. McKenney wrote:
> > On Tue, Apr 15, 2025 at 10:59:36AM -0700, Paul E. McKenney wrote:
> >> On Tue, Apr 15, 2025 at 01:16:15PM -0400, Joel Fernandes wrote:
> >>>
> >>>
> >>> On 3/31/2025 5:03 PM, Paul E. McKenney wrote:
> >>>> This commit adds a new rcutorture.n_up_down kernel boot parameter
> >>>> that specifies the number of outstanding SRCU up/down readers, which
> >>>> begin in kthread context and end in an hrtimer handler. There is a new
> >>>> kthread ("rcu_torture_updown") that scans an per-reader array looking
> >>>> for elements whose readers have ended. This kthread sleeps between one
> >>>> and two milliseconds between consecutive scans.
> >>>>
> >>>> [ paulmck: Apply kernel test robot feedback. ]
> >>>> [ paulmck: Apply Z qiang feedback. ]
> >>>>
> >>>> Signed-off-by: Paul E. McKenney <paul...@kernel.org>
> >>>
> >>> For completeness, posting our discussion for the archives, an issue exists in
> >>> this patch causing the following errors on an ARM64 machine with 288 CPUs:
> >>>
> >>> When running SRCU-P test, we intermittently see:
> >>>
> >>> [ 9500.806108] ??? Writer stall state RTWS_SYNC(21) g18446744073709551218 f0x0 ->state 0x2 cpu 4
> >>> [ 9515.833356] ??? Writer stall state RTWS_SYNC(21) g18446744073709551218 f0x0 ->state 0x2 cpu 4
> >>>
> >>> It bisected to just this patch.
> >>
> >> Looks like your getting rcutorture running on ARM was well timed!
>
> Yes! Glad I could help.
>
> > And could you please send along your dmesg and .config files?
>
> Sure, attached both for one of the failed runs.

Thank you!  That did answer at least one of my questions.  It also showed
the need for the diff below.  :-/

As in kvm.sh and friends might well be missing failures in your runs.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/tools/testing/selftests/rcutorture/bin/console-badness.sh b/tools/testing/selftests/rcutorture/bin/console-badness.sh
index aad51e7c0183d..991fb11306eb6 100755
--- a/tools/testing/selftests/rcutorture/bin/console-badness.sh
+++ b/tools/testing/selftests/rcutorture/bin/console-badness.sh
@@ -10,7 +10,7 @@
 #
 # Authors: Paul E. McKenney <paul...@kernel.org>
 
-grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
+grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Call trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
 grep -v 'ODEBUG: ' |
 grep -v 'This means that this is a DEBUG kernel and it is' |
 grep -v 'Warning: unable to open an initial console' |
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index b07c11cf6929d..21e6ba3615f6a 100755
--- a/tools/testing/selftests/rcutorture/bin/parse-console.sh
+++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh
@@ -148,7 +148,7 @@ then
 			summary="$summary KCSAN: $n_kcsan"
 		fi
 	fi
-	n_calltrace=`grep -c 'Call Trace:' $file`
+	n_calltrace=`grep -Ec 'Call Trace:|Call trace:' $file`
 	if test "$n_calltrace" -ne 0
 	then
 		summary="$summary Call Traces: $n_calltrace"
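
For anyone wanting to see the effect of the parse-console.sh change in
isolation, here is a minimal sketch.  The sample console lines and the
variable names (sample, old_count, new_count) are made up for illustration
and are not taken from the attached dmesg.  It assumes only that some
architectures, arm64 among them, spell the backtrace header "Call trace:"
with a lowercase "t", which is what the diff above suggests.

```shell
# Hypothetical console excerpt: one header in each spelling, plus an
# unrelated line that neither pattern should count.
sample='[    1.000000] Call Trace:
[    2.000000] Call trace:
[    3.000000] rcu-torture: unrelated output'

# Old check: fixed pattern, so the lowercase spelling is not counted.
old_count=$(printf '%s\n' "$sample" | grep -c 'Call Trace:')

# New check: extended regex with alternation counts both spellings.
new_count=$(printf '%s\n' "$sample" | grep -Ec 'Call Trace:|Call trace:')

echo "old pattern: $old_count, new pattern: $new_count"
```

With the old pattern, a console log containing only the lowercase spelling
would report zero call traces, which is how a run could look clean when it
was not.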