On Sat, Apr 05, 2025 at 05:54:00PM +0000, Joel Fernandes wrote: > > On Apr 5, 2025, at 1:23 PM, Paul E. McKenney <paul...@kernel.org> wrote: > > > > On Sat, Apr 05, 2025 at 12:30:58PM -0000, Joel Fernandes wrote: > >> Hello, Paul, > >> > >>> On Sat, 5 Apr 2025 12:26:12 GMT, "Paul E. McKenney" wrote: > >>> On Sat, Mar 29, 2025 at 07:01:42PM -0400, Joel Fernandes wrote: > >>>> Currently, the ->gpwrap is not tested (at all per my testing) due to the > >>>> requirement of a large delta between a CPU's rdp->gp_seq and its node's > >>>> rnp->gpseq. > >>>> > >>>> This results in no testing of ->gpwrap being set. This patch by default > >>>> adds 5 minutes of testing with ->gpwrap forced by lowering the delta > >>>> between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is > >>>> configurable, including the active time for the setting and a full > >>>> testing cycle. > >>>> > >>>> By default, the first 25 minutes of a test will have the _default_ > >>>> behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes, > >>>> we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe > >>>> this is reasonable since we at least add a little bit of testing for > >>>> usecases where ->gpwrap is set. > >>>> > >>>> Signed-off-by: Joel Fernandes <joelagn...@nvidia.com> > >>> > >>> I ran this as follows: > >>> > >>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10m > >>> --configs "TREE01" --bootargs "rcutorture.ovf_cycle_mins=7" --trust-make > >>> > >>> Once I actually applied your patch, I did get this: > >>> > >>> [ 601.891042] gpwraps: 13745 > >>> > >>> Which seems to indicate some testing. ;-) > >> > >> Thanks a lot for running it. I am wondering if I should check in tree.c > >> (only in > >> testing mode), if the wraps are too many and restrict testing if so. > >> Otherwise, > >> it is hard to come up with a constant that ensures the wraps are under > >> control. > >> On the other hand, since this is only for 5 minutes every 30 minutes, we > >> can leave > >> it as is and avoid the complexity. > > > > I don't (yet) see a problem with lots of wraps. > > > >>> Additional comments inline. > >>> > >>> Thanx, Paul > >>> > >>>> --- > >>>> kernel/rcu/rcu.h | 4 +++ > >>>> kernel/rcu/rcutorture.c | 64 +++++++++++++++++++++++++++++++++++++++++ > >>>> kernel/rcu/tree.c | 34 ++++++++++++++++++++-- > >>>> kernel/rcu/tree.h | 1 + > >>>> 4 files changed, 101 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > >>>> index eed2951a4962..9a15e9701e02 100644 > >>>> --- a/kernel/rcu/rcu.h > >>>> +++ b/kernel/rcu/rcu.h > >>>> @@ -572,6 +572,8 @@ void do_trace_rcu_torture_read(const char > >>>> *rcutorturename, > >>>> unsigned long c_old, > >>>> unsigned long c); > >>>> void rcu_gp_set_torture_wait(int duration); > >>>> +void rcu_set_torture_ovf_lag(unsigned long lag); > >>>> +int rcu_get_gpwrap_count(int cpu); > >>>> #else > >>>> static inline void rcutorture_get_gp_data(int *flags, unsigned long > >>>> *gp_seq) > >>>> { > >>>> @@ -589,6 +591,8 @@ void do_trace_rcu_torture_read(const char > >>>> *rcutorturename, > >>>> do { } while (0) > >>>> #endif > >>>> static inline void rcu_gp_set_torture_wait(int duration) { } > >>>> +static inline void rcu_set_torture_ovf_lag(unsigned long lag) { } > >>>> +static inline int rcu_get_gpwrap_count(int cpu) { return 0; } > >>> > >>> Very good, you did remember CONFIG_SMP=n. And yes, I did try it. ;-) > >>> > >>> But shouldn't these be function pointers in rcu_torture_ops? That way if > >>> some other flavor of RCU starts doing wrap protection for its grace-period > >>> sequence numbers, this testing can apply directly to that flavor as well. > >> > >> These are here because 'rdp' is not accessible AFAIK from rcutorture.c. > >> I could add wrappers to these and include them as pointers the a struct as > >> well. > >> But I think these will still stay to access rdp. > > > > Why not just pass in the CPU number and let the pointed-to function find > > the rdp? > > You mean provide a helper from tree.c that is called in rcutorture, then > helper function returning an rdp? > > If so, I can certainly do that, was just not sure if this sort of thing was > ok since nothing else in rcutorture accesses rdp directly :)
Actually, this is done quite a bit. For example, the "rcu" .gp_kthread_dbg() function pointer is set to show_rcu_gp_kthreads(), which is defined either in rcu.h (for Tiny) or in tree_stall.h (for Tree). Alternatively, sometimes a small wrapper function is defined in rcutorture.c which calls functions defined differently for different RCU flavors. So please feel free to create a function pointer and to define the function in tree.c. But learn from my many mistakes and make sure to provide a suitable definition for Tiny RCU! ;-) Thanx, Paul > Thanks, > > - Joel > > > > > >>> Then the pointers can simply be NULL in kernels built with CONFIG_SMP=n. > >>> > >>>> #endif > >>>> unsigned long long rcutorture_gather_gp_seqs(void); > >>>> void rcutorture_format_gp_seqs(unsigned long long seqs, char *cp, size_t > >>>> len); > >>>> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c > >>>> index 895a27545ae1..79a72e70913e 100644 > >>>> --- a/kernel/rcu/rcutorture.c > >>>> +++ b/kernel/rcu/rcutorture.c > >>>> @@ -118,6 +118,9 @@ torture_param(int, nreaders, -1, "Number of RCU > >>>> reader threads"); > >>>> torture_param(int, object_debug, 0, "Enable debug-object double > >>>> call_rcu() testing"); > >>>> torture_param(int, onoff_holdoff, 0, "Time after boot before CPU > >>>> hotplugs (s)"); > >>>> torture_param(int, onoff_interval, 0, "Time between CPU hotplugs > >>>> (jiffies), 0=disable"); > >>>> +torture_param(int, ovf_cycle_mins, 30, "Total cycle duration for ovf > >>>> lag testing (in minutes)"); > >>>> +torture_param(int, ovf_active_mins, 5, "Duration for which ovf lag is > >>>> active within each cycle (in minutes)"); > >>>> +torture_param(int, ovf_lag_gps, 8, "Value to set for > >>>> set_torture_ovf_lag during an active testing period."); > >>> > >>> Given that "ovf" means just "overflow", would it make sense to get a "gp" > >>> in there? Maybe just "gpwrap_..."? > >>> > >>> "What is in a name?" ;-) > >> > >> Sure, makes sense I will rename. > > > > Thank you! > > > >>> I could argue with the defaults, but I run long tests often enough that > >>> I am not worried about coverage. As long as we remember to either run > >>> long tests or specify appropriate rcutorture.ovf_cycle_mins when messing > >>> with ->gpwrap code. > >>> > >>>> torture_param(int, nocbs_nthreads, 0, "Number of NOCB toggle threads, 0 > >>>> to disable"); > >>>> torture_param(int, nocbs_toggle, 1000, "Time between toggling nocb state > >>>> (ms)"); > >>>> torture_param(int, preempt_duration, 0, "Preemption duration (ms), zero > >>>> to disable"); > >>>> @@ -2629,6 +2632,7 @@ rcu_torture_stats_print(void) > >>>> int i; > >>>> long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 }; > >>>> long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 }; > >>>> + long n_gpwraps = 0; > >>>> struct rcu_torture *rtcp; > >>>> static unsigned long rtcv_snap = ULONG_MAX; > >>>> static bool splatted; > >>>> @@ -2639,6 +2643,7 @@ rcu_torture_stats_print(void) > >>>> pipesummary[i] += READ_ONCE(per_cpu(rcu_torture_count, > >>>> cpu)[i]); > >>>> batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch, > >>>> cpu)[i]); > >>>> } > >>>> + n_gpwraps += rcu_get_gpwrap_count(cpu); > >>>> } > >>>> for (i = RCU_TORTURE_PIPE_LEN; i >= 0; i--) { > >>>> if (pipesummary[i] != 0) > >>>> @@ -2672,6 +2677,7 @@ rcu_torture_stats_print(void) > >>>> pr_cont("read-exits: %ld ", data_race(n_read_exits)); // Statistic. > >>>> pr_cont("nocb-toggles: %ld:%ld\n", > >>> > >>> The "\n" on the above line needs to be deleted. > >> > >> Ok. > >> > >>>> atomic_long_read(&n_nocb_offload), > >>>> atomic_long_read(&n_nocb_deoffload)); > >>>> + pr_cont("gpwraps: %ld\n", n_gpwraps); > >>>> > >>>> pr_alert("%s%s ", torture_type, TORTURE_FLAG); > >>>> if (atomic_read(&n_rcu_torture_mberror) || > >>>> @@ -3842,6 +3848,58 @@ static int rcu_torture_preempt(void *unused) > >>>> > >>>> static enum cpuhp_state rcutor_hp; > >>>> > >>>> +static struct hrtimer ovf_lag_timer; > >>>> +static bool ovf_lag_active; > >>> > >>> Same "ovf" naming complaint as before. > >> > >> Ok. > >> > >>>> +} > >>>> + > >>>> +static int rcu_torture_ovf_lag_init(void) > >>>> +{ > >>>> + if (ovf_cycle_mins <= 0 || ovf_active_mins <= 0) { > >>>> + pr_alert("rcu-torture: lag timing parameters must be > >>>> positive\n"); > >>>> + return -EINVAL; > >>>> + } > >>> > >>> Why not refuse to start this portion of the test when testing CONFIG_SMP=n > >>> or something other than vanilla RCU? No need to fail the test, just > >>> print something saying that this testing won't be happening. > >> > >> Got it, will do. > > > > Again, thank you! > > > >>>> +static void rcu_torture_ovf_lag_cleanup(void) > >>>> +{ > >>>> + hrtimer_cancel(&ovf_lag_timer); > >>>> + > >>>> + if (ovf_lag_active) { > >>>> + rcu_set_torture_ovf_lag(0); > >>>> + ovf_lag_active = false; > >>>> + } > >>>> +} > >>> > >>> Did you try the modprobe/rmmod testing to verify that this > >>> cleans up appropriately? You could use the drgn tool to check. > >>> See tools/rcu//rcu-cbs.py for an example drgn script that digs into the > >>> rcu_data structures. > >> > >> Nice, will check! > >> > >> Will work on this and provide v2. > > > > Looking forward to it! > > > > Thanx, Paul