gpwrap

Paul E. McKenney Fri, 11 Apr 2025 12:18:57 -0700

On Fri, Apr 11, 2025 at 05:36:32AM -0000, Joel Fernandes wrote:
> Hello, Paul,
> 
> On Fri, 11 Apr 2025 05:33:16 GMT, "Paul E. McKenney" wrote:
> > On Thu, Apr 10, 2025 at 11:54:13AM -0700, Paul E. McKenney wrote:
> > > On Thu, Apr 10, 2025 at 11:29:03AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Apr 10, 2025 at 11:03:27AM -0400, Joel Fernandes wrote:
> > > > > Currently, the ->gpwrap is not tested (at all per my testing) due to
> > > > > the requirement of a large delta between a CPU's rdp->gp_seq and its
> > > > > node's rnp->gpseq.
> > > > >
> > > > > This results in no testing of ->gpwrap being set. This patch by default
> > > > > adds 5 minutes of testing with ->gpwrap forced by lowering the delta
> > > > > between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
> > > > > configurable, including the active time for the setting and a full
> > > > > testing cycle.
> > > > >
> > > > > By default, the first 25 minutes of a test will have the _default_
> > > > > behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes,
> > > > > we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
> > > > > this is reasonable since we at least add a little bit of testing for
> > > > > usecases where ->gpwrap is set.
> > > > >
> > > > > Signed-off-by: Joel Fernandes <[email protected]>
> > > > 
> > > > Much better, thank you!
> > > > 
> > > > One potential nit below.  I will run some tests on this version.
> > > 
> > > And please feel free to apply the following to both:
> > > 
> > > Tested-by: Paul E. McKenney <[email protected]>
> > 
> > And this happy situation lasted only until I rebased onto v6.15-rc1 and
> > on top of this commit:
> > 
> > 1342aec2e442 ("Merge branches 'rcu/misc-for-6.16', 'rcu/seq-counters-for-6.16' and 'rcu/torture-for-6.16' into rcu/for-next")
> > 
> > This got me the splat shown below when running rcutorture scenario SRCU-N.
> > I reverted this commit and tests pass normally.
> > 
> > Your other commit (ARM64 images) continues working fine.
> 
> Interesting.. it seems to be crashing during statistics printing.
> 
> I am wondering if the test itself uncovered a bug or the bug is in the test
> itself.


Both are quite possible, also a bug somewhere else entirely.

> Looking forward to your test with the other patch and we could hold off on
> this one till we have more data about what is going on.

This one got a lot of OOMs when tests of RCU priority boosting overlapped
with testing of RCU callback flooding on TREE03, as in 13 of the 14
9-hour runs.  Back on v6.14-rc1, these were quite rare.

Ah, and I am carrying this as an experimental patch:

269b9b5be09d ("EXP sched: Disable DL server if sysctl_sched_rt_runtime is -1")

Just checking to see if this is still something I should be carrying.

                                                        Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> 
> 
> > 
> >                                                     Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > [   15.911885] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [   15.912413] #PF: supervisor instruction fetch in kernel mode
> > [   15.912826] #PF: error_code(0x0010) - not-present page
> > [   15.913218] PGD 0 P4D 0 
> > [   15.913420] Oops: Oops: 0010 [#1] SMP PTI
> > [   15.913715] CPU: 3 UID: 0 PID: 62 Comm: rcu_torture_sta Not tainted 6.15.0-rc1-00047-g6e14cad86633 #19 PREEMPT(undef) 
> > [   15.914535] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
> > [   15.915147] RIP: 0010:0x0
> > [   15.915348] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
> > [   15.915856] RSP: 0000:ffffa0380021fdc8 EFLAGS: 00010246
> > [   15.916256] RAX: 0000000000000000 RBX: ffffffffb6b02cc0 RCX: 000000000000000a
> > [   15.916802] RDX: 0000000000000000 RSI: ffff9f121f418cc0 RDI: 0000000000000000
> > [   15.917305] RBP: 0000000000000000 R08: ffff9f121f418d20 R09: 0000000000000000
> > [   15.917789] R10: 0000000000000000 R11: 0000000000000005 R12: ffffffffb6b02d20
> > [   15.918293] R13: 0000000000000000 R14: ffffa0380021fe50 R15: ffffa0380021fdf8
> > [   15.918801] FS:  0000000000000000(0000) GS:ffff9f1268a96000(0000) knlGS:0000000000000000
> > [   15.919313] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   15.919628] CR2: ffffffffffffffd6 CR3: 0000000017c32000 CR4: 00000000000006f0
> > [   15.920004] Call Trace:
> > [   15.920139]  <TASK>
> > [   15.920256]  rcu_torture_stats_print+0x16b/0x670
> > [   15.920514]  ? __switch_to_asm+0x39/0x70
> > [   15.920719]  ? finish_task_switch.isra.0+0x76/0x250
> > [   15.920982]  ? __pfx_rcu_torture_stats+0x10/0x10
> > [   15.921222]  rcu_torture_stats+0x25/0x70
> > [   15.921435]  kthread+0xf1/0x1e0
> > [   15.921602]  ? __pfx_kthread+0x10/0x10
> > [   15.921797]  ? __pfx_kthread+0x10/0x10
> > [   15.922000]  ret_from_fork+0x2f/0x50
> > [   15.922193]  ? __pfx_kthread+0x10/0x10
> > [   15.922395]  ret_from_fork_asm+0x1a/0x30
> > [   15.922605]  </TASK>
> > [   15.922723] Modules linked in:
> > [   15.922890] CR2: 0000000000000000
> > [   15.923072] ---[ end trace 0000000000000000 ]---
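
For readers following the patch description quoted at the top of this
message, here is a minimal user-space sketch of the lag check it describes.
This is not the patch itself: gpwrap_needed(), gpwrap_demo(), and
SEQ_CTR_SHIFT are hypothetical names used only for illustration, and the
sketch assumes that the low-order two bits of gp_seq hold grace-period
state, so that one grace period advances the counter by 1 << 2. The
comparison macro is written in the style of the kernel's ULONG_CMP_LT().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Wrap-safe "a < b" on unsigned long, in the style of ULONG_CMP_LT(). */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/* Assumption: one grace period advances gp_seq by 1 << SEQ_CTR_SHIFT. */
#define SEQ_CTR_SHIFT 2

/*
 * Hypothetical helper: returns true when the CPU's gp_seq snapshot lags
 * the node's gp_seq by more than "lag" counter units, which is the
 * condition under which ->gpwrap would be set.
 */
static bool gpwrap_needed(unsigned long rdp_gp_seq, unsigned long rnp_gp_seq,
			  unsigned long lag)
{
	return ULONG_CMP_LT(rdp_gp_seq + lag, rnp_gp_seq);
}

/* Illustration only: a CPU whose snapshot is 10 grace periods behind. */
static void gpwrap_demo(void)
{
	unsigned long default_lag = ULONG_MAX / 4;        /* today's default */
	unsigned long test_lag = 8UL << SEQ_CTR_SHIFT;    /* "just 8 GPs"    */
	unsigned long rdp = 100UL << SEQ_CTR_SHIFT;
	unsigned long rnp = rdp + (10UL << SEQ_CTR_SHIFT);

	assert(!gpwrap_needed(rdp, rnp, default_lag));    /* never trips     */
	assert(gpwrap_needed(rdp, rnp, test_lag));        /* trips quickly   */
}
```

With the default ULONG_MAX / 4 lag, a CPU would have to fall an enormous
number of grace periods behind before the check fires, which fits the
observation that ->gpwrap is effectively never exercised. With an 8-GP lag,
a CPU that is only 10 grace periods behind already satisfies it.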

Re: [v3,1/2] rcutorture: Perform more frequent testing of ->gpwrap
