On Thu, 8 Feb 2024 11:25:50 +0100 Mete Durlu <me...@linux.ibm.com> wrote:

> I have been only able to reliably reproduce this issue when the system
> is under load from stressors. But I am not sure if it can be considered
> as *really stressed*.
>
> system : 8 cpus (4 physical cores)
> load   : stress-ng --fanotify 1 (or --context 2)
> result : ~5/10 test fails
>
> of course as load increases test starts to fail more often, but a
> single stressor doesn't seem like much to me for a 4 core machine.
>
> after adding synchronize_rcu() + patch from Sven, I am no longer seeing
> failures with the setup above. So it seems like synchronize_rcu() did
> the trick(or at least it helps a lot) for the case described on the
> previous mail. I couldn't trigger the failure yet, not even with
> increased load(but now the test case takes > 5mins to finish :) ).

Right, it will definitely force the race window to go away.

Can you still trigger this issue with just Sven's patch and not this change?

-- Steve

>
> Here is the diff:
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> @@ -9328,10 +9328,12 @@ rb_simple_write(struct file *filp, const char __user *ubuf,
>                          val = 0; /* do nothing */
>                  } else if (val) {
>                          tracer_tracing_on(tr);
> +                        synchronize_rcu();
>                          if (tr->current_trace->start)
>                                  tr->current_trace->start(tr);
>                  } else {
>                          tracer_tracing_off(tr);
> +                        synchronize_rcu();
>                          if (tr->current_trace->stop)
>                                  tr->current_trace->stop(tr);
>
> Not 100% sure if these were the correct places to add them.