On 11/29/23 15:58, Steven Rostedt wrote: > On Wed, 29 Nov 2023 10:22:23 +0100 > Petr Pavlu <petr.pa...@suse.com> wrote: > >> Yes, I believe this should address this potential race condition. >> >> An alternative would be instead to update >> trace_event_buffer_lock_reserve() as follows: >> >> if (this_cpu_inc_return(trace_buffered_event_cnt) == 1) { >> smp_rmb(); > > This is the problem I have with your approach. That smp_rmb() is in the > highly critical path. On some architectures, this has a significant impact > on the overhead of this code. This path is called during code execution and > increases the overhead of the tracing infrastructure. > > If I'm given two solutions where one adds a smp_rmb() to the critical path > and the other just slows down the non-critical path more, I'll take the > slow down of non-critical path every time. > >> if ((entry = __this_cpu_read(trace_buffered_event))) { >> [...] >> >> That saves the synchronize_rcu() call but additionally modifies >> trace_buffered_event_cnt even if trace_buffered_event is currently NULL. >> >> Another alternative is the approach taken by my patch which avoids more >> RCU work and unnecessary memory modifications. >> >> I'd be interested if you could have a look again at what I'm proposing >> in my patch. I think it simplifies the code while addressing these >> problems as well. However, if you have reservations about that approach >> then it is ok, I can fix the found problems individually as discussed. > > Fix this without adding any memory barriers to the critical path, then I'll > take another look. > > FYI, this code was designed in the first place to avoid adding memory > barriers in the critical path.
Thank you for the explanation. I ran the tests you mentioned in the description of commit 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events") to understand this aspect a bit more. I confirm that my proposed patch makes the tracing slower, usually by single digit percentages. I understand this is not welcome. I also realize that the slowdown might be even worse in different situations and on other architectures that I checked (arm64, x86_64). Ok, keeping the current approach, my plan for v2 is to prepare the following patches: * Fix for the missing increment+decrement of trace_buffered_event_cnt on the current CPU in trace_buffered_event_disable(). Replace smp_call_function_many() with on_each_cpu_mask() in trace_buffered_event_disable(). The on_each_cpu_mask() function has also an advantage that it itself disables preemption so doing that can be then removed from trace_buffered_event_disable(). * Fix the potential race between trace_buffered_event_enable() and trace_event_buffer_lock_reserve() where the latter might already see a valid trace_buffered_event pointer but not all initialization yet. I think this might be actually best to address by using the same maintenance exclusion as is implemented in trace_buffered_event_disable(). It would make both maintenance operations consistent but for the cost of making the enable operation somewhat slower. * Fix the WARN_ON_ONCE(!trace_buffered_event_ref) issued in trace_buffered_event_disable() when trace_buffered_event_enable() previously fails. Add a variable/flag tracking whether trace_buffered_event is currently allocated and use that for driving if a new allocation needs to be done when trace_buffered_event_enable() is called, or the buffers should be really freed when trace_buffered_event_disable() is invoked. Not sure if the mentioned alternative of leaving trace_buffered_event partially initialized on failure is preferred instead. * Fix the potential race between trace_buffered_event_disable() and trace_event_buffer_lock_reserve() where the latter might still grab a pointer from trace_buffered_event that is being freed. Replace smp_wmb() with synchronize_rcu() in trace_buffered_event_disable(). Thanks, Petr