> Well, actually it's been optimized a lot from the first instances.
> Note, tracepoints need to be quite generic, so where it can be optimized
> is not obvious.
There is quite a bit of stuff that doesn't change from trace point to
trace point. So you could just cache all these decisions into a single
per cpu variable, and invalidate if anything changes the conditions
(see the sketch at the end of this mail).

> > > > Functions with # of instructions:
> > > > 25      trace_event_raw_event_sched_switch
>
> This is from the TRACE_EVENT macro.
>
>        if (trace_trigger_soft_disabled(trace_file))            \
>                return;                                         \

Cache?

>        if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
>            trace_event_ignore_this_pid(trace_file))
>                return NULL;
>
>        *current_rb = trace_file->tr->trace_buffer.buffer;

Indirection could be cached.

>
>        if ((trace_file->flags &
>             (EVENT_FILE_FL_SOFT_DISABLED | EVENT_FILE_FL_FILTERED)) &&
>            (entry = this_cpu_read(trace_buffered_event))) {

Multiple tests of trace_file->flags in different functions; these could
all be folded into a single test.

>
> Note: Some of these if statements can be encapsulated with jump labels, as
> they are false pretty much all of the time.
>
>        /* Try to use the per cpu buffer first */
>        val = this_cpu_inc_return(trace_buffered_event_cnt);
>        if (val == 1) {
>                trace_event_setup(entry, type, flags, pc);
>                entry->array[0] = len;
>                return entry;
>        }
>        this_cpu_dec(trace_buffered_event_cnt);

This can be pre-cached in a fast path: if it holds, just point the per
cpu variable directly at the buffer, and invalidate it if the buffer
changes or fills up.

>        preempt_disable_notrace();
>
>        if (unlikely(atomic_read(&buffer->record_disabled)))
>                goto out;

Cache.

>
>        cpu = raw_smp_processor_id();
>
>        if (unlikely(!cpumask_test_cpu(cpu, buffer->cpumask)))
>                goto out;

Not needed for per cpu buffers in a fast path.

>
>        cpu_buffer = buffer->buffers[cpu];
>
>        if (unlikely(atomic_read(&cpu_buffer->record_disabled)))
>                goto out;

Cache.

>
>        if (unlikely(length > BUF_MAX_DATA_SIZE))
>                goto out;
>
>        if (unlikely(trace_recursive_lock(cpu_buffer)))
>                goto out;

Locking should invalidate the cached state on this CPU, so this check
shouldn't be needed.

>        u64 diff;
>
>        rb_start_commit(cpu_buffer);
>
> #ifdef CONFIG_RING_BUFFER_ALLOW_SWAP
>        /*
>         * Due to the ability to swap a cpu buffer from a buffer
>         * it is possible it was swapped before we committed.
>         * (committing stops a swap). We check for it here and
>         * if it happened, we have to fail the write.
>         */
>        barrier();
>        if (unlikely(ACCESS_ONCE(cpu_buffer->buffer) != buffer)) {
>                local_dec(&cpu_buffer->committing);
>                local_dec(&cpu_buffer->commits);
>                return NULL;
>        }

Swapping should invalidate the cached state centrally (and then run the
slow path, like the current path).

>        /*
>         * We allow for interrupts to reenter here and do a trace.
>         * If one does, it will cause this original code to loop
>         * back here. Even with heavy interrupts happening, this
>         * should only happen a few times in a row. If this happens
>         * 1000 times in a row, there must be either an interrupt
>         * storm or we have something buggy.
>         * Bail!
>         */
>        if (RB_WARN_ON(cpu_buffer, ++nr_loops > 1000))
>                goto out_fail;

Didn't you disable interrupts earlier?

>        /* Did the write stamp get updated already? */
>        if (likely(info.ts >= cpu_buffer->write_stamp)) {
>                info.delta = diff;
>                if (unlikely(test_time_stamp(info.delta)))
>                        rb_handle_timestamp(cpu_buffer, &info);
>        }

Not sure why a write stamp is needed. Isn't that the last few entries?

...

-Andi
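
To make the caching idea concrete, here is a minimal sketch of what such
a per cpu fast path could look like. All names here (trace_fast_cache,
trace_cache_gen, trace_fast_path) are made up for illustration, not
existing kernel symbols, and ordering/race subtleties are glossed over;
anything that changes a condition (filters, triggers, buffer swaps,
record_disabled) would bump the generation counter:

#include <linux/atomic.h>
#include <linux/compiler.h>
#include <linux/percpu.h>
#include <linux/ring_buffer.h>

/* Per cpu cache of everything the hot path needs. */
struct trace_fast_cache {
        struct ring_buffer      *buffer; /* cached trace_file->tr->trace_buffer.buffer */
        unsigned long           gen;     /* generation this cache was filled at */
};

static DEFINE_PER_CPU(struct trace_fast_cache, trace_fast_cache);
static atomic_long_t trace_cache_gen;

/* Filters, triggers, buffer swaps, record_disabled etc. all call this. */
static void trace_fast_cache_invalidate(void)
{
        atomic_long_inc(&trace_cache_gen);
}

/* The slow path refills the cache once every check has passed. */
static void trace_fast_cache_fill(struct ring_buffer *buffer)
{
        struct trace_fast_cache *c = this_cpu_ptr(&trace_fast_cache);

        c->gen = atomic_long_read(&trace_cache_gen);
        c->buffer = buffer;
}

/*
 * Hot path: one per cpu load and one generation compare. Assumes the
 * caller already disabled preemption, as the tracing hot path does.
 */
static struct ring_buffer *trace_fast_path(void)
{
        struct trace_fast_cache *c = this_cpu_ptr(&trace_fast_cache);

        if (likely(c->buffer &&
                   c->gen == atomic_long_read(&trace_cache_gen)))
                return c->buffer;       /* all checks known to pass */

        return NULL;                    /* take the existing slow path */
}

The existing slow path stays as the fallback; it refills the cache
whenever every check passes, so the extra cost is only paid when the
conditions actually change.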
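
The jump-label note above maps onto the kernel's static key API. A
minimal sketch, with trace_slow_checks as a hypothetical key (not an
existing symbol):

#include <linux/jump_label.h>
#include <linux/types.h>

/* Hypothetical key: true only while filters, triggers etc. are active. */
static DEFINE_STATIC_KEY_FALSE(trace_slow_checks);

static void trace_event_hot_path(void)
{
        /*
         * Compiled as a single nop while the key is false and
         * live-patched to a jump when it is enabled.
         */
        if (static_branch_unlikely(&trace_slow_checks)) {
                /* rare case: run the full set of checks */
                return;
        }
        /* common case: straight-line fast path */
}

/* Flipped (from sleepable context) when a condition is added or removed. */
static void trace_slow_checks_set(bool on)
{
        if (on)
                static_branch_enable(&trace_slow_checks);
        else
                static_branch_disable(&trace_slow_checks);
}

While the key is false the test costs a single nop in the hot path,
which is exactly the property wanted for checks that are false pretty
much all of the time.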