We have been seeing an issue where if rtla is run on machines with a high number of CPUs (100+), timerlat can generate more samples than rtla is able to process via tracefs_iterate_raw_events. This is especially common when the interval is set to 100us (rteval and cyclictest default) as opposed to the rtla default of 1000us, but also happens with the rtla default.
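To give a rough sense of the event rates involved, here is a minimal back-of-the-envelope sketch. The helper name and the events_per_period parameter are assumptions made for illustration (timerlat emits an IRQ-context and a thread-context sample per activation, so 2 is used in the usage note below); none of this is a measurement.

```c
/*
 * Rough estimate of how many timerlat events per second rtla has to
 * consume. Hypothetical helper, not part of rtla.
 *
 * nr_cpus:           CPUs running a timerlat thread
 * period_us:         timerlat period in microseconds (must be non-zero)
 * events_per_period: events emitted per CPU per period (an assumption
 *                    supplied by the caller, not a measured value)
 */
static unsigned long long timerlat_events_per_sec(unsigned int nr_cpus,
						  unsigned int period_us,
						  unsigned int events_per_period)
{
	/* Number of timer activations per second on one CPU. */
	unsigned long long periods_per_sec = 1000000ULL / period_us;

	return periods_per_sec * nr_cpus * events_per_period;
}
```

For 100 CPUs and two events per period, this gives 2,000,000 events per second at a 100us period and 200,000 events per second at the 1000us rtla default, all of which have to pass through a single iteration loop.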
Currently, this leads to rtla hanging and having to be terminated with
SIGTERM. SIGINT setting stop_tracing is not enough, since more and more
events keep coming in and tracefs_iterate_raw_events never exits.

This patchset contains two changes:

- Stop the timerlat tracer on SIGINT/SIGALRM to ensure no more events
  are generated when rtla is supposed to exit. This fixes rtla hanging
  and should go to stable.

- On receiving SIGINT/SIGALRM twice, abort iteration immediately with
  tracefs_iterate_stop, making rtla exit right away instead of waiting
  for all events to be processed. This is more of a usability feature:
  if the user is in a hurry, they can Ctrl-C twice (or once after the
  duration has expired) and exit immediately, discarding any events
  pending processing.

Note: I am sending these together only because the second one depends
on the first. Also, this should be fixed in osnoise, too.

In the future, two more patchsets will be sent: one to display how many
events/samples were dropped (either left in the tracefs buffer or lost
to buffer overflow), and one to improve sample processing performance to
be on par with cyclictest (ideally), so that samples are not dropped in
the cases mentioned at the beginning of this email.

Tomas Glozar (5):
  rtla: Add trace_instance_stop
  rtla/timerlat_hist: Stop timerlat tracer on signal
  rtla/timerlat_top: Stop timerlat tracer on signal
  rtla/timerlat_hist: Abort event processing on second signal
  rtla/timerlat_top: Abort event processing on second signal

 tools/tracing/rtla/src/timerlat_hist.c | 19 ++++++++++++++++++-
 tools/tracing/rtla/src/timerlat_top.c  | 20 +++++++++++++++++++-
 tools/tracing/rtla/src/trace.c         |  8 ++++++++
 tools/tracing/rtla/src/trace.h         |  1 +
 4 files changed, 46 insertions(+), 2 deletions(-)

-- 
2.47.1
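For illustration only, the two-stage signal handling described in this cover letter can be sketched roughly as follows. This is not the code from the patches: stub_stop_tracer() and stub_abort_iteration() are hypothetical stand-ins for stopping the timerlat tracer and for calling tracefs_iterate_stop, so that the sketch builds without libtracefs, and stop_handler is an assumed name.

```c
#include <signal.h>

/* Set on the first signal; the main loop is expected to check it. */
static volatile sig_atomic_t stop_tracing;

/* Hypothetical flags recording what the stand-ins below were asked to do. */
static volatile sig_atomic_t tracer_stopped;
static volatile sig_atomic_t iteration_aborted;

/* Stand-in for stopping the timerlat tracer so no new events arrive. */
static void stub_stop_tracer(void)
{
	tracer_stopped = 1;
}

/* Stand-in for aborting the event iteration (tracefs_iterate_stop). */
static void stub_abort_iteration(void)
{
	iteration_aborted = 1;
}

/* Handler meant to be installed for both SIGINT and SIGALRM. */
static void stop_handler(int sig)
{
	(void)sig;

	if (stop_tracing) {
		/*
		 * Stop was already requested once: give up on the events
		 * still pending and let the program exit right away.
		 */
		stub_abort_iteration();
		return;
	}

	/* First request: stop producing events, then drain what is left. */
	stop_tracing = 1;
	stub_stop_tracer();
}
```

With this shape, a single signal stops the source of events so that the iteration can drain the buffer and return, while a second signal skips the draining altogether.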