----- On Mar 14, 2018, at 12:37 PM, THEUNISSEN Rolf rolf.theunis...@altran.com wrote:
>> -----Original Message----- >> From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com] >> Sent: Tuesday, March 13, 2018 4:35 PM >> To: THEUNISSEN Rolf <rolf.theunis...@altran.com> >> Cc: lttng-dev@lists.lttng.org >> Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a >> process >> exits >> >> Hi Rolf, >> >> On Tue, Mar 13, 2018 at 10:48:17AM +0000, THEUNISSEN Rolf wrote: >> > Hi, >> > >> > I am currently tracing many processes with LTTng-UST on a system under >> heavy load, with 2 CPUs. >> > >> > The traces seem to be missing ust-events near the end of a trace. After >> doing some analysis, it seems that events are missing from the other CPU >> then the process exits on. More concretely: >> > >> > During the lifetime of a process: >> > - Process X executes on CPU 0,1: all events are in the trace Just >> > before the process is about to exit: >> > - Process X executes on CPU 0: missing events from the trace >> >> How are you validating this? > > I am also using kernel tracing, to trace the scheduler. In TraceCompass I can > view the states of the processes and CPUs. > >> >> > A few mili-seconds later: >> > - Process X executes on CPU1 and exits: The last events are in the >> > trace >> >> This is not expected. Do you have a reproducer so we can check what might >> be happening? > > I am not allowed to provide details about my current traces. And it is hard to > make a small example, but I got some more clues what is happening, see below. > >> >> We will need the version for lttng-ust, lttng-tools, babeltrace. > > I am using version 2.10, for analyzing the traces I am using TraceCompass > Kernel version: 3.10.62 > >> >> Note that you can run your application with lttng-ust in debug mode using >> LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr. >> >> e.g: >> $ LTTNG_UST_DEBUG=y ./my_sample_app >> > > Enableling this debugging flag gave me a clue what is happening. For all > processes, the registration is going OK. But for some processes, the > un-registration is not logged. On closer analysis (now with tracepoints from > SystemTap), I see that the processes for which the events are missing and the > un-registration is missing are killed by SIGKILL signal. This probably > explains > why some data can be lost. But maybe strange that only messages get lost that > were traced on a different CPU. > > I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent > to > the process. Somehow the kernel decides to deliver an SIGKILL instead of a > SIGTERM. When using lttng-ust with per-uid buffers, there is a known design limitation regarding process terminated by non-handled kill signals: if this occurs while the buffer is being written to (between reserve and commit), it will cause the rest of the per-cpu buffer to be unreadable. This applies to all events recorded by all applications of the same user id for that particular CPU. Recovering from this involves destroying and re-creating the tracing session. If this kind of scenario is likely for you, you might want to consider using per-pid buffers (lttng enable-channel -u --buffers-pid), which do not suffer from this design limitation. The downside of per-pid buffers is that it allocates more memory on your system for buffering, and adds extra overhead when used with processes that have a short life-time. Thanks, Mathieu > > Rolf > >> Cheers >> >> -- >> Jonathan Rajotte-Julien >> EfficiOS > _______________________________________________ > lttng-dev mailing list > lttng-dev@lists.lttng.org > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev