On 2023-09-08 06:56, Danter, Richard via lttng-dev wrote:
Hi all,

I am investigating an issue that takes some time to reproduce. Finding
the right point in the logs is therefore very difficult.

Since I can detect when the issue happens in the kernel I would like to
be able to emit an event into the trace that I can then search for in
Trace Compass or through Babeltrace. So basically a kind of flag that
says "look here". That way I can jump right to the problem and then
look backwards from there to see what happened just before.

I have looked at the docs for how to add a trace point, but it seems
pretty complicated. I may have missed something though, so I wonder if
there is a trivial way to add such a flag to the log? Up to now I just
put a printk() in which helps, but would still be nicer to have
something directly in the log.

Hello Rich,

This is a good question! The easiest way to point directly to the relevant part of a trace is to stop capturing trace data immediately after the identified issue is encountered. This means you know what you're looking for is right at the end of the trace. Stopping the trace seems like a good fit in this scenario because you're only interested in what happens immediately before the issue and you're able to identify when the problem has happened.

Assuming you would like to avoid modifying the kernel code, LTTng triggers [1] may be a good fit. Triggers allow you to associate a condition (e.g. event X happened) with an action you would like to take (e.g. stop tracing). When the condition is encountered, the associated action is automatically triggered.

In this scenario we would recommend:

1. Trace in overwrite mode (flight recorder mode): Since the issue takes a while to reproduce and only the events immediately preceding the issue are relevant, keeping just a limited amount of the most recent data avoids accumulating useless data volume.

2. Determine when the issue is encountered with a trigger: This will focus the trace on the problem area.

3. When the issue is encountered, take a snapshot: This will give you a trace that contains what is relevant. What happened immediately before the trigger will be at the end of the trace.
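As a rough sketch of steps 1 and 3, the small Python helper below only prints the lttng commands I would expect to use with LTTng 2.13; it does not run them. The session name "issue-hunt" is hypothetical, and recording every kernel event is just an assumption to get started. Kernel tracing also needs a root session daemon (or membership in the tracing group).

```python
import shlex


def session_commands(session="issue-hunt"):
    """Return argv lists that set up a flight-recorder kernel session.

    The session name is a hypothetical placeholder.
    """
    return [
        # Snapshot mode: channels are created in overwrite mode and trace
        # data stays in the ring buffers until a snapshot is taken.
        ["lttng", "create", session, "--snapshot"],
        # Assumption: record all kernel events at first, narrow down later.
        ["lttng", "enable-event", "--kernel", "--all", f"--session={session}"],
        ["lttng", "start", session],
    ]


def snapshot_command(session="issue-hunt"):
    """Return the argv list that takes a snapshot of the current buffers."""
    return ["lttng", "snapshot", "record", f"--session={session}"]


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for argv in session_commands() + [snapshot_command()]:
        print(shlex.join(argv))
```

The manual snapshot command is only there to show what step 3 does; with a trigger in place the snapshot is taken for you.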

To define the trigger condition, you can add a trigger [2] that matches a kernel event type that occurs as soon as possible after the issue is encountered, and then specify additional details for the condition using the capture descriptor [3]. Ideally, the condition should only be true when the issue is encountered, so that you do not have to sort through the snapshots manually afterwards. The add trigger man page provides several examples [4] that illustrate the condition and action syntax.
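To make the trigger syntax a bit more concrete, here is a minimal sketch that builds (and only prints) an add-trigger command line, based on my reading of the 2.13 man page [2]. The trigger name, session name, event name, and filter expression in the example call are all hypothetical placeholders; you would substitute the kernel event that fires closest to your issue.

```python
import shlex


def add_trigger_command(session, trigger, event, filter_expr=None):
    """Return the argv list for a trigger that snapshots `session`
    when the kernel tracepoint `event` is hit.

    All names passed in are placeholders chosen by the caller.
    """
    argv = [
        "lttng", "add-trigger", f"--name={trigger}",
        # Condition: an event rule matching one kernel tracepoint.
        "--condition=event-rule-matches",
        "--type=kernel", f"--name={event}",
    ]
    if filter_expr is not None:
        # Optional filter to narrow the condition to the problem case.
        argv.append(f"--filter={filter_expr}")
    # Action: take a snapshot of the named recording session.
    argv += ["--action=snapshot-session", session]
    return argv


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(shlex.join(add_trigger_command(
        "issue-hunt", "issue-seen", "sched_switch",
        'next_comm == "bash"',
    )))
```

Replacing the action with --action=stop-session followed by the session name should give the "stop tracing" variant mentioned earlier.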

Hope this helps!

Best,
Erica

[1] LTTng triggers - https://lttng.org/docs/v2.13/#doc-trigger
[2] Add trigger - https://lttng.org/man/1/lttng-add-trigger/v2.13/
[3] Trigger capture descriptor - https://lttng.org/man/1/lttng-add-trigger/v2.13/#doc-capture-descr
[4] Trigger examples - https://lttng.org/man/1/lttng-add-trigger/v2.13/#doc-examples


If there isn't such a thing already, then would it be a reasonable
enhancement request to be able to add such a feature?

Thanks
Rich


_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev