Hi Damien,
On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
Good day,
We are trying to find the best way to monitor applications that miss
a deadline. Ideally something like a watchdog that needs to be patted
regularly and that triggers a snapshot if the timeout is reached.
Before we reinvent the wheel and code some userland applications, is
there a canonical way in LTTng to do it? I found this
<https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously
close maybe?
I don't think the proposed changes you linked to are useful or
related to what you hope to achieve. The patch series is a concept about
how some types of UST ring buffer stalls might be addressed by the
session daemon. After a quick glance, the monitoring seems to be more
closely related to the 'monitor timer', which is used to sample
statistical information about channels[1].
There is a concept of triggers[2]; however triggers react to the
presence of events rather than the absence thereof.
I think a small user space application that monitors the state of other
applications is more the direction to head in. There are at least a
couple of ways to take a snapshot when an unhealthy state is detected:
* Use liblttng-ctl to trigger a snapshot from your watchdog
application[3][4].
* Have the watchdog application exec `lttng snapshot record`[5].
* Have the watchdog application emit some sort of "health state" events
with some data (e.g. health_okay, health_bad, ...) per your usage
requirements, and configure a trigger[2] to take a snapshot on the
"health state" events that have the non-okay state.
Depending on your tracing configuration (channel overwrite/discard
mode[6], buffer sizes, blocking mode, and number of events), it is
possible that some events will not be recorded, including the "health
state" events that the trigger relies on. I would favor using
liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
guarantee that your watchdog will cause a snapshot to be taken.
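To illustrate the exec approach, here is a minimal sketch of such a
watchdog in Python. The `Watchdog` class, the `record_snapshot` helper,
and the session name `my-session` are hypothetical names made up for
this example; only the `lttng snapshot record --session=...` command
comes from the man page[5]. It assumes the session was created in
snapshot mode, and it leaves out how the monitored application's "pat"
actually reaches the watchdog (socket, pipe, shared memory, ...).

```python
import subprocess
import time


class Watchdog:
    """Runs on_timeout once if pat() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_pat = clock()
        self.fired = False

    def pat(self):
        # Call this whenever the monitored application reports a met deadline.
        self.last_pat = self.clock()
        self.fired = False  # re-arm after a previous timeout

    def check(self):
        # Call this periodically. Returns True if the timeout action ran now.
        if not self.fired and self.clock() - self.last_pat > self.timeout_s:
            self.fired = True  # avoid taking a snapshot on every later check
            self.on_timeout()
            return True
        return False


def record_snapshot(session_name, run=subprocess.run):
    # Same as running: lttng snapshot record --session=<session_name>
    cmd = ["lttng", "snapshot", "record", "--session=" + session_name]
    return run(cmd, check=False)
```

A main loop would then create `Watchdog(0.5, lambda:
record_snapshot("my-session"))`, call `pat()` on each heartbeat, and
call `check()` on a short timer. The same structure should apply to the
liblttng-ctl option by swapping out the body of `record_snapshot`.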
I would love to hear if there are other ideas. Regardless, hope this helps!
thanks,
kienan
[1]: https://lttng.org/docs/v2.13/#doc-channel-timers
[2]: https://lttng.org/docs/v2.13/#doc-trigger
[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng
[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl
[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/
[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode
Thanks,
Cheers
--
*Damien Berget*
Embedded Platform Lead
damien.ber...@flyzipline.com
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev