Hi Damien,

On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
Good day,
We are trying to see what is the best way to monitor some applications not hitting a deadline. Ideally something like a watchdog that needs to be patted regularly and, if the timeout is reached, triggers the snapshot.

Before we reinvent the wheel and code some userland applications, is there a canonical way in LTTng to do it? I found this <https://review.lttng.org/c/lttng-tools/+/9657/9> that is suspiciously close maybe?

I don't think the proposed changes you linked to are useful or related to what you hope to achieve. The patch series is a concept about how some types of UST ring buffer stalls might be addressed by the session daemon. After a quick glance, the monitoring seems to be more closely related to the 'monitor timer', which is used to sample statistical information about channels[1].


There is a concept of triggers[2]; however, triggers react to the presence of events rather than the absence thereof.


I think a small user space application that monitors the state of other applications is more the direction to head in. There are at least a couple of ways that a snapshot on unhealthy state could be achieved:


* Use liblttng-ctl to trigger a snapshot from your watchdog application[3][4].

* Have the watchdog application exec `lttng snapshot record`[5].

* Have the watchdog application emit some sort of "health state" events with some data (e.g. health_okay, health_bad, ...) per your usage requirements, and configure a trigger[2] to take a snapshot on the "health state" events that have the non-okay state.
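
As a rough illustration of the second option, here is a minimal sketch of a watchdog that exec's `lttng snapshot record` when it has not been patted in time. The class name `SnapshotWatchdog`, the helper `record_snapshot`, and the way the monitored application reaches `pat()` (socket, pipe, shared memory, ...) are all assumptions made for the example and are not part of LTTng; it also assumes a snapshot-mode recording session already exists and is started.

```python
import subprocess
import time


class SnapshotWatchdog:
    """Runs on_timeout() once if pat() is not called within timeout_s seconds."""

    def __init__(self, timeout_s, on_timeout, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_timeout = on_timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_pat = clock()
        self.fired = False

    def pat(self):
        """Called on behalf of the monitored application when it meets its deadline."""
        self.last_pat = self.clock()
        self.fired = False  # re-arm after a previous timeout

    def poll(self):
        """Check the deadline; return True if the timeout action ran on this call."""
        expired = self.clock() - self.last_pat > self.timeout_s
        if expired and not self.fired:
            self.fired = True  # fire only once per missed deadline
            self.on_timeout()
            return True
        return False


def record_snapshot(session_name, run=subprocess.run):
    """Exec `lttng snapshot record` for session_name (a hypothetical session name)."""
    return run(["lttng", "snapshot", "record", f"--session={session_name}"], check=False)
```

The watchdog process would call `poll()` periodically (for example every 100 ms) with something like `SnapshotWatchdog(2.0, lambda: record_snapshot("my-session"))`, where `my-session` stands in for your own session name. Firing only once per missed deadline avoids taking a new snapshot on every poll while the application stays unhealthy.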


Depending on your tracing configuration (channel overwrite/discard mode[6], buffer sizes, blocking mode, and number of events), it is possible that events may not be recorded. I would favour using liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger guarantee that your watchdog will cause a snapshot to be taken.


I would love to hear if there are other ideas. Regardless, hope this helps!


thanks,

kienan


[1]: https://lttng.org/docs/v2.13/#doc-channel-timers

[2]: https://lttng.org/docs/v2.13/#doc-trigger

[3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng

[4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl

[5]: https://lttng.org/man/1/lttng-snapshot/v2.13/

[6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode


Thanks,
Cheers

--
*Damien Berget*
Embedded Platform Lead
damien.ber...@flyzipline.com

_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
