Re: [lttng-dev] Trigger snapshots on a watchdog

Kienan Stewart via lttng-dev Fri, 13 Sep 2024 02:51:24 -0700

Hi Damien,

I've added a brief feature request issue here[1], referring to this discussion. If you would like to elaborate or add other details, that would be most excellent.



thanks,

kienan


[1]: https://bugs.lttng.org/issues/1416

On 2024-09-12 12:14, Damien Berget wrote:

Thanks for the quick response Kienan,

Your proposal is exactly how we were thinking the monitor application could work, so we'll go with that for now. Reacting to the absence of an event (a watchdog) would really be a good complement to the existing trigger types. It's a really useful feature for a flight recorder in embedded, medium real-time applications. Is the team open to feature requests?

Cheers
Damien

On Thu, Sep 12, 2024 at 12:57 AM Kienan Stewart <kstew...@efficios.com> wrote:


    Hi Damien,

    On 2024-09-11 18:38, Damien Berget via lttng-dev wrote:
    > Good day,
    > We are trying to see what is the best way to monitor some applications
    > for missed deadlines. Ideally, something like a watchdog that needs to
    > be patted regularly and, if the timeout is reached, triggers the snapshot.
    >
    > Before we reinvent the wheel and code some userland applications, is
    > there a canonical way in LTTng to do it? I found this change
    > <https://review.lttng.org/c/lttng-tools/+/9657/9>, which looks
    > suspiciously close, maybe?
    >
    I don't think the proposed changes you linked to are useful for, or
    related to, what you hope to achieve. The patch series is a concept about
    how some types of UST ring buffer stalls might be addressed by the
    session daemon. After a quick glance, the monitoring seems to be more
    closely related to the 'monitor timer', which is used to sample
    statistical information about channels[1].


    There is a concept of triggers[2]; however triggers react to the
    presence of events rather than the absence thereof.


    I think a small user space application that monitors the state of other
    applications is more the direction to head in. There are at least a
    couple of ways that a snapshot on unhealthy state could be achieved:


    * Use liblttng-ctl to trigger a snapshot from your watchdog application[3][4].

    * Have the watchdog application exec `lttng snapshot record`[5].

    * Have the watchdog application emit some sort of "health state" events
      with some data (e.g. health_okay, health_bad, ...) per your usage
      requirements, and configure a trigger[2] to take a snapshot on the
      "health state" events that have the non-okay state.
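
    A minimal sketch of the second option, in C, assuming a tracing session
    that already exists in snapshot mode. The monitored application is
    assumed to "pat" the watchdog through some IPC mechanism of your choice
    (not shown), and every name below (struct watchdog, watchdog_pat(),
    watchdog_poll(), take_snapshot()) is hypothetical. Only the
    `lttng snapshot record -s SESSION` command comes from the LTTng CLI[5].

```c
/*
 * Hypothetical watchdog sketch: runs `lttng snapshot record -s SESSION`
 * when the monitored application misses its deadline. How pats reach this
 * process (pipe, socket, shared memory...) is an assumption left out here.
 */
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct watchdog {
    uint64_t last_pat_ns; /* CLOCK_MONOTONIC time of the last pat */
    uint64_t timeout_ns;  /* maximum allowed time between two pats */
    bool fired;           /* snapshot already taken for this missed deadline */
};

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Called whenever the monitored application reports that it is healthy. */
static void watchdog_pat(struct watchdog *wd, uint64_t now)
{
    wd->last_pat_ns = now;
    wd->fired = false; /* re-arm after a recovery */
}

/* True if the deadline was missed and no snapshot was taken for it yet. */
static bool watchdog_should_fire(const struct watchdog *wd, uint64_t now)
{
    return !wd->fired && now >= wd->last_pat_ns &&
           now - wd->last_pat_ns > wd->timeout_ns;
}

/*
 * Run `lttng snapshot record -s SESSION` and wait for it to finish.
 * Returns the command's exit status (127 if lttng could not be executed),
 * or -1 if fork()/waitpid() failed or the child was killed by a signal.
 */
static int take_snapshot(const char *session)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char *const argv[] = {"lttng", "snapshot", "record",
                              "-s", (char *)session, NULL};
        execvp("lttng", argv);
        _exit(127); /* only reached if execvp() failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* One poll of the monitor loop: at most one snapshot per missed deadline. */
static void watchdog_poll(struct watchdog *wd, const char *session, uint64_t now)
{
    if (!watchdog_should_fire(wd, now))
        return;
    wd->fired = true;
    (void)take_snapshot(session); /* a real monitor would log failures */
}
```

    In the monitor loop, one would call watchdog_pat(&wd, now_ns()) on every
    heartbeat and watchdog_poll(&wd, "my-session", now_ns()) at a period well
    below timeout_ns ("my-session" is a placeholder). The `fired` flag limits
    this sketch to one snapshot per missed deadline, re-armed by the next pat,
    so a stuck application does not cause a snapshot on every poll.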


    Depending on your tracing configuration (channel overwrite/discard
    mode[6], buffer sizes, blocking mode, and number of events), it is
    possible that events may not be recorded. I would prefer using
    liblttng-ctl or exec'ing `lttng snapshot record` if you want a stronger
    guarantee that your watchdog will cause a snapshot to be taken.


    I would love to hear if there are other ideas. Regardless, hope this helps!


    thanks,

    kienan


    [1]: https://lttng.org/docs/v2.13/#doc-channel-timers

    [2]: https://lttng.org/docs/v2.13/#doc-trigger

    [3]: https://lttng.org/docs/v2.13/#doc-liblttng-ctl-lttng

    [4]: https://github.com/lttng/lttng-tools/tree/master/src/lib/lttng-ctl

    [5]: https://lttng.org/man/1/lttng-snapshot/v2.13/

    [6]: https://lttng.org/docs/v2.13/#doc-channel-overwrite-mode-vs-discard-mode


    > Thanks,
    > Cheers
    >
    > --
    > *Damien Berget*
    > Embedded Platform Lead
    > damien.ber...@flyzipline.com
    >
    > _______________________________________________
    > lttng-dev mailing list
    > lttng-dev@lists.lttng.org
    > https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev



--
*Damien Berget*
