Hi Damien,

If kexec is not an option on your system, you might still be able to
access the pmem+dax filesystem after a warm reboot, but that very
much depends on whether your BIOS clears memory on a warm
reboot.

Cheers,

Mathieu

On 2024-05-16 14:22, Damien Berget via lttng-dev wrote:
Thanks Kienan for these quick suggestions,
we'll investigate the pmem route (I was not aware of the lttng-crash utility, it's pretty nice). I'm not sure how fast it would burn through our SSD, but it might still be worth trying. As for kexec-tools, it's unfortunately not officially supported on our embedded modules, so we might be SOL there. We may have to try adding our own tracepoint in the kernel to use as a trigger.
Cheers
Damien

On Thu, May 16, 2024 at 8:12 AM Kienan Stewart <kstew...@efficios.com> wrote:

    Hi Damien,

    I want to expand on one of the options that could work for your case.

    On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
     > Hi Damien,
     >
     >
     > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
     >> Good day,
     >> we have been using LTTng successfully to capture snapshots on
     >> user-defined tracepoints, and it has proved invaluable for
     >> debugging our issues.
     >> Thanks to all the contributors of this project!
     >>
     >> We'd like to know if it would be possible to trigger on a kernel
     >> panic? It might only be dubiously possible, as you would still
     >> need the file system working to write the results, but I should
     >> ask.
     >>
     >
     > For userspace tracing, I think the recommendation is usually to
     > use a dax/pmem device and have the buffers for the session mapped
     > there. After a panic, the contents of the buffers can be
     > restored using lttng-crash[1].
     >
     > Note that dax/pmem isn't supported by the kernel space tracer at
     > this time.
     >
     > If I recall, there are other ways to do things in the panic
     > sequence (that aren't lttng specific), but I'm personally not as
     > familiar with the details of that stage of Linux.
     >

    It's possible to use kexec-tools to load a new kernel post-panic[1].
    If your system uses kexec, the contents of RAM aren't necessarily
    flushed, and if both the initial kernel and the post-panic kernel
    started by kexec have the same configuration for an emulated PMEM
    device using the memmap parameter[2,3], that region of memory can
    have a daxfs created in it after a clean boot.
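
As a rough illustration of the memmap setup described above, here is a
small sketch of a check a boot script could run. The 4G size and 12G
offset are example values only, the function name is made up for this
illustration, and nothing here has been tried on any particular board.

```shell
#!/bin/sh
# Sketch only: example values and a hypothetical helper name.

# Print the first memmap=<size>!<offset> argument (the form that marks a
# region as emulated persistent memory) found in the kernel command line
# passed as $1. Returns non-zero when there is none.
pmem_memmap_arg() {
    for arg in $1; do
        case "$arg" in
            memmap=*!*) printf '%s\n' "$arg"; return 0 ;;
        esac
    done
    return 1
}

# Both the normal kernel and the kexec'd post-panic kernel would need the
# same argument on their command lines, for example:
#   memmap=4G!12G
# A boot script could then verify it with:
#   pmem_memmap_arg "$(cat /proc/cmdline)" || echo "no emulated pmem region" >&2
```

The check only looks at the command line. It says nothing about
whether the region's contents survived the reboot.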

    Note: some systems may not flush the memory during a warm reboot, but
    this is dependent on the BIOS.

    When your system boots you could do something like the following:

       * If it's a clean boot, create the daxfs.
       * If it's an "unclean" boot (e.g. the daxfs already exists, or a
         kernel parameter informs you that it started post-panic), then
         copy or move the buffers to persistent storage, or extract
         them with lttng-crash, for analysis.
       * Start tracing using a snapshot session with the userspace
         buffers on the daxfs.

    In this type of situation the "snapshot" command is never invoked
    directly, but the recovery of the buffers to create a snapshot is
    possible.
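
The boot-time steps above could be sketched roughly as follows. This
assumes the dax filesystem is already mounted, and it uses leftover
buffer files (rather than a kernel parameter) as the sign of an unclean
boot. The paths, the session name and the function names are
placeholders, and combining --snapshot with --shm-path is an assumption
to verify against your LTTng version.

```shell
#!/bin/sh
# Sketch only: paths, session name and function names are placeholders.

# Print "recover" when the directory given as $1 still holds buffer
# files from a previous boot, "fresh" otherwise.
boot_action() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo recover
    else
        echo fresh
    fi
}

# Extract leftover buffers into a trace under OUT_DIR, then start a new
# snapshot session whose buffers live under SHM_DIR on the dax mount.
handle_boot() {
    shm_dir=$1
    out_dir=$2
    if [ "$(boot_action "$shm_dir")" = recover ]; then
        # Only clear the old buffers once extraction has succeeded.
        lttng-crash --extract="$out_dir/crash-$(date +%s)" "$shm_dir" &&
            rm -rf "${shm_dir:?}"/*
    fi
    lttng create panic-session --snapshot --shm-path="$shm_dir" &&
        lttng enable-event --userspace --all &&
        lttng start
}

# Example call (not run here):
#   handle_boot /mnt/pmem/lttng-shm /application/trace
```

Keeping the decision in its own function makes it easy to swap in a
different "unclean boot" test, such as a check on /proc/cmdline.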

    [1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
    [2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
    [3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap

    thanks,
    kienan

     >> Looking at available kernel syscalls, the "reboot" one seems like
     >> a good candidate; however, I was not able to capture a snapshot
     >> on it. I have tested the setup below with the "--name=chdir"
     >> syscall and it works: "cd" to a directory will create a trace.
     >> But no dice with reboot.
     >>
     >
     > The details of how this works will depend on your system. For
     > example, my installations tend to use systemd as PID 1. The broad
     > strokes seem to be: `/usr/sbin/reboot` is actually a link to
     > `systemctl`, which I believe then kicks off the reboot.service;
     > PID 1 is swapped to /usr/lib/systemd/systemd-shutdown, which
     > sends SIGTERM then SIGKILL to all processes, unmounts file
     > systems, syncs, and calls the reboot system call[2,3].
     >
     > As both the SIGTERM and the unmounts are done before the syscall,
     > lttng-sessiond and the consumers will have already shut down by
     > the time it is entered.
     >
     > While this doesn't necessarily help your original question of
     > panics, if you want to snapshot before shutdown or reboot and are
     > using systemd, it's possible to leave a script or binary in a
     > known directory so that it's invoked prior to the rest of the
     > shutdown sequence[4].
     >
     > [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
     > [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
     > [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
     > [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
     >
     > hope this helps,
     > kienan
     >
     >> Would you have any suggestions?
     >> Thanks for your help,
     >> Cheers
     >> Damien
     >>
     >> ============================
     >>
     >> # Prep output dir
     >> mkdir /application/trace/
     >> rm -rf /application/trace/*
     >>
     >> # Create session
     >> sudo lttng destroy snapshot-trace-session
     >> sudo lttng create snapshot-trace-session --snapshot \
     >>          --output="/application/trace/"
     >> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
     >> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
     >>
     >> # Configure session
     >> sudo lttng enable-event --kernel --syscall --all --channel channelk
     >> sudo lttng enable-event --kernel --tracepoint "sched*" \
     >>          --channel channelk
     >> sudo lttng enable-event --userspace --all --channel channelu
     >> sudo lttng add-context -u -t vtid -t procname
     >> sudo lttng remove-trigger trig_reboot
     >> sudo lttng add-trigger --name=trig_reboot \
     >>          --condition=event-rule-matches \
     >>          --type=kernel:syscall:entry \
     >>          --name=reboot \
     >>          --action=snapshot-session snapshot-trace-session \
     >>          --rate-policy=once-after:1
     >>
     >> # start & list info
     >> sudo lttng start
     >> sudo lttng list snapshot-trace-session
     >> sudo lttng list-triggers
     >>
     >> #======== test it...
     >> sudo reboot
     >>
     >> #======= reconnect and Nothing :(
     >> $ ls -alu /application/trace/
     >> drwxr-xr-x    2 u  u       4096 May 15  2024 .
     >> drwxr-xr-x   10 u  u       4096 May 15  2024 ..
     >>
     >>
     >> _______________________________________________
     >> lttng-dev mailing list
     >> lttng-dev@lists.lttng.org <mailto:lttng-dev@lists.lttng.org>
     >> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
    <https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev>



--
*Damien Berget*


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
