Hi Damien,

I want to expand on one of the options that could work for your case.

On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
Hi Damien,


On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
Good day,
we have been using LTTng successfully to capture snapshots on user-defined tracepoints, and it has proved invaluable for debugging our issues. Thanks to all the contributors of this project!

We'd like to know if it would be possible to trigger on a kernel panic. It might be only dubiously possible, as you would still need a working file system to write the results, but I should ask.


For userspace tracing, I think the recommendation is usually to use a dax/pmem device and have the buffers for the session mapped there. After a panic, the contents of the buffers can be restored using lttng-crash[1].

Note that dax/pmem isn't supported by the kernel space tracer at this time.

If I recall, there are other ways to do things in the panic sequence (that aren't LTTng-specific), but I'm personally not as familiar with the details of that stage of Linux.


It's possible to use kexec-tools to load a new kernel post-panic[1]. If your system uses kexec, the contents of RAM aren't necessarily flushed. If both the initial kernel and the post-panic kernel started by kexec have the same configuration for an emulated PMEM device using the memmap parameter [2,3], a daxfs can be created in that region of memory after a clean boot.

Note: some systems may not flush the memory during a warm reboot, but this is dependent on the BIOS.
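
To make the emulated PMEM setup a bit more concrete, here is a minimal sketch of creating the daxfs on a clean boot. The region size and offset (4G at 12G), the /dev/pmem0 device name, the use of ext4, the /mnt/pmem mount point, and the DRY_RUN switch are all assumptions for illustration; check the memmap documentation [2,3] for values that suit your hardware. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. Assumed values (not from this thread): a 4G region at the 12G
# offset, the /dev/pmem0 device name, ext4, and the /mnt/pmem mount point.
#
# Both the normal kernel and the kernel loaded by kexec would carry the same
# parameter on their command lines, for example:
#     memmap=4G!12G
PMEM_DEV="${PMEM_DEV:-/dev/pmem0}"
PMEM_MOUNT="${PMEM_MOUNT:-/mnt/pmem}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Succeeds if the given kernel command line contains memmap=<size>!<offset>.
has_pmem_memmap() {
    for word in $1; do
        case "$word" in
            memmap=*!*) return 0 ;;
        esac
    done
    return 1
}

# Clean boot only: make a file system on the region and mount it with DAX.
create_daxfs() {
    run mkfs.ext4 "$PMEM_DEV"
    run mkdir -p "$PMEM_MOUNT"
    run mount -o dax "$PMEM_DEV" "$PMEM_MOUNT"
}

if has_pmem_memmap "$(cat /proc/cmdline 2>/dev/null)"; then
    create_daxfs
else
    echo "no memmap=<size>!<offset> region on the kernel command line"
fi
```

On an "unclean" boot you would skip mkfs.ext4 and only mount the existing file system, so that the buffers from before the panic are still there.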

When your system boots you could do something like the following:

 * If it's a clean boot, create the daxfs
 * If it's an "unclean" boot (e.g. the daxfs already exists, or a kernel parameter informs you that it started post-panic), copy or move the buffers to persistent storage, or extract them there with lttng-crash, for analysis
 * Start tracing using a snapshot session and the userspace buffers on the daxfs

In this type of situation the "snapshot" command is never invoked directly, but the recovery of the buffers to create a snapshot is possible.
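
As a rough sketch of those boot-time steps, assuming the DAX file system is already mounted: the script below treats leftover files in the session's shared memory directory as the sign of an "unclean" boot, extracts them with lttng-crash, then starts a new snapshot session whose buffers live on the daxfs via `lttng create --shm-path`. The mount point, the lttng-shm and recovery directories, the "directory is not empty" check, and the DRY_RUN switch are assumptions for illustration; the session and channel names are borrowed from Damien's setup. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. Assumed names (not from this thread): the mount point, the
# lttng-shm directory, the recovery directory, and the DRY_RUN switch.
PMEM_MOUNT="${PMEM_MOUNT:-/mnt/pmem}"
SHM_DIR="${SHM_DIR:-$PMEM_MOUNT/lttng-shm}"
RECOVERED_DIR="${RECOVERED_DIR:-/application/trace/recovered}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Prints "unclean" if buffers from a previous boot are present, else "clean".
boot_state() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo unclean
    else
        echo clean
    fi
}

boot_time_setup() {
    if [ "$(boot_state "$SHM_DIR")" = "unclean" ]; then
        # Extract the leftover buffers into a trace on persistent storage.
        run mkdir -p "$RECOVERED_DIR"
        run lttng-crash --extract="$RECOVERED_DIR/$(date +%Y%m%d-%H%M%S)" "$SHM_DIR"
        run rm -rf "${SHM_DIR:?}"
    fi
    # Start a fresh snapshot session with its buffers on the DAX file system.
    run mkdir -p "$SHM_DIR"
    run lttng create snapshot-trace-session --snapshot --shm-path="$SHM_DIR"
    run lttng enable-channel --userspace --num-subbuf=8 channelu
    run lttng enable-event --userspace --all --channel channelu
    run lttng start
}

boot_time_setup
```

Run it with DRY_RUN=0 (as root) once the printed commands look right for your system.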

[1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
[2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
[3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap

thanks,
kienan

Looking at the available kernel syscalls, "reboot" seems like a good candidate; however, I was not able to capture a snapshot on it. I have tested the setup below with the "--name=chdir" syscall and it works: a "cd" to a directory creates a trace. But no dice with reboot.


The details of how this works will depend on your system. For example, my installations tend to use systemd as PID 1. The broad strokes seem to be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I believe then kicks off reboot.service; PID 1 is swapped to /usr/lib/systemd/systemd-shutdown, which sends SIGTERM then SIGKILL to all processes, unmounts file systems, syncs, and calls the reboot system call [2,3].

As both the SIGTERM and the unmounts are done before the syscall, lttng-sessiond and the consumers will have already shut down by the time the syscall is entered.

While this doesn't necessarily help your original question of panics, if you want to snapshot before shutdown or reboot and are using systemd, it's possible to leave a script or binary in a known directory so that it's invoked prior to the rest of the shutdown sequence[4].

[1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
[2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
[3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
[4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/

hope this helps,
kienan

Would you have any suggestions?
Thanks for your help,
Cheers
Damien

============================

# Prep output dir
mkdir -p /application/trace/
rm -rf /application/trace/*

# Create session
sudo lttng destroy snapshot-trace-session
sudo lttng create snapshot-trace-session --snapshot --output="/application/trace/"
sudo lttng enable-channel --kernel --num-subbuf=8 channelk
sudo lttng enable-channel --userspace --num-subbuf=8 channelu

# Configure session
sudo lttng enable-event --kernel --syscall --all --channel channelk
sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
sudo lttng enable-event --userspace --all --channel channelu
sudo lttng add-context -u -t vtid -t procname
sudo lttng remove-trigger trig_reboot
sudo lttng add-trigger --name=trig_reboot \
         --condition=event-rule-matches --type=kernel:syscall:entry \
         --name=reboot \
         --action=snapshot-session snapshot-trace-session \
         --rate-policy=once-after:1

# start & list info
sudo lttng start
sudo lttng list snapshot-trace-session
sudo lttng list-triggers

#======== test it...
sudo reboot

#======= reconnect and Nothing :(
$ ls -alu /application/trace/
drwxr-xr-x    2 u  u       4096 May 15  2024 .
drwxr-xr-x   10 u  u       4096 May 15  2024 ..


_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev