Thanks Kienan for these quick suggestions. We'll investigate the pmem route (I was not aware of the lttng-crash utility, it's pretty nice). I'm not sure how fast it would burn through our SSD, but it might still be worth trying.

As for kexec-tools, it's unfortunately not officially supported on our embedded modules, so we might be SOL there. We may have to try adding our own tracepoint in the kernel to use as a trigger.

Cheers
Damien
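
For reference, the boot-time flow suggested in the quoted message (recover the buffers after an unclean boot, then start a snapshot session with its userspace buffers on the daxfs) could look roughly like the sketch below. It is only a sketch under assumptions: the device, mount point, directories and the "leftover files mean unclean boot" heuristic are all hypothetical and not from the thread, and it presumes both kernels boot with the same memmap=<size>!<offset> parameter. The only LTTng pieces used are `lttng create --shm-path` and `lttng-crash --extract`, as described in the LTTng persistent-memory documentation.

```shell
#!/bin/sh
# Sketch only: every path and name below is an assumption for illustration.

PMEM_DEV="${PMEM_DEV:-/dev/pmem0}"          # hypothetical emulated PMEM device
PMEM_MNT="${PMEM_MNT:-/mnt/pmem}"           # hypothetical DAX mount point
SHM_DIR="${SHM_DIR:-$PMEM_MNT/lttng-shm}"   # where the session buffers would live
RECOVER_DIR="${RECOVER_DIR:-/application/trace/recovered}"
SESSION="snapshot-trace-session"

# Print "unclean" when files from a previous boot are still present in the
# given directory, "clean" otherwise (heuristic, assumed for this sketch).
boot_kind() {
    if [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]; then
        echo unclean
    else
        echo clean
    fi
}

# Extract the leftover buffers to persistent storage as a readable trace.
recover_buffers() {
    mkdir -p "$RECOVER_DIR" &&
    lttng-crash --extract="$RECOVER_DIR/$(date +%Y%m%d-%H%M%S)" "$SHM_DIR"
}

# Start a userspace snapshot session whose buffers sit on the DAX mount.
start_tracing() {
    mkdir -p "$SHM_DIR" &&
    lttng create "$SESSION" --snapshot --shm-path="$SHM_DIR" &&
    lttng enable-event --userspace --all &&
    lttng start
}

# Entry point, meant to be called once early at boot (as root).
on_boot() {
    mkdir -p "$PMEM_MNT" || return 1
    # Try to mount an existing filesystem; only create one if that fails.
    # Caution: this wipes the region if the mount fails for any other reason.
    mount -o dax "$PMEM_DEV" "$PMEM_MNT" 2>/dev/null || {
        mkfs.ext4 -q "$PMEM_DEV" && mount -o dax "$PMEM_DEV" "$PMEM_MNT"
    } || return 1

    if [ "$(boot_kind "$SHM_DIR")" = unclean ]; then
        # Only clear the old buffers once they have been extracted.
        recover_buffers && rm -rf "${SHM_DIR:?}"
    fi
    start_tracing
}
```

Nothing runs when the file is sourced; an init script or unit would call `on_boot` explicitly. Whether the shm directory is really empty after a clean shutdown would need to be checked on the target, since that is what the clean/unclean test relies on.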
On Thu, May 16, 2024 at 8:12 AM Kienan Stewart <kstew...@efficios.com> wrote:
> Hi Damien,
>
> I want to expand on one of the options that could work for your case.
>
> On 5/16/24 9:37 AM, Kienan Stewart via lttng-dev wrote:
> > Hi Damien,
> >
> >
> > On 5/15/24 6:24 PM, Damien Berget via lttng-dev wrote:
> >> Good day,
> >> we have been using LTTng successfully to capture snapshots on user
> >> defined tracepoints and it has proved invaluable for debugging our issues.
> >> Thanks to all the contributors of this project!
> >>
> >> We'd like to know if it would be possible to trigger on a kernel
> >> panic? It might be dubiously possible as you would still need to have
> >> the file-system working to write the results but I should ask.
> >>
> >
> > For userspace tracing, I think the recommendation is usually to use a
> > dax/pmem device and have the buffers for the session mapped there. After
> > a panic, the contents of the buffers can be restored using lttng-crash[1].
> >
> > Note that dax/pmem isn't supported by the kernel space tracer at this time.
> >
> > If I recall, there are other ways to do things in the panic sequence (that
> > aren't lttng specific), but I'm personally not as familiar with the
> > details of that stage of linux.
> >
>
> It's possible to use kexec-tools to load a new kernel post-panic[1]. If your
> system uses kexec, the contents of RAM aren't necessarily flushed, and
> if both the initial kernel and post-panic kernel started by kexec have
> the same configuration for an emulated PMEM device using the memmap
> parameter [2,3], that region of memory can have a daxfs created in it
> post-clean boot.
>
> Note: some systems may not flush the memory during a warm reboot, but
> this is dependent on the BIOS.
>
> When your system boots you could do something like the following:
>
> * If it's a clean boot, create the daxfs
> * If it's an "unclean" boot (e.g. the daxfs already exists, or a
>   kernel parameter informs you that it started post-panic) then you can
>   copy or move the buffers, or extract them with lttng-crash, to
>   persistent storage for analysis
> * Start tracing using a snapshot session and the userspace buffers on
>   the daxfs.
>
> In this type of situation the "snapshot" command is never invoked
> directly, but the recovery of the buffers to create a snapshot is possible.
>
> [1]: https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html
> [2]: https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
> [3]: https://docs.pmem.io/persistent-memory/getting-started-guide/creating-development-environments/linux-environments/linux-memmap
>
> thanks,
> kienan
>
> >> Looking at the available kernel syscalls, the "reboot" one seems like a
> >> good candidate, however I was not able to capture a snapshot on it. I
> >> have tested the setup below with the "--name=chdir" syscall and it
> >> works, "cd" to a directory will create a trace. But no dice with reboot.
> >>
> >
> > The details of how this works will depend on your system. For example, my
> > installations tend to use systemd as PID 1. The broad strokes seem to
> > be: `/usr/sbin/reboot` is actually a link to `systemctl`, which I
> > believe then kicks off the reboot.service, the PID 1 is swapped to
> > /usr/lib/systemd/systemd-shutdown, sigterm then sigkill are sent to all
> > processes, unmounts, syncs, calls the reboot system call [2,3].
> >
> > As both the sigterm and the unmounts are done before the syscall,
> > lttng-sessiond and the consumers will have already shut down by the time
> > it enters.
> >
> > While this doesn't necessarily help your original question of panics, if
> > you want to snapshot before shutdown or reboot and are using systemd,
> > it's possible to leave a script or binary in a known directory so that
> > it's invoked prior to the rest of the shutdown sequence[4].
> >
> > [1]: https://lttng.org/docs/v2.13/#doc-persistent-memory-file-systems
> > [2]: https://github.com/systemd/systemd/blob/6533c14997700f74e9ea42121303fc1f5c63e62b/src/shutdown/shutdown.c
> > [3]: https://github.com/systemd/systemd/blob/main/src/shared/reboot-util.c#L77
> > [4]: https://www.systutorials.com/docs/linux/man/8-systemd-reboot/
> >
> > hope this helps,
> > kienan
> >
> >> Would you have any suggestions?
> >> Thanks for your help,
> >> Cheers
> >> Damien
> >>
> >> ============================
> >>
> >> # Prep output dir
> >> mkdir /application/trace/
> >> rm -rf /application/trace/*
> >>
> >> # Create session
> >> sudo lttng destroy snapshot-trace-session
> >> sudo lttng create snapshot-trace-session --snapshot --output="/application/trace/"
> >> sudo lttng enable-channel --kernel --num-subbuf=8 channelk
> >> sudo lttng enable-channel --userspace --num-subbuf=8 channelu
> >>
> >> # Configure session
> >> sudo lttng enable-event --kernel --syscall --all --channel channelk
> >> sudo lttng enable-event --kernel --tracepoint "sched*" --channel channelk
> >> sudo lttng enable-event --userspace --all --channel channelu
> >> sudo lttng add-context -u -t vtid -t procname
> >> sudo lttng remove-trigger trig_reboot
> >> sudo lttng add-trigger --name=trig_reboot \
> >>     --condition=event-rule-matches --type=kernel:syscall:entry \
> >>     --name=reboot \
> >>     --action=snapshot-session snapshot-trace-session \
> >>     --rate-policy=once-after:1
> >>
> >> # start & list info
> >> sudo lttng start
> >> sudo lttng list snapshot-trace-session
> >> sudo lttng list-triggers
> >>
> >> #======== test it...
> >> sudo reboot
> >>
> >> #======= reconnect and Nothing :(
> >> $ ls -alu /application/trace/
> >> drwxr-xr-x 2 u u 4096 May 15 2024 .
> >> drwxr-xr-x 10 u u 4096 May 15 2024 ..
> >>

--
*Damien Berget*
_______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev