Re: Data corruption on reboot

Maxim Cournoyer Thu, 08 May 2025 18:42:43 -0700

Hi Danny,

Danny Milosavljevic <dan...@friendly-machines.com> writes:


> Hi,
>
> For me, the problem is fixed.  I've tried a regular shutdown, killing it
> by holding the power button, and killing it with the magic SysRq keys
> S, U, B; in all cases the filesystem eventually recovered to a clean
> state (no ignored errors either).

Good to know!

[...]

> That said, there are some remaining things that are just dumb that we
> totally should fix:

> a. I didn't see any of the last messages on the screen on shutdown,
> not even when reopening and using /dev/console .
> That's because Linux has a notion of "the console" that can move to
> some tty.  If the console moved to a tty that's actually used
> by Wayland, you don't see messages on shutdown.

Interesting.

> For testing, I've always had to manually run the following program
> before shutting down:
>
> ~/src/choose-vt$ cat a.c 
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <sys/ioctl.h>
> #include <linux/vt.h>
>
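> int main(void) {
>       int status = 0;
>       int fd = open("/dev/tty2", O_RDWR);
>       if (fd == -1) {
>               perror("open");
>               exit(1);
>       }
>       if (ioctl(fd, VT_ACTIVATE, 2) == -1) { // switches to that VT
>               perror("VT_ACTIVATE");
>               status = 1;
>       }
>       if (ioctl(fd, TIOCCONS, 0) == -1) { // moves "who is the console" to that tty.
>               perror("TIOCCONS");
>               status = 1;
>       }
>       return status;
> }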
>
> (I don't think that would be useful for kmscon, though)

Is this always the case, such as when testing with virtual machines?  If
so, could you report this bug so that it is not forgotten/can be tackled
individually?  Ideally with an easy reproducer (a VM would be fine).
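
For such a reproducer, it may help to record where the kernel thinks
the console and the foreground VT are just before shutting down.  Below
is a minimal sketch, assuming the sysfs files
/sys/class/tty/console/active and /sys/class/tty/tty0/active are
available on the test system.  The helper names are made up for
illustration, and a TIOCCONS redirection like the one in your program
may well not be reflected in these files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of PATH into BUF (at most
   LEN - 1 bytes), stripping the trailing newline.
   Returns 0 on success, -1 if the file cannot be opened or read. */
int read_sysfs_line(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: print the kernel's console devices and the
   currently active (foreground) VT, when sysfs exposes them. */
void print_console_state(void)
{
    char buf[128];

    if (read_sysfs_line("/sys/class/tty/console/active", buf, sizeof buf) == 0)
        printf("kernel console devices: %s\n", buf);
    if (read_sysfs_line("/sys/class/tty/tty0/active", buf, sizeof buf) == 0)
        printf("foreground VT: %s\n", buf);
}
```

Calling print_console_state() from a small main() right before the
shutdown, in the VM and on real hardware, would show whether the two
setups differ.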

> In my opinion something like this should be automated in a reasonable way.
>
> Possibilities:
> - On shutdown, activate a tty that's guaranteed to be only
> used for logging anyway.  That tty should not be kmscon.
> - Run Plymouth on shutdown (and on boot would be neat).
>   That has a graphical console meant for that.
>
> Text VT in Linux is deprecated and will eventually be removed.  Hmm.
>
> b. Currently, we try remounting 10 times in a loop.  If that doesn't
> work, we just reboot or power off anyway.  I agree that there needs
> to be forward progress.  But:
>
> For debugging, this is terrible, since there's no way to just halt
> WITHOUT turning off the screen.
>
> The following, strangely, did not work for me either:
>
>   sudo herd eval root '((@ (shepherd system) halt))'

Halt is defined as follows in the Shepherd:

--8<---------------cut here---------------start------------->8---
(define (halt)
  "Halt the system.  Return #f on failure."
  (%libc-reboot RB_HALT_SYSTEM))
--8<---------------cut here---------------end--------------->8---

It seems it should work as expected?  Could you please file a bug with
the [shepherd] tag in the title for it, something like:

--8<---------------cut here---------------start------------->8---
[shepherd] halt command appears to shut down, hampers debugging
--8<---------------cut here---------------end--------------->8---

> c. Since we have the store available at the very end anyway,
> guix could always invoke "fuser" at the very end (after the
> 10 unsuccessful attempts) so we get alerted about future
> problems.  But (see b.) that only helps if the user can
> see the messages :P
> The photos I got of the shutdown messages were taken with an
> actual digital camera that has fast (50 ms) capture activation,
> and it took multiple (complete) attempts.  That's... not ideal to require.

Were there processes preventing the shutdown from completing correctly,
as seen in your troubleshooting photos?  I don't think a process should
be able to stay alive and prevent umount from succeeding in the first
place; can processes actually survive a SIGKILL?  It seems that
shouldn't happen [0].  Shepherd also kills the process group of the
tracked process, so that would include any child processes.

[0]  https://unix.stackexchange.com/a/5648/82353
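
To illustrate the process group point, here is a minimal C sketch.  It
is not the Shepherd's code, and the function name is made up for
illustration.  It starts a child in its own process group, has the
child ignore SIGTERM, then sends SIGKILL to the whole group and checks
how the child died.  It does not cover a process stuck in
uninterruptible sleep, which only acts on the signal once it returns
from the kernel.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: spawn a child in its own process group that
   ignores SIGTERM, SIGKILL the whole group, and reap the child.
   Returns the number of the signal that terminated the child, or -1
   on error or if the child did not die from a signal. */
int kill_group_and_reap(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: become a group leader, ignore SIGTERM, wait forever. */
        setpgid(0, 0);
        signal(SIGTERM, SIG_IGN);
        for (;;)
            pause();
    }

    /* Parent: set the group as well, so that the group exists even if
       the child has not been scheduled yet. */
    if (setpgid(pid, pid) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* A negative PID addresses every process in that process group. */
    if (kill(-pid, SIGKILL) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, NULL, 0);
        return -1;
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

On a normally running process this should return SIGKILL, since SIGKILL
can be neither caught nor ignored.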

-- 
Thanks,
Maxim
