Hi Danny,

Danny Milosavljevic <dan...@friendly-machines.com> writes:

> Hi,
>
> Guix System has regularly been failing to unmount my filesystems
> properly on shutdown or reboot.
>
> Usually, this manifested as an fsck on the next boot, because the root
> filesystem was dirty or, sometimes, harmlessly damaged.
>
> But this time, it broke the following (after a "sudo guix system
> reconfigure" and then a "sudo reboot"):
> 1. /boot/grub/x86_64-efi/normal.mod was missing, making booting impossible,
> and also making it impossible to get to the previous generation
> 2. /etc/services was an empty file, complicating boot loader recovery via
> a recovery USB flash stick
> 3. /etc/protocols was an empty file, complicating boot loader recovery via
> a recovery USB flash stick
> 4. Other things I probably didn't see

As for the /etc files, they shouldn't be important by themselves,
because on boot they are recreated from the booted generation. So when
that happens, you can just switch to an older generation and you will
get proper files. The issue stems from corruption in the store.

As for GRUB, yeah, there isn't much you can do other than boot manually
or from a live ISO.

>
> THERE HAD BEEN NO CRASH[1]. This was a regular ("successful")
> sudo guix system reconfigure, followed by sudo reboot.

This is a known issue; the cause is still unknown, and only some people
are experiencing it.
https://issues.guix.gnu.org/77086

>
> We should also fdatasync the files we create in /etc--if we aren't doing
> that already (apparently we aren't :P).   Of course, not making sure that
> grub is installed completely is much worse.

I don't think this makes sense, as the files come from the store. If
anything, the files in the store should be fdatasync'd; the /etc files
will be recreated on boot anyway.
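For illustration only, here is a minimal shell sketch of the write-then-flush pattern being discussed: write to a temporary file, fdatasync it, rename it into place, then flush the directory so the rename is durable too. It assumes GNU coreutils 8.24 or later, whose sync accepts file operands and --data (fdatasync(2) instead of fsync(2)). write_durably is a hypothetical helper, not something Guix provides.

```shell
#!/bin/sh
# Hypothetical helper, not part of Guix: replace DEST with stdin, durably.
# Assumes GNU coreutils >= 8.24 (`sync` with file operands and --data).
write_durably() {
    dest=$1
    tmp="$dest.tmp.$$"
    # Write the new contents next to DEST, on the same file system.
    cat > "$tmp" || return 1
    # fdatasync(2) the data blocks before the file becomes visible as DEST.
    sync --data "$tmp" || return 1
    # rename(2) within one directory atomically replaces DEST.
    mv -f "$tmp" "$dest" || return 1
    # fsync(2) the directory so that the rename itself reaches the disk.
    sync "$(dirname "$dest")"
}
```

Usage: printf 'x\n' | write_durably /tmp/example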

>
> Right now, after finally recovering from this mess, about 10 GiB of my RAM
> are used for I/O buffers again.  Not syncing those to disk would be a
> very bad idea, especially for /boot/efi, which is FAT--which is not
> known for its resilience.
>
> So, where is the call to the "sync" syscall (or libc function) in the
> root shepherd when stopping the system shepherd?  I can't find it.

In the stop procedure of the root-file-system Shepherd service, located
in (gnu services base).

>
> P.S. Can we have a script that activates the Guix system from a recovery
> stick? I took photos of what I had to do and am transcribing them here:

Probably not: everyone's setup is different, so a script can't know how
to mount everything properly. Some of the steps could be scripted, but
not all of them.
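As an illustration of which parts are generic, here is a hedged sketch of steps 9 to 13 from the list below as a shell function, meant to run inside the chroot after step 7. The LUKS device, the ESP and the channels file (steps 3, 14 and 17) are left out because they are setup-specific. The function names and the DRY_RUN switch, which only prints the commands, are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of steps 9-13, to be run inside the chroot.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_pseudo_filesystems() {
    run mount -t proc proc /proc
    run mount -t sysfs none /sys
    run mount -t devtmpfs none /dev
    run mkdir -p /dev/pts
    # Without devpts, no terminals can be allocated (see step 12).
    run mount -t devpts none /dev/pts
    # efivarfs only exists on machines booted via UEFI.
    if [ -d /sys/firmware/efi ]; then
        run mount -t efivarfs efivarfs /sys/firmware/efi/efivars
    fi
}
```

Usage: after step 7, run mount_pseudo_filesystems; DRY_RUN=1 mount_pseudo_filesystems only prints what it would do.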

>
> I had gotten this error from grub:
> error: file `/boot/grub/x86_64-efi/normal.mod' not found.
> grub rescue>
>
> The grub rescue shell had no "cat" or "cp", so recovery from there was
> impossible.  "insmod normal" did not work there either.

"insmod normal" is what loads normal.mod, so it makes sense that it
didn't work at that point.
You would have had to boot manually, repeating what is in grub.cfg.
From what you said here, it seems it would have been possible to boot an
older generation without recovering from a live ISO, but that is much
harder than using a live ISO.

>
> 1. Boot a rescue Linux from USB stick
> 2. Make internet connection work
> 3. cryptsetup luksOpen /dev/nvme0n1p3 root
> 4. mkdir /mnt/a
> 5. mount /dev/mapper/root /mnt/a
> 6. unshare -m chroot /mnt/a /run/current-system/profile/bin/bash # note:
> that's /var/guix/profiles/system-193-link
> 7. export PATH=/run/current-system/profile/bin
> 8. Check out what Guix did to the filesystem and be horrified
> 9. mount -t proc proc /proc
> 10. mount -t sysfs none /sys
> 11. mount -t devtmpfs none /dev
> 12. mount -t devpts none /dev/pts # important, otherwise no terminals
> 13. mount -t efivarfs efivarfs /sys/firmware/efi/efivars
> 14. mount /dev/nvme0n1p1 /boot/efi
> 15. guix-daemon --build-users-group=guixbuild &
> 16. guix pull # will fail since I am root, but usually I would do that
> from my regular user via sudo, which I'm not allowed to from the rescue
> OS.  Hence it won't find my channels config.
> 17. guix pull --channels=/home/dannym/.config/guix/channels.scm
> 18. guix system reconfigure /etc/config.scm
> 19. sync # this time ;)

The sync here is not that important; I would rather unmount the file
systems. That's more effective, as it makes sure nothing can write
there anymore.

If you do unmount, don't forget to kill guix-daemon first.
> 20. exit
> 21. reboot
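The unmount-instead-of-sync advice above could look roughly like this: a sketch that undoes the mounts of steps 9 to 14 inside the chroot, in place of step 19. teardown_chroot and DRY_RUN are hypothetical names, and it assumes pkill and pgrep (procps) are available in the system profile. After exit, umount /mnt/a and cryptsetup luksClose root from the rescue system finish the job.

```shell
#!/bin/sh
# Hypothetical sketch: undo steps 9-15 inside the chroot, replacing step 19.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_chroot() {
    # Stop guix-daemon first; its open files would keep the mounts busy.
    # -f matches against the full command line of each process.
    run pkill -f guix-daemon
    if [ -z "${DRY_RUN:-}" ]; then
        # pkill only sends SIGTERM, so wait until the daemon has exited.
        while pgrep -f guix-daemon > /dev/null; do
            sleep 1
        done
    fi
    # Unmount in reverse mount order, so nested mounts go first
    # (/dev/pts before /dev); /proc goes last because pgrep needs it.
    for m in /boot/efi /sys/firmware/efi/efivars /dev/pts /dev /sys /proc; do
        run umount "$m"
    done
}
```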
>
> I did try
>
> 6b. unshare -m chroot /mnt/a /run/current-system/activate
Since this system was corrupted, I would advise against activating it;
you would want to activate an older one instead. The easiest way to find
the respective store paths is through /var/guix/profiles/system-X-link.
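A rough sketch of finding that older generation from the rescue system: it reads which generation the "system" link points at and picks the highest-numbered system-N-link below it, which also skips generations that were deleted. previous_generation is a hypothetical helper, and its default directory assumes the root is mounted at /mnt/a as in step 5. Inside the chroot, the /mnt/a prefix of the result has to be dropped.

```shell
#!/bin/sh
# Hypothetical helper: print the system generation link preceding the
# current one.  Defaults to the mount point from step 5 (/mnt/a).
previous_generation() {
    profiles=${1:-/mnt/a/var/guix/profiles}
    # "system" points at the current generation, e.g. system-193-link.
    current=$(readlink "$profiles/system") || return 1
    current=${current##*/}
    cur_n=${current#system-}
    cur_n=${cur_n%-link}
    case $cur_n in ''|*[!0-9]*) return 1 ;; esac
    prev=
    for link in "$profiles"/system-*-link; do
        [ -L "$link" ] || continue
        n=${link##*/system-}
        n=${n%-link}
        # Skip anything that is not a plain generation number.
        case $n in ''|*[!0-9]*) continue ;; esac
        # Keep the largest generation number below the current one.
        if [ "$n" -lt "$cur_n" ] && { [ -z "$prev" ] || [ "$n" -gt "$prev" ]; }; then
            prev=$n
        fi
    done
    [ -n "$prev" ] || return 1
    echo "$profiles/system-$prev-link"
}
```

For example, with system pointing at system-193-link and generation 192 still present, previous_generation prints /mnt/a/var/guix/profiles/system-192-link.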
>
> instead of steps 6...15--but I got an error about how it can't make the
> system "#f" the current system and it would fail.
>
> Note that the "unshare -m" is important, otherwise I get an error about
> "pivot_root" on guix pull (!).
>
> And, tomorrow, I'll restore from backup--until then, I have logs.
Are you using TLP? If you can share anything, it would be good to see
your /var/log/messages from around the time you rebooted.
>
> [1] I don't want this mail to seem overly negative, so I want to make
> clear that guix system otherwise works great on this laptop: even hidpi,
> color correctness, audio, video, bluetooth audio, bluetooth mouse, wifi,
> ethernet, standby and resume, backlight control, safe screen locking,
> external displays (HDMI port), game controllers (wiimote), USB (including
> displayport via USB), battery charge limiter and battery charge reporting,
> AMD ROCm, NVMe drive with about 2 GB/s.
> I reboot it once or twice a month (kernel updates, mostly).
> Otherwise I only standby it at the end of the day.
