Re: Data corruption on reboot

2025-05-08 Thread Maxim Cournoyer
Hi Danny, Danny Milosavljevic writes: > Hi, > > For me, the problem is fixed. I've tried regular shutdown, kill it > by holding power button, kill it with magic sysrq key S U B--and all > recovered to a clean filesystem (no ignored errors either) eventually. Good to know! [...] > That said,

Re: Data corruption on reboot

2025-05-08 Thread Danny Milosavljevic
Hi, For me, the problem is fixed. I've tried regular shutdown, kill it by holding power button, kill it with magic sysrq key S U B--and all recovered to a clean filesystem (no ignored errors either) eventually. I can't speak for the other people who had filesystem corruption recently. I think w

Re: Data corruption on reboot

2025-05-08 Thread Ludovic Courtès
Hi Danny and all, Maxim Cournoyer writes: > Danny Milosavljevic writes: > > [...] > >> Anyway, after the updated e2fsck the machine has no problems anymore >> (so far...). >> >> Docs: >> >> >> Kernel bugfix: >>

Re: Data corruption on reboot

2025-05-06 Thread Maxim Cournoyer
Hi Danny, Danny Milosavljevic writes: [...] > Anyway, after the updated e2fsck the machine has no problems anymore > (so far...). > > Docs: > > > Kernel bugfix: >

Re: Data corruption on reboot

2025-05-05 Thread Danny Milosavljevic
Hi Ludo, Sorry for the delay. I'm currently very busy. Small summary follows: >That’s on the bare metal, right? It does look like the file system was >indeed in a bad state and that we’re just confirming it? Yes, but it would have stayed in the bad state forever because our e2fsck was too old

Re: Data corruption on reboot

2025-05-02 Thread Ludovic Courtès
Hi Danny, Danny Milosavljevic writes: > Found something new: [...] > Output: > hello world after sync! > hzllo world ports gone! > # > hello world end! > [177.726300] EXT4-fs error (device dm-0): > ext4_mark_recovery_complete:6276: comm shepherd: Orphan file not empty on > read-only fs. That

Re: Data corruption on reboot

2025-05-01 Thread Ryan Schanzenbacher
i often see a few lines (0-10) of orphaned inodes and stuff like that. nothing dramatic, and certainly nothing that demands user feedback. but i think the `recovering journal` part is an unambiguous indication that the fs was not cleanly unmounted. To throw my hat into the ring with this, I c

Re: Data corruption on reboot

2025-04-30 Thread Danny Milosavljevic
For the record, I have (repair #t) on my root fs in config.scm, and even that doesn't help. I think the solution is: We need to update e2fsprogs to 1.47.2. It has multiple fixes for orphans, including for filesystem corruption (!). I've thus updated e2fsprogs to 1.47.2 on guix master. Bef

Re: Data corruption on reboot

2025-04-26 Thread Danny Milosavljevic
Hi Ludo, With the following patch diff --git c/gnu/services/base.scm w/gnu/services/base.scm index 8c6563c99d..c1c348116b 100644 --- c/gnu/services/base.scm +++ w/gnu/services/base.scm @@ -351,11 +351,35 @@ (define %root-file-system-shepherd-service ;; Return #f if successfully stop

Re: Data corruption on reboot

2025-04-26 Thread Danny Milosavljevic
Also, aha? So the store IS still available? In that case we can just use fuser -ikvm / for debugging, and fuser -km / for production. Something like, (system* #$(file-append (@ (gnu packages linux) psmisc) "/bin/fuser")

Re: Data corruption on reboot

2025-04-26 Thread Danny Milosavljevic
Hi Ludo, I tried a variant of your change and nothing else, with a new guix checkout, like the following: $ ./pre-inst-env guix system vm --no-graphic gnu/system/examples/bare-bones.tmpl I get (after running the output's run-vm.sh shell script and then logging in as "root" and then issuing "hal

Re: Data corruption on reboot

2025-04-26 Thread Ludovic Courtès
Hi, Danny Milosavljevic writes: > Since I think I know what causes it that should be easy enough. I added a simple ‘lsof’ invocation and displayed the list of remaining processes: diff --git a/gnu/services/base.scm b/gnu/services/base.scm index 8c6563c99d..8b383e3d81 100644 --- a/gnu/services/

Re: Data corruption on reboot

2025-04-26 Thread Rutherther
Hi Danny, Danny Milosavljevic writes: > Hi Ludo, > > Thank you! > >>/var/run/shepherd/repl is a Unix-domain socket you can connect to (with >>socat or whatever) to get a REPL. > > Thanks! I managed to connect via emacs geiser-connect-local. > >>I think the very first step, before we can put t

Re: Data corruption on reboot

2025-04-25 Thread Danny Milosavljevic
Hi Ludo, Thank you! >/var/run/shepherd/repl is a Unix-domain socket you can connect to (with >socat or whatever) to get a REPL. Thanks! I managed to connect via emacs geiser-connect-local. >I think the very first step, before we can put these debugging helpers >to good use, is to reproduce the

Re: Data corruption on reboot

2025-04-25 Thread Ludovic Courtès
Hi Danny, Danny Milosavljevic writes: > I've now implemented the relevant parts of fuser in guile, see > . I've tested it as a standalone > module and it seems to work fine. However, I need help to integrate > something like that into guix (and/or shepherd, h
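Editor's sketch (not the Guile module mentioned above, whose link did not survive in this archive): the core of an fuser-like check is walking /proc/PID/fd and collecting the processes whose open files live under a given mount point. The function name `processes_using` and the `proc_root` parameter are hypothetical. Matching is by path prefix only, which is a simplification: real fuser -m also looks at cwd, root, executables and memory mappings.

```python
import os

def processes_using(mount_point, proc_root="/proc"):
    """Return {pid: [paths]} for processes with files open under mount_point.

    Simplified illustration: only /proc/PID/fd symlinks are inspected and
    matching is done by path prefix (an assumption, not what fuser does).
    """
    base = os.path.normpath(mount_point)
    prefix = os.path.join(base, "")  # same path with a trailing separator
    users = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited meanwhile, or permission denied
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed while we were looking
            if target == base or target.startswith(prefix):
                users.setdefault(int(entry), []).append(target)
    return users
```

Called as `processes_using("/")` shortly before the root file system is remounted read-only, this would list whatever still holds files open, provided /proc is still mounted at that point.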

Re: Data corruption on reboot

2025-04-25 Thread Danny Milosavljevic
Hi Ludo, About my patch here: The store is not present at the time where fuser would be invoked, so my patch here won't work. This is a problem for most of the approaches, including the load-initramfs-again approach (initrd image is in the store). Frankly, after now understanding the problem, I'

Re: Data corruption on reboot

2025-04-24 Thread Ludovic Courtès
Danny Milosavljevic writes: > The more I think about it the more it seems we should re-enter our > initramfs on shutdown. Unfortunately, the Linux kernel frees it when > switching to the real root fs. Not sure how to get Linux to find it again > (or how to get Linux to not free it in the first

Re: Data corruption on reboot

2025-04-24 Thread Ludovic Courtès
Hey, Danny Milosavljevic writes: > If you want to debug that live, you can log to the console: switch to the > console (Ctrl-Alt-F1), set the kernel console loglevel to maximum (alt > print 9), and then watch what it says there. > > I currently use the following shell script to halt without powe

Re: Data corruption on reboot

2025-04-24 Thread Ludovic Courtès
Hello, "Ryan Schanzenbacher" writes: > To throw my hat into the ring with this, I can confirm I've been seeing the > same for a while now (past 3-ish months?) I have a laptop that doesn't exhibit > this behavior that is a bit out of date, and one that does. Something I can > look into doing whe

Re: Data corruption on reboot

2025-04-24 Thread Attila Lendvai
> That is completely expected. The fsck check is run on unmounted disks... > mounting first and repairing later doesn't make sense and is unsupported > by fsck... So there is no /var/log/messages to log to. i was just expecting that there's some in-memory ring buffer that is flushed when root be

Re: Data corruption on reboot

2025-04-23 Thread Rutherther
Hi Attila, Attila Lendvai writes: > if at boot i see `root: recovering journal` printed, does that mean that i'm > also affected? Sort of. If you always see just recovering journal, there is probably nothing to worry about, because the sync seems to be taking place. > > it was after a plain
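Editor's sketch: one way to tell the two cases apart is to scan captured boot output for the two kinds of message quoted in this thread, a plain `recovering journal` line versus an `EXT4-fs error` line. The function name `classify_boot_log` and the returned dictionary layout are hypothetical, and the substrings are taken only from the messages quoted here, so other wordings will not be caught.

```python
def classify_boot_log(lines):
    """Summarise boot messages that hint at an unclean unmount.

    Assumption: matching on the two substrings quoted in this thread is
    enough for a first look; this is not an exhaustive parser.
    """
    summary = {"journal_recovered": False, "ext4_errors": []}
    for line in lines:
        if "recovering journal" in line:
            # The journal was replayed, so the fs was not cleanly unmounted.
            summary["journal_recovered"] = True
        if "EXT4-fs error" in line:
            # Kernel-reported errors go beyond a plain journal replay.
            summary["ext4_errors"].append(line.strip())
    return summary
```

A result with `journal_recovered` set and an empty `ext4_errors` list corresponds to the "probably nothing to worry about" case described above; any entry in `ext4_errors` deserves a closer look.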

Re: Data corruption on reboot

2025-04-23 Thread Danny Milosavljevic
The more I think about it the more it seems we should re-enter our initramfs on shutdown. Unfortunately, the Linux kernel frees it when switching to the real root fs. Not sure how to get Linux to find it again (or how to get Linux to not free it in the first place) and to pivot_root to it again (

Re: Data corruption on reboot

2025-04-23 Thread Danny Milosavljevic
Hi Ludo, >It’s the ‘net-base’ package—see ‘operating-system-etc-service’. Thanks! >> Thanks. I found it--and it syncs *before* unmounting. Why? >In case umount(2) fails. That makes sense. I've read through the relevant parts of the Linux kernel and it seems like this: the buffer cache (which

Re: Data corruption on reboot

2025-04-22 Thread Danny Milosavljevic
Hi, Attila Lendvai writes: > if at boot i see `root: recovering journal` printed, does that mean > that i'm also affected? Probably yes. > it was after a plain and simple reboot, i.e. no reconfigure prior to it, and > it's not causing a lot of issues. sometimes some lost inodes messages, but

Re: Data corruption on reboot

2025-04-22 Thread Attila Lendvai
if at boot i see `root: recovering journal` printed, does that mean that i'm also affected? it was after a plain and simple reboot, i.e. no reconfigure prior to it, and it's not causing a lot of issues. sometimes some lost inodes messages, but nothing that breaks anything. i tried to paste the

Re: Data corruption on reboot

2025-04-21 Thread Ludovic Courtès
Hello, Danny Milosavljevic writes: > Where in /gnu/store is the source for /etc/protocols ? I'd like to check > them for damage. It’s the ‘net-base’ package—see ‘operating-system-etc-service’. >>In the stop procedure of root-file-system shepherd service, located in >>(gnu services base) > > Th

Re: Data corruption on reboot

2025-04-21 Thread Rutherther
> > Note: sudo guix gc --verify did not find things. Upon this new information I don't think your store got corrupted, what got corrupted was files in /etc only, and it would've gotten fixed if you booted, but you didn't since the normal.mod was broken. > > Is it? I didn't have cp or cat, so no

Re: Data corruption on reboot

2025-04-21 Thread Rutherther
Hi, Danny Milosavljevic writes: > > Where in /gnu/store is the source for /etc/protocols ? I'd like to check > them for damage. Just look in the associated system generation (as was mentioned that is under /var/guix/profiles), under etc/protocols. To check damage in the store, run `guix gc -

Re: Data corruption on reboot

2025-04-21 Thread Danny Milosavljevic
Hi Rutherther, Danny Milosavljevic writes: >As for grub, yeah, there isn't much you can do other than booting >manually or to a live iso. While we are trying to find the cause, we should just make guix do a sync after each guix system reconfigure. Damaging user's system beyond repair-without-e

Re: Data corruption on reboot

2025-04-21 Thread Rutherther
Hi Danny, Danny Milosavljevic writes: > Hi, > > Guix system has been regularly not properly unmounting my filesystems > on shutdown or reboot. > > Usually, this manifested as a fsck because of sometimes dirty, > sometimes harmlessly damaged, root filesystem, on the next boot. > > But this tim