Hi again,

"Rutherther" <ruthert...@ditigal.xyz> writes:

>
> If you reconfigure and then reboot right after, I think it's feasible
> some of the data might not be written to the disk yet, and this is the
> way most folks do it.

So, I was trying this out in a VM because of issue #77086, and
I actually got this empty derivation error while doing so.

And I am pretty confident the cause for me was what I described
here:
1. reconfigure successfully completes, nothing is corrupted
2. user reboots
3. the reboot doesn't properly commit data to the disk because the root
file system is not unmounted cleanly
4. the user boots, fsck runs, and the files are corrupted as a result of
step 3

How I made sure of this: I have a qcow image and I started from the same
image twice, both times first reconfiguring to the same
configuration.scm file. After the reconfigure, the steps differ:

First time: reboot. I got the error with empty derivations and could no
longer reconfigure, because what was corrupted is the kexec reboot
derivation.
Second time: guix gc --verify=contents, which reported no issues, then
reboot.

The second time I didn't get these errors after booting again,
and I am pretty confident that is because, with so many reads and the
time they took, the data had already been committed.

I am able to replicate the same issue consistently, with the same image
and roughly the same timing (doing it manually).
Specifically, I am getting a corrupted
kexec-load-system derivation. But I don't think that matters much;
it's just something that was written recently. What gets corrupted
depends on the fs, on the timing, on other operations on the disk, etc.
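
For anyone who wants to look for this kind of damage on their own
system, here is a rough Python sketch. It is not part of guix, and the
function names are made up for illustration. It lists .drv files
directly under a store directory that are zero-length or contain only
NUL bytes. That the corruption shows up this way is an assumption based
on the "empty derivation" error, so it may not catch every case;
guix gc --verify=contents remains the real check.

```python
import os


def is_blank(path, chunk_size=65536):
    """Return True if the file is zero-length or contains only NUL bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-NUL byte.
                return True
            if chunk.strip(b"\0"):
                # Something other than NUL bytes is present.
                return False


def find_blank_derivations(store_dir="/gnu/store"):
    """List .drv files directly under store_dir that look blanked out.

    The default path is the usual store location; pass another
    directory to inspect a mounted image instead (hypothetical usage).
    """
    hits = []
    for entry in sorted(os.listdir(store_dir)):
        path = os.path.join(store_dir, entry)
        if entry.endswith(".drv") and os.path.isfile(path) and is_blank(path):
            hits.append(path)
    return hits
```

Calling find_blank_derivations() after the reboot and printing the
result should show which recently written derivations were hit, if any.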

Conclusion: I see it as more likely that you guys were affected by bug
#77086 (root fs is not cleanly unmounted) than that guix would
mysteriously write zeros.

If you see it differently, please let me know what led you to that
conclusion.

Regards,
Rutherther


