Hi Henrich,

> To be clear, fscking does not correct the issue with /bsd? It's still
> just several KB? Is this random, or do you have the ability to readily
> reproduce it?

Correct. I looked into the issue more today and believe I've achieved 100% reproducibility. However, the problem is entirely unrelated to improper shutdown and has everything to do with /usr filling up.

To reproduce, you want to *almost* fill up /usr before the kernel relink job runs (otherwise it may abort straight away). E.g. find out how much space is left on disk, subtract a few MB, then dd a dummy file to /usr. Reboot so the reorder will run. Shortly afterward, while the relink is in progress, the following is echoed to the console:

    uvn_flush: obj=<address>, offset=<address>. error during pageout.
    uvn_flush: WARNING: changes to page may be lost!

The reorder will complete without error; however, a truncated/corrupted kernel will almost certainly be written to /bsd. Reboot and that system will be unbootable. The symptom may be slightly different from what I originally posted and sometimes manifests as a reboot loop (it gets part way through loading the kernel, aborts, then the bootloader restarts).

This is repeatable to the point where, if you boot into bsd.rd, move a working kernel into /bsd, and recalculate the sha256 so reorder will run, all without cleaning up space in /usr, then the next time you reboot and reorder runs the cycle will repeat.

I found some back and forth between Theo and Alex Bluhm a few years ago about the underlying cause of that message, but didn't see anything about the reorder failing and am not sure if anything got committed to fix it:

https://marc.info/?l=openbsd-tech&m=164987816425987

As for the corrupted kernels, a good kernel had a file size of 31904467 bytes. Corrupted/truncated kernels written to /bsd had sizes of 29931779, 31868587, 29935875 (I saw this one twice), and 11696 bytes. I think the sizes depend on how much free space was in /usr at the time it ran.

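For anyone wanting to try the fill-up step from the reproduction recipe above, here is a rough sketch. The 4 MB margin, the dummy file name and the helper function names are placeholders chosen for illustration, not values taken from the steps above, so adjust them to taste and only run this on a test machine.

```sh
#!/bin/sh
# Sketch: nearly fill a filesystem, leaving only a small margin free.
# The margin and the dummy file name (dummy.fill) are assumptions.

# avail_kb MOUNTPOINT: print the free space in KB as reported by df.
# -P keeps each filesystem on a single line so column 4 is "Available".
avail_kb() {
    df -kP "$1" | awk 'NR == 2 { print $4 }'
}

# fill_kb AVAIL_KB MARGIN_KB: print the size in KB of the dummy file
# to write, or 0 if the free space is already within the margin.
fill_kb() {
    if [ "$1" -le "$2" ]; then
        echo 0
    else
        echo $(( $1 - $2 ))
    fi
}

# fill_fs MOUNTPOINT MARGIN_KB: write the dummy file with dd.
fill_fs() {
    size=$(fill_kb "$(avail_kb "$1")" "$2")
    [ "$size" -gt 0 ] || return 0
    dd if=/dev/zero of="$1/dummy.fill" bs=1k count="$size"
}

# Example (as root, test machine only), leaving roughly 4 MB free:
#   fill_fs /usr 4096
```

The figure df reports may not reflect any space the filesystem holds in reserve, so the margin may need some trial and error before the relink gets far enough to hit the out-of-space condition.
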
More interestingly, the system is writing the hash of the bad kernel out into /var/db/kernel.SHA256, so I suspect there is a bug somewhere in the reorder process where no error is returned when the disk is full, so it assumes the process completed successfully, and Bob's your uncle.

Here is a copy of relink.log from a corrupted kernel:

    (SHA256) /bsd: OK
    LD="ld" sh makegap.sh 0xcccccccc gapdummy.o
    ld -T ld.script -X --warn-common -nopie -o newbsd ${SYSTEM_HEAD} vers.o ${OBJS}
    text        data      bss       dec        hex
    26728562    488512    1351680   28568754   1b3ecb2
    mv newbsd newbsd.gdb
    ctfstrip -S -o newbsd newbsd.gdb
    rm -f bsd.gdb
    mv -f newbsd bsd
    install -F -m 700 bsd /bsd && sha256 -h /var/db/kernel.SHA256 /bsd

    Kernel has been relinked and is active on next reboot.

    SHA256 (/bsd) = 75600b28045794fa983d0823435c3a07c276b13dbbf11dc01dca71a2d4fe8d6d

The size of this corrupt kernel was 29935875 bytes.

Regards
Lloyd