I broke my TV. My TV is a monitor powered by a laptop running OpenBSD, and while trying to diagnose a problem (which turned out to be in the NAS) I managed to fry the disklabel.
How? Well, being unimportant, the machine is also the guinea pig for snapshot builds and other experiments, so I thought I might try to make the problem go away (thus confirming it was to do with running a snapshot) by reverting to 7.2-stable. I copied bsd.rd from a stable machine, booted into that and installed over the top of the snapshot. The install worked fine, but nothing much ran and / filled up with core dumps. Bummer.

Oh well, easy enough to fix: just move the system files out of the way and reinstall. It was getting late, but the OpenBSD installer's a breeze and it should all have been done, config files merged, in 15 minutes.

Well, it wasn't 15 minutes. I did the wrong thing and blasted the disklabel. The partitions weren't formatted, though. I knew this because I'd popped into the installer's shell and replaced /sbin/newfs with an empty file (which is what /bin/true used to be). The files were safe.

After stepping back and using another machine to get the music going again, I remembered that scan_ffs can find partitions when the disklabel is lost. Using that, I could either recover the label in full or at least find /var and the disklabel backup kept there.

Well, the numbers scan_ffs gave me were gibberish. The manual warns that it only looks for ffs1 partitions, not ffs2, but I ran it anyway and tried poking variations on the numbers it gave me into disklabel. That didn't work.

In the end I opened up scan_ffs.c to see how it does its scan. It reads the disk in 512K chunks and, at each 512-byte disk block within a chunk, overlays a 'struct fs' as defined in /usr/include/ufs/ffs/fs.h. Unfortunately, as the manual states, it only considers ffs1 partitions, marked by FS_MAGIC aka FS_UFS1_MAGIC. There's also an FS_UFS2_MAGIC, but printing the location at which it was found didn't give me the result I expected. By this time, since I was figuring out scan_ffs rather than looking for my missing /var, I was running it over a small disk image where I knew there was a partition at block 64, but the modified scan_ffs said the first partition was at block 192.
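For reference, here is a standalone sketch of that kind of scan, aimed at ffs2. It is not scan_ffs itself: the function name ufs2_scan is hypothetical, and the constants are restated rather than included from fs.h, on the assumption that they match it: FS_UFS2_MAGIC (0x19540119), the offset of fs_magic within struct fs (1372 bytes) and SBLOCK_UFS2 (65536 bytes, i.e. 128 blocks of 512 bytes, which would account for the gap between block 64 and block 192). Like scan_ffs, it compares the magic in host byte order. Each hit is reported as a candidate partition start, the superblock's block minus 128. The superblock copies kept in every cylinder group carry the same magic, so expect plenty of false positives.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Assumed to mirror <ufs/ffs/fs.h>; restated so the sketch builds
 * without the OpenBSD headers.
 */
#define SECTOR        512           /* bytes per disk block, as disklabel counts them */
#define SBLOCK_UFS2   65536         /* byte offset of the UFS2 superblock in a partition */
#define SB_MAGIC_OFF  1372          /* offset of fs_magic within struct fs */
#define UFS2_MAGIC    0x19540119u   /* FS_UFS2_MAGIC */
#define CHUNK         (512 * 1024)  /* bytes per read, the same 512K scan_ffs uses */

/*
 * Hypothetical helper: scan fd for the UFS2 magic and store the implied
 * partition start (superblock block - 128) of each hit in out[], keeping
 * at most max. Returns the total number of hits, or -1 on error.
 */
long
ufs2_scan(int fd, long long *out, size_t max)
{
	unsigned char *buf;
	long long base = 0;	/* disk block number of buf[0] */
	long hits = 0;

	assert(out != NULL || max == 0);
	if ((buf = malloc(CHUNK)) == NULL)
		return -1;

	for (;;) {
		ssize_t got = pread(fd, buf, CHUNK, (off_t)base * SECTOR);
		long long n = 0;

		if (got < 0) {
			free(buf);
			return -1;
		}
		/* Test every block whose fs_magic field fits inside this read. */
		for (; n * SECTOR + SB_MAGIC_OFF + 4 <= got; n++) {
			uint32_t magic;

			memcpy(&magic, buf + n * SECTOR + SB_MAGIC_OFF, sizeof(magic));
			if (magic != UFS2_MAGIC)
				continue;
			long long part = base + n - SBLOCK_UFS2 / SECTOR;
			if (part < 0)
				continue;
			if ((size_t)hits < max)
				out[hits] = part;
			hits++;
		}
		if (got < CHUNK)	/* short read: end of disk or image */
			break;
		base += n;		/* untested tail blocks are re-read next pass */
	}
	free(buf);
	return hits;
}
```

Pointed at the raw whole-disk device (/dev/rsd0c on OpenBSD) and printed one per line, the stored values are the sort of list a trial-mount loop can read. Advancing by only the blocks actually tested means a superblock straddling a chunk boundary is still caught on the next read.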
I thought that was strange, but maybe there's an ffs1-like non-superblock which points to the real ffs2 superblock later on. Hoping this was the case, I ran a scan over the whole disk printing each block that had an FS_UFS2_MAGIC signature, offset by the amount to feed to disklabel. Armed with a list of matching blocks (there were around 300 when I stopped scanning, once I was confident /var had either been found or missed), I wrote a script to delete and recreate a partition beginning at each potential block (the length doesn't matter) and try to mount it (read-only!). Since there were only a few blocks to check, I scanned the output by eye, found /var and copied the disklabel backup out of it. With that it was a simple matter to restore the correct disklabel, check the partitions and recover the system in full.

To obtain the list of blocks I added this clause after the main test in scan_ffs.c:

	else if (sb->fs_magic == FS_UFS2_MAGIC) {
		/* (blk*512+n)/512 is the block holding the magic; step
		 * back 128 blocks to where the partition should start */
		printf("ufs2 @ %lld\n", (long long)((blk*512+n)/512 - 128));
	}

This script fiddled with the disklabel to find a partition which worked (this changes the real disklabel):

	# Reads one candidate offset per line on stdin.
	# WARNING: rewrites partition d in the real sd0 disklabel.
	while read -r maybe; do
		# disklabel -E: delete d, add d at $maybe, default size and type, write
		echo 'd d\na d\n'$maybe'\n\n\nw\n' | disklabel -E sd0 >/dev/null 2>&1
		echo -n "$maybe: "
		mount -r /dev/sd0d /mnt && ls /mnt /mnt/moved
		umount /mnt 2>/dev/null
	done

There are certainly better ways. Restoring the disklabel is described in the manual:

	disklabel -R sd0 /tmp/disklabel.sd0.current

The fully integrated build system made testing changes to scan_ffs a breeze, even though my dev box is on the snapshot and the recovery system was stable. Putting the snapshot back on the telly took much less than 15 minutes.

It's possible scan_ffs could simply be extended to print potential ffs2 partitions (there are a few more checks it could make to whittle down the results) if the 128-block offset is constant across platforms.

Cheers,
Matthew