Hello, Gentoo.

After having got the syslinux boot manager working well, I lost the root
partition on my newer machine.  I spent the entire evening yesterday
trying to get it back again, with various expedients for recovering ext4
partitions from backup superblocks, and so on.

It wasn't until the middle of the night that it dawned on me what had
happened, and I immediately got up and had it fixed within twenty
minutes.

The cause was my booting the machine with a rescue disk.  This not only
assembled my RAID arrays /dev/md126 and /dev/md127 the wrong way round,
but also wrote those wrong identifiers, 126 and 127, into the "preferred
minor" field of the arrays' superblocks.  In essence, they got swapped.

Hence, when I tried to boot into my normal system, /dev/md126, which
should have been the root partition, was an unformatted empty space on
the SSD.

I don't blame the rescue disk for this occurrence.  For some reason,
when the kernel assembles /dev/md devices, it only seems to pay
attention to the "preferred minor" fields when they are wrong.  :-(

mdadm appears to write the "preferred minor" fields at random when
assembling the RAID arrays.  I don't think it should, unless explicitly
asked.  There is an argument to mdadm which specifies the writing of
these fields.  In fact I used this to effect a repair, ironically
enough, from the rescue disk booted with the option to suppress the
automatic assembly of the arrays.
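
For anyone wanting to check their own arrays, here is a rough sketch of
such a check and repair.  It assumes the argument in question is mdadm's
--update=super-minor (given together with --assemble), and that the
members carry version 0.90 metadata, for which "mdadm --examine" prints a
"Preferred Minor" line.  The function names are invented for
illustration, and nothing runs until one of them is called.

```shell
#!/bin/sh
# Sketch only: these are helper definitions, not a tested repair tool.

# Print the "Preferred Minor" value found in `mdadm --examine` output
# read from stdin (first match only).
preferred_minor() {
    sed -n 's/^[[:space:]]*Preferred Minor[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' |
        head -n 1
}

# Succeed if member device $1 records minor number $2 in its superblock.
check_minor() {
    [ "$(mdadm --examine "$1" | preferred_minor)" = "$2" ]
}

# Assemble array $1 from the member devices that follow, rewriting the
# preferred minor in each superblock to match the array being assembled.
fix_minor() {
    md=$1
    shift
    mdadm --assemble "$md" --update=super-minor "$@"
}
```

With the arrays not yet assembled, a call might look like
"check_minor /dev/sda3 126 || fix_minor /dev/md126 /dev/sda3 /dev/sdb3",
where /dev/sda3 and /dev/sdb3 are hypothetical member partitions to be
replaced by your own.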

Just for the record, all my RAID arrays have metadata version 0.90, the
(old fashioned) one that allows auto-assembly by the kernel without the
need of an initramfs.

The moral of the story: if your system uses software RAID, be careful
indeed before you boot up with a rescue disk.

-- 
Alan Mackenzie (Nuremberg, Germany).
