Hi,

On 11/09/2024, Andy Smith <a...@strugglers.net> wrote:
> Since booting from sdb wasn't working in any case, I thought I'd
> experiment a bit. I copied the first 446 bytes of sda to sdb. This
> made matters worse! Instead of a "grub> " prompt, I just got a blank
> screen.
>
> I then rebooted from sda and did:

I believe “sda” and “sdb” are swapped with respect to your first
message. Of course, these names are not expected to be stable across
reboots, but it makes things a bit confusing for me here.

(...)

> This does leave me wondering however, if the boot code in the mBR of
> sdb is now set to believe that this is "the second drive", I suppose
> (hd1) in grub terms? With the implication that should sda fail or be
> removed, this machine may still not boot because its boot code looks
> for something on a drive that no longer exists (sdb now being (hd0))?

I believe this is not necessarily the case. I've tried to read some of
the GRUB 2 stage 1 code from the grub2 2.12-5 package. I'm far from
being able to claim I understand everything, but... let's see.

My impression is that the “drive number” that is written to the MBR can
be of two kinds:

  (a) an actual number, typically 0x80, 0x81, etc. for hard disks (it is
      the BIOS drive number for INT 13h, cf. [1]);

  (b) or the special value 0xFF (thus, the 128th hard disk is not
      available for case (a); too bad if you have that many disks!).

The special value 0xFF is the one you had on both of your drives and
means “use the boot drive” (the one the BIOS booted from, whose number
is in register DL when the BIOS transfers control to the MBR code
loaded at physical address 0x7C00):

(From grub2-2.12/grub-core/boot/i386/pc/boot.S)

          .org GRUB_BOOT_MACHINE_BOOT_DRIVE
  boot_drive:
          .byte 0xff      /* the disk to load kernel from */
                          /* 0xff means use the boot drive */

  (...)

          .org GRUB_BOOT_MACHINE_DRIVE_CHECK
  (...)  ← fixup of DL in case it was incorrectly set by the BIOS

          /*
           * Check if we have a forced disk reference here
           */
          movb   boot_drive, %al
          cmpb   $0xff, %al
          je     1f
          movb   %al, %dl
  1:
          /* save drive reference first thing! */
          pushw  %dx

[ One may find it “interesting” that the “jmp 3f” from line 216 of
  boot.S may be overwritten by internal “grub-setup” code (cf.
  “grub-bios-setup” in grub-install.c) from grub2-2.12/util/setup.c:

    boot_drive_check = (grub_uint8_t *) (boot_img
                        + GRUB_BOOT_MACHINE_DRIVE_CHECK);
    (...)

    /* If DEST_DRIVE is a hard disk, enable the workaround, which is
       for buggy BIOSes which don't pass boot drive correctly. Instead,
       they pass 0x00 or 0x01 even when booted from 0x80.  */
    if (!allow_floppy
        && !grub_util_biosdisk_is_floppy (dest_dev->disk))
      {
        /* Replace the jmp (2 bytes) with double nop's.  */
        boot_drive_check[0] = 0x90;
        boot_drive_check[1] = 0x90;
      }
]

In your case, I claim that the MBR-stored drive config (from your first
message, for both drives) was “use the boot drive” for both sda and sdb,
because from grub2-2.12/include/grub/i386/pc/boot.h:

  /* The offset of BOOT_DRIVE.  */
  #define GRUB_BOOT_MACHINE_BOOT_DRIVE    0x64

and both of your MBRs had 0xff at this offset:

  00000060: 0000 0000 fffa 9090 f6c2 8074 05f6 c270  ...........t...p

You can also see right here the two NOPs (9090) at offset
GRUB_BOOT_MACHINE_DRIVE_CHECK (i.e. 0x66), which override the
aforementioned “jmp 3f” from boot.S line 216, because this stage1 code
was written to hard disks.

Conclusion: in your case, the option was “load the next stage from the
drive the BIOS booted from” for both MBRs. Therefore, AFAIUI, assuming
everything else was good (incl. the offset for finding the next stage),
it should still have been able to boot with only one of the drives
present in the machine.

> The grub.cfg itself (and later, the fstab) finds its drives by UUID
> so I'm not worried about that part.
>
> I just have dim memories about having to do grub-install to sdb but
> trick it somehow that this was (hd0)…

Yep...
AFAIUI, hd0 is the name GRUB uses when it talks to the BIOS (at boot)
and corresponds to 0x80 (on x86 machines). When running grub-install or
the internal grub-bios-setup, however, GRUB attempts to guess how the
BIOS is going to number the devices you gave it in Linux-speak
(/dev/sda, /dev/sdb, etc.), which may be unreliable. At least the GRUB
1.x documentation clearly said so, according to my recollection, and
therefore recommended (in the 2000s), as the bullet-proof recipe,
performing the GRUB installation to hard disk *from GRUB itself* using
the (hd0), (hd1), etc. notations, e.g. after booting from a GRUB floppy
disk.

> I do also wonder why my simple dd of the first 446 bytes did not
> work, as the /boot partition is at the same position on both drives
> and is an MDADM RAID1 so should have its stage2 at the same LBA.
> After doing the "dpkg-reconfigure grub-pc" the first 446 bytes of
> both sad and sdb are (still) identical so something else somewhere
> else must have been changed.

GRUB is a complex beast; available documentation may be a bit confusing
when it comes to stage 1.5 and stage 2, e.g. [3]:

  Version 0 (GRUB Legacy)
  ~~~~~~~~~~~~~~~~~~~~~~~

  Stage 1 can load stage 2 directly, but it is normally set up to load
  the stage 1.5, located in the first 30 KiB of hard disk immediately
  following the MBR and before the first partition. (...) The stage 1.5
  image contains file system drivers, enabling it to directly load
  stage 2 from any known location in the filesystem, for example from
  /boot/grub.

  Version 2 (GRUB 2)
  ~~~~~~~~~~~~~~~~~~

  [Different description]

In any case, stage 1 can load some “stage 1.5” from “empty sectors (if
available) between the MBR and the first partition”. These sectors
wouldn't be synchronized by MD RAID, unless you're using it on the whole
drives, as opposed to partition by partition.
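
If you want to check both points yourself, here is a minimal Python
sketch (my own, not anything GRUB ships). It decodes the two stage1
bytes discussed above (offsets 0x64 and 0x66) and lists which sectors
between the MBR and the first partition differ between two drives. The
function names, the 512-byte sector size and the first-partition LBA
used in the usage note are assumptions on my part; adjust them to your
actual partition table.

```python
# Offsets as quoted above from boot.h and the hexdump.
BOOT_DRIVE_OFFSET = 0x64   # GRUB_BOOT_MACHINE_BOOT_DRIVE
DRIVE_CHECK_OFFSET = 0x66  # GRUB_BOOT_MACHINE_DRIVE_CHECK
SECTOR_SIZE = 512          # assumption: 512-byte logical sectors


def describe_mbr(head: bytes) -> dict:
    """Decode the boot_drive byte and the drive-check bytes of sector 0."""
    if len(head) < SECTOR_SIZE:
        raise ValueError("need at least one full 512-byte sector")
    boot_drive = head[BOOT_DRIVE_OFFSET]
    check = head[DRIVE_CHECK_OFFSET:DRIVE_CHECK_OFFSET + 2]
    return {
        "boot_drive": boot_drive,
        # 0xFF means "use the drive the BIOS booted from".
        "uses_bios_boot_drive": boot_drive == 0xFF,
        # Two NOPs mean the "jmp 3f" was replaced (hard-disk install).
        "drive_check_nopped": check == b"\x90\x90",
    }


def differing_gap_sectors(head_a: bytes, head_b: bytes,
                          first_partition_lba: int) -> list:
    """Return the LBAs in [1, first_partition_lba) that differ."""
    diffs = []
    for lba in range(1, first_partition_lba):
        start, end = lba * SECTOR_SIZE, (lba + 1) * SECTOR_SIZE
        if head_a[start:end] != head_b[start:end]:
            diffs.append(lba)
    return diffs


def read_head(path: str, sectors: int) -> bytes:
    """Read the first `sectors` sectors of a device or image file."""
    with open(path, "rb") as f:
        return f.read(sectors * SECTOR_SIZE)
```

For example, as root and assuming the first partition starts at LBA
2048 on both drives: read a = read_head("/dev/sda", 2048) and
b = read_head("/dev/sdb", 2048), then print describe_mbr(a),
describe_mbr(b) and differing_gap_sectors(a, b, 2048). A non-empty list
would mean the area after the MBR is not the same on both drives.
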
I don't claim that “this is it”, but a difference in those post-MBR
sectors might explain some difference between your drives' booting
behavior, even with identical:

  - stage1 code+data in the MBR;
  - boot partitions' start offset and contents.

> Not understanding quite what is going on is worrying to me, even if
> things do now work. 🙁

I just hope I didn't confuse you more. :-)

Regards

[1] https://en.wikipedia.org/wiki/INT_13H#List_of_INT_13h_services
[2] https://wiki.osdev.org/MBR_(x86)#MBR_Bootstrap
[3] https://en.wikipedia.org/wiki/GNU_GRUB

-- 
Florent