Hello GRUB community,

I'm Guilherme Piccoli from Canonical, and this is my first time posting to the GRUB mailing list/bug tracker. This is both a bug report and an email to the grub-devel ML; the goal of the duplication is to loop in more people and to rely on better "archiving" of the discussion, since searching grub-devel (through its archives) seems much more straightforward than searching GRUB bugs (at least, that was my experience).

I've been debugging an issue related to the following GRUB message: "error: attempt read-write outside of disk `hd0`". After studying the code and doing some instrumentation, I've noticed that it comes from a bad calculation/reading of the disk's total sectors.

The whole investigation started with a very odd problem reported by a Canonical customer, in which GRUB wasn't able to read some kernel images/initrd in /boot. In the end, we were able to reproduce the issue and narrow it down to this bad total_sectors reading. Going further in the debugging, I found that for x86 legacy-mode booting, GRUB collects this information using service 48h/int 13h [0], in grub_biosdisk_get_diskinfo_real(). I then instrumented that function to dump the contents of struct grub_biosdisk_drp on an HP ProLiant DL360 machine, collecting the following output:

    size=4a, flags=0 cyl=ffff, heads=ff, sec=20 bytesp_s=200, total_secs=d1bda5b0

The value 0xd1bda5b0 fits in 32 bits, although the total_sectors struct member is 64-bit. So at first sight it seems that the BIOS call returned valid lower 32 bits, but invalid (zeroed) upper 32 bits.

To check that theory, I booted the Linux kernel through its 16-bit real-mode entry point, so that I could use its EDD module to perform the same BIOS service 48h query. This was the result collected by Linux (a dmesg print I've added):

    [4.027450] edd[0]->total_secs=1d1bda5b0

According to SCSI Read Capacity 16, the value read by the kernel EDD code is correct [1]. So this reinforces the theory that GRUB is somehow "ignoring"/zeroing the upper 32 bits of the total_sectors variable.

When checking the EDD code [2], we can see that the address of the buffer that the BIOS fills with the service 48h data is passed in the SI register.

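As a quick sanity check of the two total_secs values quoted above, here is a minimal C sketch that compares their lower and upper 32-bit halves and converts both to a size in bytes. The constants are copied from the dumps in this report; the helper names (low32, high32, disk_bytes) are hypothetical and are not part of GRUB or of the Linux EDD code.

```c
#include <stdint.h>

/* Values copied from the report: GRUB's drp dump, and the Linux EDD /
 * sg_readcap result. bytesp_s=200 (hex) is 512 bytes per sector. */
#define GRUB_TOTAL_SECS   0xd1bda5b0ULL
#define LINUX_TOTAL_SECS  0x1d1bda5b0ULL
#define BYTES_PER_SECTOR  0x200ULL

/* Hypothetical helper: lower 32 bits of a 64-bit sector count. */
static uint32_t low32(uint64_t v)
{
    return (uint32_t) (v & 0xffffffffULL);
}

/* Hypothetical helper: upper 32 bits of a 64-bit sector count. */
static uint32_t high32(uint64_t v)
{
    return (uint32_t) (v >> 32);
}

/* Hypothetical helper: disk size in bytes for a given sector count. */
static uint64_t disk_bytes(uint64_t total_secs, uint64_t bytes_per_sec)
{
    return total_secs * bytes_per_sec;
}
```

With these values, low32() is identical for both readings, while high32() is 0 for the GRUB value and 1 for the Linux one. In bytes, disk_bytes() gives 1801659834368 (about 1.8 TB) for the GRUB reading against 4000683089920 (about 4.0 TB) for the Linux one, so any file located past the smaller limit would presumably trip the "outside of disk" check.
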
In the GRUB code, the pointer is "broken" into two portions, as observed in function grub_biosdisk_get_diskinfo_real():

    /* compute the address of drive parameters */
    regs.esi = ((grub_addr_t) drp) & 0xf;
    regs.ds = ((grub_addr_t) drp) >> 4;

I'm not experienced enough in memory segmentation to be sure whether the address split is correct, but it seems the values are fine, except that the upper 32 bits of total_sectors are 0.

Any guidance on that would be much appreciated.

Thanks in advance,

Guilherme

[0] https://en.wikipedia.org/wiki/INT_13H#INT_13h_AH=48h:_Extended_Read_Drive_Parameters

[1] $ sg_readcap /dev/sda
READ CAPACITY (10) indicates device capacity too large
now trying 16 byte cdb variant
Read Capacity results:
   Protection: prot_en=0, p_type=0, p_i_exponent=0
   Logical block provisioning: lbpme=0, lbprz=0
   Last LBA=7813834159 (0x1d1bda5af), Number of logical blocks=7813834160 [== 0x1d1bda5b0]
[...]

[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/edd.c#n101