Alexey Kardashevskiy <a...@ozlabs.ru> writes: > On 5/5/22 05:16, Fabiano Rosas wrote: >> Alexey Kardashevskiy <a...@ozlabs.ru> writes: >> >>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on. >>> >>> QEMU loads the kernel at 0x400000 by default which works most of >>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie" >>> (position independent code). This works for a little endian zImage too. >>> >>> However a big endian zImage is compiled without -pie, is 32bit, linked to >>> 0x4000000 so current QEMU ends up loading it at >>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails. >>> >>> This uses the kernel address returned from load_elf(). >>> If the default kernel_addr is used, there is no change in behavior (as >>> translate_kernel_address() takes care of this), which is: >>> LE/BE vmlinux and LE zImage boot, BE zImage does not. >>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU >>> prints a warning and BE zImage boots. >> >> I think we can fix this without needing a different command line for BE >> zImage (apart from x-vof, which is a separate matter). >> >> If you look at translate_kernel_address, it cannot really work when the >> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue, >> so if we fix that function like this... >> >> static uint64_t translate_kernel_address(void *opaque, uint64_t addr) >> { >> SpaprMachineState *spapr = opaque; >> >> return addr ? addr : spapr->kernel_addr; >> } > > > The qemu elf loader is supposed to handle relocations which should be > calling this hook more than once, now I wonder why it is not doing so.
For relocations, it seems to only call translate_fn for s390x. >> ...then we could always use the ELF PhysAddr if it is different from 0 >> and only use the default load addr if the ELF PhysAddr is 0. If the user >> gives kernel_addr on the cmdline, we honor that, even if puts the kernel >> over the firmware (we have code to detect that). > > > ELF address is 0 for LE zImage only, vmlinux BE/LE uses > 0xc000000000000000. And we are already chopping all these tops bits off > in translate_kernel_address() and I do not really know why _exactly_ it > is 0x0fffffff and not let's say 0x7fffffff. Oh, I am not talking about the ELF entry point. And that is not what load_elf passes to translate_kernel_address either. It is the ELF PhysAddr: $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail Program Headers: Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align LOAD 0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 0x2a54ea8 RWE 0x10000 So it is 0x400000 for BE zImage and 0 for vmlinux. >> >>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState *machine) >>> exit(1); >>> } >>> >>> + if (spapr->kernel_addr != loaded_addr) { >> >> This could be: >> >> if (spapr->kernel_addr == KERNEL_LOAD_ADDR && >> spapr->kernel_addr != loaded_addr) { >> >> So the precedence would be: >> >> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage >> falls here; >> >> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall >> here; >> >> 3- kernel_addr. The user is probably hacking something, just use what >> they gave us. QEMU will yell if they load the kernel over the fw. > > > imho too complicated. > > What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is > KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)? Good point. It should always be 3. It is a bad user interface to have a command line option and arbitrarily ignore it. Be it 0x0 or 0x400000. > I am basically fixing a bug when we ignore where load_elf() loaded the > ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the > ELF was loaded where we want it to be. That bug is already accounted for, that is why we have translate_kernel_address: /* address_offset is hack for kernel images that are linked at the wrong physical address. */ if (translate_fn) { addr = translate_fn(translate_opaque, ph->p_paddr); So we're actively telling load_elf to load the kernel at the wrong place when we do: (ph->p_addr & 0x0fffffff) + spapr->kernel_addr It just happened to work so far because the vmlinux PhysAddr is 0. When you set kernel-addr=0 you are simply working around this other bug because: (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr I just want to remove this indirection and use the ELF PhysAddr directly. > Everything else can be done but on top of this. If you want to merge this I could send another patch on top of it later, no worries. I realise that this a larger change that will need more testing. >>> + warn_report("spapr: kernel_addr changed from 0x%lx to 0x%lx", >>> + spapr->kernel_addr, loaded_addr); >>> + spapr->kernel_addr = loaded_addr; >>> + } >>> + >>> /* load initrd */ >>> if (initrd_filename) { >>> /* Try to locate the initrd in the gap between the kernel