Re: [PATCH qemu] spapr: Use address from elf parser for kernel address

Fabiano Rosas Wed, 11 May 2022 10:50:05 -0700

Alexey Kardashevskiy <a...@ozlabs.ru> writes:

> On 06/05/2022 01:50, Fabiano Rosas wrote:
>> Alexey Kardashevskiy <a...@ozlabs.ru> writes:
>> 
>>> On 5/5/22 05:16, Fabiano Rosas wrote:
>>>> Alexey Kardashevskiy <a...@ozlabs.ru> writes:
>>>>
>>>>> tl;dr: This allows Big Endian zImage booting via -kernel + x-vof=on.
>>>>>
>>>>> QEMU loads the kernel at 0x400000 by default which works most of
>>>>> the time as Linux kernels are relocatable, 64bit and compiled with "-pie"
>>>>> (position independent code). This works for a little endian zImage too.
>>>>>
>>>>> However a big endian zImage is compiled without -pie, is 32bit, linked to
>>>>> 0x4000000 so current QEMU ends up loading it at
>>>>> 0x4400000 but keeps spapr->kernel_addr unchanged so booting fails.
>>>>>
>>>>> This uses the kernel address returned from load_elf().
>>>>> If the default kernel_addr is used, there is no change in behavior (as
>>>>> translate_kernel_address() takes care of this), which is:
>>>>> LE/BE vmlinux and LE zImage boot, BE zImage does not.
>>>>> If the VM created with "-machine kernel-addr=0,x-vof=on", then QEMU
>>>>> prints a warning and BE zImage boots.
>>>>
>>>> I think we can fix this without needing a different command line for BE
>>>> zImage (apart from x-vof, which is a separate matter).
>>>>
>>>> If you look at translate_kernel_address, it cannot really work when the
>>>> ELF PhysAddr is != 0. We would always hit this sort of 0x4400000 issue,
>>>> so if we fix that function like this...
>>>>
>>>> static uint64_t translate_kernel_address(void *opaque, uint64_t addr)
>>>> {
>>>>       SpaprMachineState *spapr = opaque;
>>>>
>>>>       return addr ? addr : spapr->kernel_addr;
>>>> }
>>>
>>>
>>> The qemu elf loader is supposed to handle relocations which should be
>>> calling this hook more than once, now I wonder why it is not doing so.
>> 
>> For relocations, it seems to only call translate_fn for s390x.
>
>
> Agrh. You made me read this :) It is quite weird code and yeah 390-only :-/
>
>
>>>> ...then we could always use the ELF PhysAddr if it is different from 0
>>>> and only use the default load addr if the ELF PhysAddr is 0. If the user
>>>> gives kernel_addr on the cmdline, we honor that, even if puts the kernel
>>>> over the firmware (we have code to detect that).
>>>
>>>
>>> ELF address is 0 for LE zImage only, vmlinux BE/LE uses
>>> 0xc000000000000000. And we are already chopping all these tops bits off
>>> in translate_kernel_address() and I do not really know why _exactly_ it
>>> is 0x0fffffff and not let's say 0x7fffffff.
>> 
>> Oh, I am not talking about the ELF entry point. And that is not what
>> load_elf passes to translate_kernel_address either. It is the ELF
>> PhysAddr:
>> 
>> $ powerpc64le-buildroot-linux-gnu-readelf -We vmlinux | tail
>> 
>> Program Headers:
>>    Type           Offset   VirtAddr           PhysAddr           FileSiz  
>> MemSiz   Flg Align
>>    LOAD           0x010000 0xc000000000000000 0x0000000000000000 0x28d84d4 
>> 0x2a54ea8 RWE 0x10000
>> 
>> So it is 0x400000 for BE zImage and 0 for vmlinux.
>
> Ah right. Me wrong.
>
> btw potentially there can be more program sections.
>
> [fstn1-p1 ~]$ readelf -l /home/aik/p/slof/board-qemu/llfw/stage1.elf
>
> Elf file type is EXEC (Executable file)
> Entry point 0x100
> There are 2 program headers, starting at offset 64
>
> Program Headers:
>    Type           Offset             VirtAddr           PhysAddr
>                   FileSiz            MemSiz              Flags  Align
>    LOAD           0x0000000000000100 0x0000000000000100 0x0000000000000100
>                   0x0000000000008110 0x0000000000010290  RWE    0x4000
>    LOAD           0x0000000000008210 0x0000000000010400 0x0000000000010400
>                   0x0000000000000010 0x0000000000000010  RW     0x8
>
>
> grub might be similar.
>
>
>> 
>>>>
>>>>> @@ -2988,6 +2990,12 @@ static void spapr_machine_init(MachineState 
>>>>> *machine)
>>>>>                exit(1);
>>>>>            }
>>>>>    
>>>>> +        if (spapr->kernel_addr != loaded_addr) {
>>>>
>>>> This could be:
>>>>
>>>>           if (spapr->kernel_addr == KERNEL_LOAD_ADDR &&
>>>>        spapr->kernel_addr != loaded_addr) {
>>>>
>>>> So the precedence would be:
>>>>
>>>> 1- ELF PhysAddr, if != 0. After all, that is what it's for. BE zImage
>>>>      falls here;
>>>>       
>>>> 2- KERNEL_LOAD_ADDR. Via translate_kernel_address, LE/BE vmlinux fall
>>>>      here;
>>>>
>>>> 3- kernel_addr. The user is probably hacking something, just use what
>>>>      they gave us. QEMU will yell if they load the kernel over the fw.
>>>
>>>
>>> imho too complicated.
>>>
>>> What if the user runs QEMU with kernel-addr=0x400000? (0x400000 is
>>> KERNEL_LOAD_ADDR noooow but not necessarily forever). Is it 2) or 3)?
>> 
>> Good point. It should always be 3. It is a bad user interface to have a
>> command line option and arbitrarily ignore it. Be it 0x0 or 0x400000.
>> 
>>> I am basically fixing a bug when we ignore where load_elf() loaded the
>>> ELF and just assume it is KERNEL_LOAD_ADDR. Now the code checks if the
>>> ELF was loaded where we want it to be.
>> 
>> That bug is already accounted for, that is why we have
>> translate_kernel_address:
>> 
>>    /* address_offset is hack for kernel images that are
>>       linked at the wrong physical address.  */
>>    if (translate_fn) {
>>        addr = translate_fn(translate_opaque, ph->p_paddr);
>> 
>> So we're actively telling load_elf to load the kernel at the wrong place
>> when we do:
>> 
>> (ph->p_addr & 0x0fffffff) + spapr->kernel_addr
>> 
>> It just happened to work so far because the vmlinux PhysAddr is 0.
>> 
>> When you set kernel-addr=0 you are simply working around this other bug
>> because:
>> 
>> (ph->p_addr & 0x0fffffff) + 0 == ph->p_addr
>> 
>> I just want to remove this indirection and use the ELF PhysAddr
>> directly.
>
>
> That's alright but not in translate_kernel_address(). May be I should 
> rename kernel-addr to kernel-offset (which it really is)  or  hack 
> load_elf() so it would take the desired location and work out the offset 
> inside (and ditch the translate_fn hook) but either way we'll need 
> heuristics (or the user should know) to know if the image is 
> self-relocatable or not as otherwise it just won't boot.
>
>
>
>>> Everything else can be done but on top of this.
>> 
>> If you want to merge this I could send another patch on top of it later,
>> no worries. I realise that this a larger change that will need more
>> testing.
>
>
> I want to have some easy-to-explain way of booting BE zImage :)


Ok, I guess I can live with this kernel-addr=0 ugliness. We can deal
with translate_kernel_address another time.

Reviewed-by: Fabiano Rosas <faro...@linux.ibm.com>

Re: [PATCH qemu] spapr: Use address from elf parser for kernel address

Reply via email to