On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt <heinrich.schucha...@canonical.com> wrote: > > > > Sughosh Ganu <sughosh.g...@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: >> >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt >> <heinrich.schucha...@canonical.com> wrote: >> > >> > On 5/6/25 11:24, Sughosh Ganu wrote: >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set >> > > the width of the phys_{addr,size}_t data types based on the register >> > > size of the architecture. >> > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit >> > > platforms, where the upper 32-bits of the variables of these types >> > > can have junk data, and that can cause all kinds of side-effects. >> > >> > How could it be that the upper 32-bit have junk data? >> > >> > When we convert from a shorter variable the compiler should fill the >> > upper bits with zero. >> >> That does not seem to be happening. The efi_fit test fails on the >> qemu-riscv32 platform, when attempting to boot the OS from the FIT >> image. >> >> These are the values of the base address that I see in the >> _lmb_alloc_addr() function. >> >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > As you are running on QEMU you should be able to track down where the value > is actually assigned with gdb. This could for instance be a buffer overrun.
I was able to hook up gdb and re-create the issue. What I observe is that when the lmb_allocate_mem() function is called, the base address parameter, which is 64-bits, shows a value with the upper 32-bits not zeroed out. So, this looks like a compiler issue, where the upper 32-bits are not being zeroed out. Fwiw, this shows up with the compiler being used in the CI environment, as well as the one that I am using. /toolchains/riscv32-linux/bin/riscv32-linux-gcc --version riscv32-linux-gcc (GCC) 13.2.0 /toolchains/riscv32-linux/bin/riscv32-linux-ld --version GNU ld (GNU Binutils) 2.41 Given that LPAE is not being used on riscv32 platforms in U-Boot, I think this patch should be considered. Thanks. -sughosh > > If we don't correct the root cause, it will hit us again. > > Best regards > > Heinrich > >> >> The actual value that gets passed is 0x802000bc. The upper 32-bits do >> not get zeroed out. Which causes the _lmb_alloc_addr() to return an >> error. You can check my branch [1], where I have put a temporary >> commit to print the values that cause the issue. >> >> -sughosh >> >> [1] - >> https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 >> >> > >> > > >> > > This was discovered on the qemu Riscv 32-bit platform when the return >> > > value of an LMB API was checked, and some LMB allocation that ought >> > > not to have failed, was failing. The upper 32-bits of the address >> > > variable contained garbage, resulting in failures. >> > > >> > > Signed-off-by: Sughosh Ganu <sughosh.g...@linaro.org> >> > > --- >> > > >> > > Note: >> > > Although the LMB API cleanup series depends on this patch, I am >> > > sending it separately so that it gets noticed by the RiscV >> > > maintainers. Sometimes a patch may not get the required attention when >> > > sent as part of another seemingly unrelated series. >> > > >> > > >> > > arch/riscv/include/asm/types.h | 9 +++++++-- >> > > 1 file changed, 7 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/arch/riscv/include/asm/types.h >> > > b/arch/riscv/include/asm/types.h >> > > index 49f7a5d6b3a..45d806c83eb 100644 >> > > --- a/arch/riscv/include/asm/types.h >> > > +++ b/arch/riscv/include/asm/types.h >> > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; >> > > typedef u32 dma_addr_t; >> > > #endif >> > > >> > > -typedef unsigned long long phys_addr_t; >> > > -typedef unsigned long long phys_size_t; >> > > +#ifdef CONFIG_PHYS_64BIT >> > > +typedef u64 phys_addr_t; >> > > +typedef u64 phys_size_t; >> > > +#else >> > > +typedef u32 phys_addr_t; >> > > +typedef u32 phys_size_t; >> > >> > This matches what has been done for ARM. >> > >> > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") >> > changed the definition to 64bit phys addr_t to support the 34bit >> > physical addresses (similar to LPAE on arm32). >> > >> > I don't think that we need 34bit support currently. But I would prefer >> > if we could find the root cause why the upper 32bit gets messed up as >> > this might point to a generic problem. >> > >> > Best regards >> > >> > Heinrich >> > >> > > +#endif >> > > >> > > #endif /* __KERNEL__ */ >> > > >> >