On 02/09/2020 16:25, Edgar E. Iglesias wrote:
> On Wed, Sep 02, 2020 at 04:18:48PM +0100, André Przywara wrote:
>> On 02/09/2020 15:53, Edgar E. Iglesias wrote:
>>> On Wed, Sep 02, 2020 at 03:43:08PM +0100, André Przywara wrote:
>>>> On 02/09/2020 12:15, Michal Simek wrote:
>>
>> Hi,
>>
>>>>
>>>>> From: "Edgar E. Iglesias" <edgar.igles...@xilinx.com>
>>>>>
>>>>> When U-Boot binary exceeds 1MB with CONFIG_POSITION_INDEPENDENT=y
>>>>> compilation error is shown:
>>>>> /mnt/disk/u-boot/arch/arm/cpu/armv8/start.S:71:(.text+0x3c): relocation
>>>>> truncated to fit: R_AARCH64_ADR_PREL_LO21 against symbol `__rel_dyn_end'
>>>>> defined in .bss_start section in u-boot.
>>>>>
>>>>> It is caused by adr instruction which permits the calculation of any byte
>>>>> address within +- 1MB of the current PC.
>>>>> Because U-Boot is bigger than 1MB calculation is failing.
>>>>>
>>>>> The patch is using adrp/add instructions where adrp shifts a signed,
>>>>> 21-bit immediate left by 12 bits (4k page), adds it to the value of the
>>>>> program counter with the bottom 12 bits cleared to zero. Then add
>>>>> instruction provides the lower 12 bits which is offset within 4k page.
>>>>> These two instructions together compose full 32bit offset which should be
>>>>> more than enough to cover the whole u-boot size.
>>>>>
>>>>> Signed-off-by: Edgar E. Iglesias <edgar.igles...@xilinx.com>
>>>>> Signed-off-by: Michal Simek <michal.si...@xilinx.com>
>>>>
>>>> It's a bit scary that you need more than 1MB, but indeed what you do
>>>> below is the canonical pattern to get the full range of PC relative
>>>> addressing (this is used heavily in Trusted Firmware, for instance).
>>>>
>>>> The only thing to keep in mind is that this assumes that the load
>>>> address of the binary is 4K aligned, so that the low 12 bits of the
>>>> symbol stay the same. I wonder if we should enforce this somehow? But
>>>> the load address is not controlled by the build process (the whole
>>>> purpose of PIE), so that's not doable just in the build system?
>>>
>>> There shouldn't be any need for 4K alignment. Could you elaborate on
>>> why you think there is?
>>
>> That seems to be slightly tricky, and I tried to get some confirmation,
>> but here goes my reasoning. Maybe you can confirm this:
>>
>> - adrp takes the relative offset, but only of the upper 20 bits (because
>> that's all we can encode). It clears the lower 12 bits of the register.
>> - the "add" is not PC relative anymore, so it just takes the lower 12
>> bits of the "absolute" linker symbol.
>
> I was under the impression that this would use a PC-relative lower 12bit
> relocation but you are correct. I disassembled the result:
>
>   40:   91000042        add     x2, x2, #0x0
>                         40: R_AARCH64_ADD_ABS_LO12_NC   __rel_dyn_start
>
>
>> So this assumes that the lower 12 bits of the actual address in memory
>> and the lower 12 bits of the linker's view match.
>> An example:
>> 00024:  adrp x0, SYMBOL
>> 00028:  add  x0, x0, :lo12:SYMBOL
>>
>> SYMBOL:
>> 42058:  ...
>>
>> The toolchain will generate:
>>     adrp x0, #0x42; add x0, x0, #0x058
>>
>> Now you load the code to 0x8000.0800 (NOT 4K aligned). SYMBOL is now at
>> 0x80042858.
>> The adrp will use the PC (0x8000.0824) & ~0xfff + offs => 0x8004.2000.
>> The add will just add 0x58, so you end up with x0 being 0x80042058,
>> which is not the right address.
>>
>> Does this make sense?
>
>
> Yes, it makes sense.
>
>>
>>> Perhaps the commit message is a little confusing. The toolchain will
>>> compute the pc-relative offset from this particular location to the
>>> symbol and apply the relocations accordingly.
>>
>> Yes, but the PC relative offset applies only to the upper 20 bits,
>> because it's only adrp that has PC relative semantics.
>>
>>
>>>>
>>>> Shall we at least document this? I guess typical load addresses are
>>>> actually quite well aligned, so it might not be an issue in practice.
>>>>
>
> Yes, probably worth documenting and perhaps an early bail-out if it's not
> the case...

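To make the numbers in the quoted example easy to check, here is a small Python model of the adrp/add pair. It is only a sketch of the arithmetic, not code from the patch: `PAGE_MASK`, `adrp_add` and `is_4k_aligned` are names made up for illustration, and it assumes the add immediate is simply the link-time low 12 bits of the symbol, as the R_AARCH64_ADD_ABS_LO12_NC relocation in the disassembly suggests.

```python
PAGE_MASK = 0xFFF  # low 12 bits: offset within a 4K page


def adrp_add(link_pc, link_sym, load_offset):
    """Model adrp/add for an image loaded at (link address + load_offset).

    Hypothetical helper for illustration only.
    """
    # Link time: adrp encodes the page delta between the instruction and the
    # symbol; add encodes the absolute low 12 bits of the symbol.
    page_delta = (link_sym & ~PAGE_MASK) - (link_pc & ~PAGE_MASK)
    lo12 = link_sym & PAGE_MASK

    # Run time: adrp starts from the real PC with its low 12 bits cleared,
    # while the add immediate is unchanged by the load address.
    run_pc = link_pc + load_offset
    return (run_pc & ~PAGE_MASK) + page_delta + lo12


def is_4k_aligned(load_offset):
    """True if loading at this offset keeps the low 12 bits of every symbol."""
    return (load_offset & PAGE_MASK) == 0


# Values from the quoted example: adrp at 0x24, SYMBOL at 0x42058,
# image loaded at 0x80000800.
computed = adrp_add(0x24, 0x42058, 0x80000800)
actual = 0x42058 + 0x80000800
print(hex(computed), hex(actual))  # 0x80042058 0x80042858
```

With a 4K-aligned offset such as 0x80000000 the model returns 0x80042058, which is where SYMBOL really ends up, so the mismatch only appears when the low 12 bits of the load offset are non-zero. That is the condition an early bail-out would have to test.
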
Documenting sounds good, Kconfig might be a good place, as Michal suggested.

Bail out: I thought about that, it's very easy to detect at runtime, but
what then? This is really early, so you could just enter a WFI loop, and
hope for someone to connect the dots? Or can you think of any other way
of communicating with the user?

Cheers,
Andre