On Thu, Nov 13, 2025 at 11:57:00AM -0600, Tom Rini wrote:
> On Thu, Nov 13, 2025 at 01:21:07PM +0100, Marek Vasut wrote:
>
> > Synchronize local copy of DTC with Linux 6.17 , using commits picked
> > from Linux kernel. This also includes two fix up patches to make the
> > DM core work with new 8-byte alignment checking in libfdt and another
> > fix for NULL pointer check that is missing in libfdt.
> >
> > This depends on the following patches sent separately, which fix
> > various 8-byte alignment problems in the code base:
> >
> > - boot: android: Always use 8-byte aligned DT with libfdt
> > - test/py: android: Point fdt command to aligned addresses
> > - test/py: Use aligned address for overlays in 'extension' test
> > - sandbox: Fix DT compiler address warnings in sandbox DTs
> > - sandbox: Fix DT compiler pin warnings in sandbox DTs
> > - boot: Assure FDT is always at 8-byte aligned address
> > - arm: qemu: Eliminate fdt_high and initrd_high misuse
> > - efi_loader: Assure fitImage from capsule is used from 8-byte aligned
> >   address
> > - MIPS: Assure end of U-Boot is at 8-byte aligned offset
>
> So, taking a look at the test branch you pointed me at, my big concern
> is size growth. On imx8mp_dhcom_drc02 (where we're already LTO'ing),
> with the CI gcc-14.2.0 toolchain full U-Boot grows by more than 6KiB and
> SPL by a bit more than 2KiB. This is a bit of a worst-case, imx8mp_navqp
> is a bit more than 3KiB / 548 bytes, with the average feeling like
> ~4KiB/1KiB for aarch64.

I'm coming back to this to try to better understand things. One
problem here is that the upstream dtc changes really trip up LTO. As a
local hack, I changed imx8mp_dhcom_pdk2 to NOT use LTO (so SPL fails to
link, but judging by how much the SRAM overflow changes, the growth is
about the same). This brought the size change down from ~6KiB to ~4KiB.
Since this was already a hack just for investigation, I then started
out by giving full U-Boot the "assume perfect dtb" mask. This reduces
growth by 900 bytes.

A better test case is pinephone, because it's aarch64 but not LTO. With
the full-mask hack in U-Boot, the size growth is around 1000 bytes for
full U-Boot and 300 bytes for SPL.

So to me, there are a few questions. The first is: why is what's being
done so terrible for LTO? It's not good for normal optimizations
either, but it's really bad with LTO. The second is: is there something
about how the sanity checks are performed that can be re-examined? Take
fdt_get_string for example, which grows by 120 bytes without changing
the mask at all, and the code changes are trivial switches to the new
FDT_ASSUME mechanic and dropping extra parens. I would not expect that
to cause size growth.

-- 
Tom
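
For reference, here is a minimal sketch of the kind of compile-time
assumption mask discussed above, loosely modelled on libfdt's
FDT_ASSUME_MASK / can_assume() pattern. It is not libfdt or U-Boot
code: DEMO_ASSUME_MASK, demo_can_assume() and demo_get_string() are
hypothetical names invented for illustration, and the checks are far
simpler than the real fdt_get_string().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build-time mask; 0 keeps every sanity check. */
#ifndef DEMO_ASSUME_MASK
#define DEMO_ASSUME_MASK 0
#endif

enum {
	DEMO_ASSUME_VALID_DTB   = 1 << 0,
	DEMO_ASSUME_VALID_INPUT = 1 << 1,
};

/* Folds to a constant because DEMO_ASSUME_MASK is known at compile time. */
static inline bool demo_can_assume(int mask)
{
	return (DEMO_ASSUME_MASK & mask) != 0;
}

/*
 * Return the string at offset `off` in a string table of `size` bytes,
 * or NULL if the offset is out of range or the string is unterminated.
 */
static const char *demo_get_string(const char *tab, size_t size, size_t off)
{
	if (demo_can_assume(DEMO_ASSUME_VALID_DTB))
		return tab + off;	/* trusted input: checks can be dropped */

	if (off >= size)
		return NULL;
	if (!memchr(tab + off, '\0', size - off))
		return NULL;
	return tab + off;
}
```

With the default mask of 0 both checks stay in. Building with
-DDEMO_ASSUME_MASK=0xff makes the first branch unconditionally true, so
an optimizing compiler would be expected to discard the remaining
checks as dead code. Comparing the generated code for the two builds is
one way to see how much a given check costs.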