On Wed, Oct 18, 2023 at 01:33:20PM -0400, Sean Anderson wrote: > On 8/3/23 00:41, Heinrich Schuchardt wrote: > > On 8/3/23 03:31, Tom Rini wrote: > > > On Wed, Aug 02, 2023 at 09:13:21PM -0400, Sean Anderson wrote: > > > > On 8/2/23 16:11, Tom Rini wrote: > > > > > On Mon, Jul 31, 2023 at 06:42:52PM -0400, Sean Anderson wrote: > > > > > > > > > > > This series adds support for loading all image types (Legacy, FIT > > > > > > (with > > > > > > and without LOAD_FIT_FULL), and i.MX) to the MMC, SPI, NOR, NET, > > > > > > FAT, > > > > > > and EXT load methods. It does this by introducing a helper function > > > > > > which handles the minutiae of invoking the proper parsing function, > > > > > > and > > > > > > reading the rest of the image. > > > > > > > > > > > > Hopefully, this will make it easier for load methods to support all > > > > > > image types that U-Boot supports, without having undocumented > > > > > > unsupported image types. I applied this to several loaders which > > > > > > were > > > > > > invoking spl_load_simple_fit and/or spl_parse_image_header, but I > > > > > > did > > > > > > not use it with others (e.g. DFU/RAM) which had complications in the > > > > > > mix. > > > > > > > > > > > > In order to meet size requirements for some boards, the first two > > > > > > patches of this series are general size-reduction changes. The > > > > > > ffs/fls > > > > > > change in particular should reduce code size across the board. The > > > > > > malloc change also has the potential to reduce code size. I've left > > > > > > it > > > > > > disabled by default, but maybe we can reverse that in the future. > > > > > > > > > > > > Here's some bloat-o-meter for j7200_evm_a72_defconfig with ext4 > > > > > > support > > > > > > enabed: > > > > > > > > > > > add/remove: 2/1 grow/shrink: 1/5 up/down: 444/-584 (-140) > > > > > > Function old new delta > > > > > > spl_load - 216 +216 > > > > > spl_simple_read - 184 +184 > > > > > > spl_fit_read 60 104 +44 > > > > > > file_fat_read 40 - -40 > > > > > > spl_nor_load_image 120 76 -44 > > > > > > spl_load_image_ext 364 296 -68 > > > > > spl_load_image_fat 320 200 -120 > > > > > > spl_spi_load_image 304 176 -128 > > > > > > spl_mmc_load 716 532 -184 > > > > > > Total: Before=233618, After=233478, chg -0.06% > > > > > > > > > > > > For most boards with a few load methods, this series should break > > > > > > even. > > > > > > However, in the worst case this series will add around 100 bytes. > > > > > > > > > > > > I have only tested a few loaders. Please try booting your favorite > > > > > > board > > > > > > with NOR/SPI flash or SPI falcon mode. > > > > > > > > > > > On sama5d27_wlsom1_ek_mmc_defconfig, 100 bytes is too much, so this > > > > > > series depends on [1] to fit everything in. CI run at [2]. > > > > > > > > > > > > [1] > > > > > > https://lore.kernel.org/u-boot/20230731223327.109865-1-sean.ander...@seco.com/ > > > > > > [2] > > > > > > https://source.denx.de/u-boot/custodians/u-boot-clk/-/pipelines/17116 > > > > > > > > > > I've boot-tested this on my lab and a handful of SD+FAT or SD+raw boot > > > > > devices, and it's all worked. I also did my usual world build > > > > > before/after to check out the size changes and well, I don't know. > > > > > The > > > > > size wins come on platforms where there's both FAT+EXT support. And > > > > > then on mips with say mt7620_rfb I wonder if we're loosing > > > > > functionality. > > > > > > > > Yes we are. I'll address this in v6. > > > > > > > > > But most platforms grow a bit, especially if they're just > > > > > a single load type (which is the most common case). Maybe this problem > > > > > needs to be approached another way? I've put the whole log over at > > > > > https://gist.github.com/trini/e6772b2134e0eb44393364ea98729e06 and > > > > > maybe > > > > > something will jump out to you. > > > > > > > > The fls/ffs patch [1] reduces the growth by around 80 bytes on ARM > > > > platforms. I sent it separately for ease of review, but it really should > > > > be applied before this series, since a good portion of the growth is due > > > > to that one function call. It seems like MIPS also has this instruction > > > > (and Linux uses it), so I can try putting together a patch for it as > > > > well. > > > > > > > > In the future, I would like to convert bl_len to bl_shift or something, > > > > which would remove the need for fls (and going from bl_shift to bl_len > > > > is just a shift). spl_load_info is used in a lot of places, so I wanted > > > > to do that in a follow-up. On arches with efficient fls implementations > > > > the difference is only around 3 instructions or so. > > > > > > > > But fundamentally some of the problem is that the logic is being moved > > > > into multiple translation units, so the compiler can't see enough to > > > > make optimizations. For example, all filesystems use a bl_len of 1, so > > > > all the multiplications/roundings/etc. can be eliminated if the compiler > > > > can see that. For example, on arm64 copying spl_load into spl_fat.c (as > > > > fat_load) and making it static saves us 100 bytes: > > > > > > > > add/remove: 3/1 grow/shrink: 0/1 up/down: 324/-92 (232) > > > > Function old new delta > > > > spl_load - 160 +160 > > > > spl_simple_read - 104 +104 > > > > spl_fit_read - 60 +60 > > > > file_fat_read 40 - -40 > > > > spl_load_image_fat 252 200 -52 > > > > Total: Before=66833, After=67065, chg +0.35% > > > > > > > > add/remove: 2/1 grow/shrink: 1/0 up/down: 172/-40 (132) > > > > Function old new delta > > > > spl_simple_read - 104 +104 > > > > spl_fit_read - 60 +60 > > > > spl_load_image_fat 252 260 +8 > > > > file_fat_read 40 - -40 > > > > Total: Before=66833, After=66965, chg +0.20% > > > > > > > > and even then, by inspection spl_simple_read isn't really taking > > > > advantage of bl_len=1. Surprisingly, after enabling LTO this board is > > > > +400 bytes (compared to without this series). It's just hard to beat > > > > handwritten routines with a generic solution. > > > > > > > > I would really like to consolidate this stuff, since every load method > > > > ends up only supporting a few image types and having almost identical > > > > implementations for what they do support. > > > > > > > > --Sean > > > > > > > > [1] > > > > https://lore.kernel.org/u-boot/20230731212734.7256-1-sean.ander...@seco.com/ > > > > > > Thanks for digging more, lets keep going in this direction then. > > > > > > > Do we need to pass sectors to the info->load() functions? > > > > Only nand and mmc use bl_len != 1. Can we move the conversion to sectors > > and the alignment checks to functions called by info->load()? > > > > I would assume that this can avoid part of the code size increase. > > Unfortunately, while the conversion to sectors and alignment can be put into > ->load, > MMC (and ROMAPI) can only read in block-sized chunks. This means that the > buffer must > be a multiple of the block size (otherwise we will go over the end), and > there must be > a block's worth of space in front of the buffer as well (which we use when we > have a > non-block-aligned offset). > > These situations generally only matter when reading into a malloc'd buffer, > since there > is data to overwrite on either end of the buffer. With things like > SYS_TEXT_BASE, there > is nothing to overwrite on either end. From examining the various loaders, > malloc'd buffers > are only used when loading the image header (which may include the full image > for FITs). > In that situation, under the beginning of the buffer is not an issue, since > the load method > should have supplied an offset which is a multiple of the block size. Thus, > we only need > to solve the problem of writing past the end of the buffer. > > From my perspective, there are three approaches we could take: > > - Keep bl_len so that callers of ->read can allocate a sufficiently-large > buffer. > - Use a bounce-buffer. This uses around double the memory and will also > increase code size. > - Complete most of the read as normal, but read the final block into a > separate buffer which > is then copied into the passed-in buffer. This requires two reads, which is > likely slower > than one. It will also increase code size. > > I think the first approach could be improved by optionally removing bl_len > when there are > no load methods which need it. This will let the compiler optimize out any > alignment if it > is unnecessary, which it cannot currently do (unless LTO is enabled). I am > going to try using > this approach for v6.
This sounds good, and LTO is / can be an option for SPL for platforms that need to regain space. -- Tom
signature.asc
Description: PGP signature