Am Montag, 27. März 2017, 23:16:45 CEST schrieb Alexander Graf: > > On 27/03/2017 17:17, Heiko Stuebner wrote: > > Am Montag, 27. März 2017, 09:14:47 CEST schrieb Alexander Graf: > >> > >> On 27/03/2017 01:38, Simon Glass wrote: > >>> Most of the time the optimised memset() is what we want. For extreme > >>> situations such as TPL it may be too large. For example on the 'rock' > >>> board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and > >>> the rodata bug, this patch is enough to reduce the TPL image below the > >>> limit. > >>> > >>> Signed-off-by: Simon Glass <s...@chromium.org> > >>> --- > >>> > >>> lib/Kconfig | 9 +++++++++ > >>> lib/string.c | 6 ++++-- > >>> 2 files changed, 13 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/lib/Kconfig b/lib/Kconfig > >>> index 65c01573e1..5bf512d8c0 100644 > >>> --- a/lib/Kconfig > >>> +++ b/lib/Kconfig > >>> @@ -52,6 +52,15 @@ config LIB_RAND > >>> help > >>> This library provides pseudo-random number generator functions. > >>> > >>> +config FAST_MEMSET > >>> + bool "Use an optimised memset()" > >>> + default y > >>> + help > >>> + The faster memset() is the arch-specific one (if available) enabled > >>> + by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get > >>> + better performance by write a word at a time. Disable this option > >>> + to reduce code size slightly at the cost of some speed. > >> > >> The comment sounds slightly confused - it took me a few times of reading > >> it until I grasped what it was trying to tell me :). > >> > >>> + > >>> source lib/dhry/Kconfig > >>> > >>> source lib/rsa/Kconfig > >>> diff --git a/lib/string.c b/lib/string.c > >>> index 67d5f6a421..159493ed17 100644 > >>> --- a/lib/string.c > >>> +++ b/lib/string.c > >>> @@ -437,8 +437,10 @@ char *strswab(const char *s) > >>> void * memset(void * s,int c,size_t count) > >>> { > >>> unsigned long *sl = (unsigned long *) s; > >>> - unsigned long cl = 0; > >>> char *s8; > >>> + > >>> +#ifdef CONFIG_FAST_MEMSET > >>> + unsigned long cl = 0; > >>> int i; > >>> > >>> /* do it one word at a time (32 bits or 64 bits) while possible */ > >>> @@ -452,7 +454,7 @@ void * memset(void * s,int c,size_t count) > >>> count -= sizeof(*sl); > >>> } > >>> } > >>> - /* fill 8 bits at a time */ > >>> +#endif /* fill 8 bits at a time */ > >> > >> So while this is all neat, a few ideas: > >> > >> 1) Would having memset in a header improve things even more? After all, > >> each external function call clobbers registers that you need to > >> save/restore... > > > > I'd guess it really depends on the size constraints. The regular > > libgeneric memset compiles on my rk3188 tpl to a total of > > 64bytes on both gcc-4.9 and gcc-6.3 while Simon's fast-memset > > comes down to 14bytes on my rk3188. > > > > On the rk3188 the only memset user is board_init_f, so here memset > > is called only once without needing to save registers and I'd guess if an > > implementation really is that size-constrained to worry about 50bytes > > this one caller will probably always be the only one? > > I'm not sure I follow. If you put it into a header, the compiler has a > better chance of evicting untaken code paths and optimize register usage > over object linked variants (unless you use GOLD). I was mostly > wondering whether that would already give you the savings without > introducing a complicated #ifdef that is going to bitrot over time :).
On rk3188-tpl that small non-fast memset gets compiled to (bfd linker): 100809aa <board_init_f_init_reserve>: 100809aa: b510 push {r4, lr} 100809ac: 22c0 movs r2, #192 ; 0xc0 100809ae: 2100 movs r1, #0 100809b0: 4604 mov r4, r0 100809b2: f000 f804 bl 100809be <memset> 100809b6: 34c0 adds r4, #192 ; 0xc0 100809b8: f8c9 4090 str.w r4, [r9, #144] ; 0x90 100809bc: bd10 pop {r4, pc} 100809be <memset>: 100809be: 4402 add r2, r0 100809c0: 4603 mov r3, r0 100809c2: 4293 cmp r3, r2 100809c4: d100 bne.n 100809c8 <memset+0xa> 100809c6: 4770 bx lr 100809c8: f803 1b01 strb.w r1, [r3], #1 100809cc: e7f9 b.n 100809c2 <memset+0x4> not saving any outside registers, as it's used only once at all and what I was trying to say was that in cases where we worry about having the tiniest memset possible, I guess that will most likely stay the only call. But I may have been dug into the rk3188 tpl-specifics to long, to see other possible cases right now :-) . > I'm just slightly worried about the massive number of preprocessor > excludes that happen in U-Boot in general. It seems like something > that's really hard to ever have full testing coverage on. That's essentially what I was worried about as well, seeing that memset can be provided by different sources it seems. There is the libgeneric memset we're having here and also the arch- specific memset (way faster but also again way bigger) and without using either, one could also provide some completely separate implementation at the moment. So having one version in a header would probably also incur some sort of ifdef voodoo? > >> 2) How much would GOLD save you? Have you tried? U-Boot is small enough > >> of a code base that global optimizations should be able to give > >> significant size savings. > > > > I think the issue that this is trying to solve is to allow more > > toolchains to be used and thus make rebuilds on changes work on a lot > > of boards at the same time with random toolchains. > > > > gcc-6.3 already produces way smaller results (well within the size > > constraints the rk3188 has) than for example the gcc-4.9 used by > > buildman as baseline toolchain. > > Ah, I see. So 4.9 does not have -lto? There's a good chance my gut > feeling that GOLD actually saves anything is wrong - I don't know. Has > anyone done the numbers? Then we would have something to actually base > gut feeling on. It looks like the u-boot Makefile makes explicitly sure to use GNU ld. So I didn't try to dig deeper into this :-) . > Size is always a serious constraint in U-Boot, especially in SPL > environments. If we can include one more tool in our portfolio to > optimize size across the board, I'm all for it. This patch just feels > slightly short-term - but I'm definitely not nack'ing it :). Heiko _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot