Most of the time the optimised memset() is what we want. For extreme situations such as TPL it may be too large. For example on the 'rock' board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and the rodata bug, this patch is enough to reduce the TPL image below the limit.

Signed-off-by: Simon Glass <s...@chromium.org>
Signed-off-by: Heiko Stuebner <he...@sntech.de>
---
Hi Simon,

a bit bikesheddy, but might it make more sense to structure the
options like below? That way it matches USE_ARCH_MEMSET and might
make the intent more visible, as you get

USE_ARCH_MEMSET=y = biggest but also fastest
(nothing)         = default from libgeneric
USE_TINY_MEMSET=y = optimize for size over speed

It might also make reading defconfigs easier, as you would have
	CONFIG_USE_TINY_MEMSET=y
instead of
	# CONFIG_FAST_MEMSET is not set
when needing that option.

Anyway, I've tested both variants on a live rk3188-rock now and
everything of course still works, even when built with gcc-4.9,
so for both variants also
Tested-by: Heiko Stuebner <he...@sntech.de>

Heiko

 lib/Kconfig  | 20 ++++++++++++++++++++
 lib/string.c |  5 ++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 65c01573e1..ab42413839 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -52,6 +52,26 @@ config LIB_RAND
 	help
 	  This library provides pseudo-random number generator functions.
 
+config USE_TINY_MEMSET
+	bool "Use a size-optimized memset()"
+	help
+	  This makes memset prefer code size over speed optimizations.
+	  The fastest memset() is the arch-specific one (if available) enabled
+	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+	  better performance by writing a word at a time at the cost of
+	  slightly bigger memset code, but in some special cases size might
+	  be more important than speed.
+
+config SPL_USE_TINY_MEMSET
+	bool "Use a size-optimized memset()"
+	help
+	  This makes memset prefer code size over speed optimizations.
+	  The fastest memset() is the arch-specific one (if available) enabled
+	  by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+	  better performance by writing a word at a time at the cost of
+	  slightly bigger memset code, but in some special cases size might
+	  be more important than speed.
+
 source lib/dhry/Kconfig
 
 source lib/rsa/Kconfig
diff --git a/lib/string.c b/lib/string.c
index 67d5f6a421..edae997fa6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -437,8 +437,10 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
 	unsigned long *sl = (unsigned long *) s;
-	unsigned long cl = 0;
 	char *s8;
+
+#if !CONFIG_IS_ENABLED(USE_TINY_MEMSET)
+	unsigned long cl = 0;
 	int i;
 
 	/* do it one word at a time (32 bits or 64 bits) while possible */
@@ -452,6 +454,7 @@ void * memset(void * s,int c,size_t count)
 			count -= sizeof(*sl);
 		}
 	}
+#endif
 	/* fill 8 bits at a time */
 	s8 = (char *)sl;
 	while (count--)
-- 
2.11.0
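
For readers who want to see the trade-off outside the U-Boot tree, here is a
minimal host-side sketch of the two strategies the help text describes: a
byte-at-a-time loop (small code) and a word-at-a-time fill with a byte tail
(faster, larger). The names tiny_memset(), word_memset() and fills_match() are
hypothetical and chosen for illustration only; this is not the code in
lib/string.c. Unlike the patched function, the sketch aligns the pointer with
a few leading byte stores and uses memcpy() for the word stores, which is an
assumption made to keep it portable on a host compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: size-optimised variant, one byte per iteration. */
static void *tiny_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;

	while (count--)
		*p++ = (unsigned char)c;
	return s;
}

/* Hypothetical name: speed-oriented variant, one word per iteration. */
static void *word_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	unsigned long cl = 0;
	size_t i;

	/* replicate the fill byte into every byte of a word */
	for (i = 0; i < sizeof(cl); i++)
		cl = (cl << 8) | (unsigned char)c;

	/* leading bytes until p is word-aligned (assumption of this sketch) */
	while (count && ((uintptr_t)p & (sizeof(cl) - 1))) {
		*p++ = (unsigned char)c;
		count--;
	}

	/* whole words (32 or 64 bits, depending on the host) */
	while (count >= sizeof(cl)) {
		memcpy(p, &cl, sizeof(cl));
		p += sizeof(cl);
		count -= sizeof(cl);
	}

	/* remaining tail, 8 bits at a time */
	while (count--)
		*p++ = (unsigned char)c;
	return s;
}

/* Self-check: returns 1 if both variants leave identical buffers. */
static int fills_match(size_t offset, int c, size_t count)
{
	unsigned char a[64], b[64];

	assert(offset + count <= sizeof(a));
	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	tiny_memset(a + offset, c, count);
	word_memset(b + offset, c, count);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Both functions give the same result for any offset and length; the extra
word-replication and word-store loops in word_memset() are the kind of code
that the tiny option leaves out. How many bytes that saves depends on the
compiler and target, so the 48-byte figure for 'rock' should not be assumed
elsewhere.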