Most of the time the optimised memset() is what we want. For extreme
situations such as TPL it may be too large. For example on the 'rock'
board, using a simple loop saves a useful 48 bytes. With gcc 4.9 and
the rodata bug, this patch is enough to reduce the TPL image below the
limit.

Signed-off-by: Simon Glass <s...@chromium.org>
Signed-off-by: Heiko Stuebner <he...@sntech.de>
---
Hi Simon,

This is a bit bikesheddy, but might it make more sense to structure the
options as below? That way it matches USE_ARCH_MEMSET and makes the
intent clearer, as you get:
USE_ARCH_MEMSET=y = biggest but also fastest
(nothing)         = default from libgeneric
USE_TINY_MEMSET=y = optimize for size over speed

It might also make reading defconfigs easier, as you would have
    CONFIG_USE_TINY_MEMSET=y
instead of
    # CONFIG_FAST_MEMSET is not set
when needing that option.
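
For illustration only, the effect of the option on the generic memset()
can be sketched as a standalone C function. This is a minimal sketch and
not the patch itself: sketch_memset() and the plain USE_TINY_MEMSET macro
are hypothetical stand-ins for memset() in lib/string.c and for
CONFIG_IS_ENABLED(USE_TINY_MEMSET).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CONFIG_IS_ENABLED(USE_TINY_MEMSET). */
#ifndef USE_TINY_MEMSET
#define USE_TINY_MEMSET 0
#endif

/* Illustrative copy of the generic memset(); the name is an assumption. */
void *sketch_memset(void *s, int c, size_t count)
{
	unsigned long *sl = (unsigned long *)s;
	char *s8;

#if !USE_TINY_MEMSET
	unsigned long cl = 0;
	size_t i;

	/* Word-at-a-time path: only taken when s is word-aligned. */
	if (((uintptr_t)s & (sizeof(*sl) - 1)) == 0) {
		/* Replicate the fill byte into every byte of a word. */
		for (i = 0; i < sizeof(*sl); i++) {
			cl <<= 8;
			cl |= c & 0xff;
		}
		while (count >= sizeof(*sl)) {
			*sl++ = cl;
			count -= sizeof(*sl);
		}
	}
#endif
	/* Byte loop: the tail, or the whole buffer in the tiny variant. */
	s8 = (char *)sl;
	while (count--)
		*s8++ = c;

	return s;
}
```

Building with -DUSE_TINY_MEMSET=1 drops the word loop and leaves only
the byte loop, which is where the size saving comes from.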

Anyway, I've tested both variants on a live rk3188-rock now and
everything of course still works, even when built with gcc-4.9, so
both variants are also
Tested-by: Heiko Stuebner <he...@sntech.de>


Heiko


 lib/Kconfig  | 20 ++++++++++++++++++++
 lib/string.c |  5 ++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig b/lib/Kconfig
index 65c01573e1..ab42413839 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -52,6 +52,26 @@ config LIB_RAND
        help
          This library provides pseudo-random number generator functions.
 
+config USE_TINY_MEMSET
+       bool "Use a size-optimized memset()"
+       help
+         This makes memset prefer code size over speed optimizations.
+         The fastest memset() is the arch-specific one (if available) enabled
+         by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+         better performance by writing a word at a time at the cost of
+         slightly bigger memset code, but in some special cases size might
+         be more important than speed.
+
+config SPL_USE_TINY_MEMSET
+       bool "Use a size-optimized memset()"
+       help
+         This makes memset prefer code size over speed optimizations.
+         The fastest memset() is the arch-specific one (if available) enabled
+         by CONFIG_USE_ARCH_MEMSET. If that is not enabled, we can still get
+         better performance by writing a word at a time at the cost of
+         slightly bigger memset code, but in some special cases size might
+         be more important than speed.
+
 source lib/dhry/Kconfig
 
 source lib/rsa/Kconfig
diff --git a/lib/string.c b/lib/string.c
index 67d5f6a421..edae997fa6 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -437,8 +437,10 @@ char *strswab(const char *s)
 void * memset(void * s,int c,size_t count)
 {
        unsigned long *sl = (unsigned long *) s;
-       unsigned long cl = 0;
        char *s8;
+
+#if !CONFIG_IS_ENABLED(USE_TINY_MEMSET)
+       unsigned long cl = 0;
        int i;
 
        /* do it one word at a time (32 bits or 64 bits) while possible */
@@ -452,6 +454,7 @@ void * memset(void * s,int c,size_t count)
                        count -= sizeof(*sl);
                }
        }
+#endif
        /* fill 8 bits at a time */
        s8 = (char *)sl;
        while (count--)
-- 
2.11.0

