Hi Tom,

On 06.08.21 16:24, Tom Rini wrote:
On Fri, Aug 06, 2021 at 03:38:38PM +0200, Stefan Roese wrote:


On an NXP LX2160 based platform, it has been noticed that the currently
implemented memset/memcpy functions for aarch64 are suboptimal.
Especially the memset() for clearing the NXP MC firmware memory is very
expensive (time-wise).

By using optimized functions, a speedup of roughly a factor of 6 has been
measured.

This patchset now adds the optimized functions ported from this
repository:
https://github.com/ARM-software/optimized-routines

As these functions make use of opcodes that need the caches to be
enabled, they can't be used in the very early boot stage, before the
caches are enabled. Because of this, a simple memset() version,
memset_simple(), is also added and is used in a few selected places.
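
For illustration only, a cache-independent fallback of this kind could
look like the minimal sketch below. Only the name memset_simple() comes
from the text above; the body, including the use of volatile, is an
assumption and not the code from the patch.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of a byte-wise memset fallback.
 * It uses only plain byte stores, so it does not depend on
 * cache-related instructions or on unaligned wide accesses.
 */
void *memset_simple(void *s, int c, size_t count)
{
	/*
	 * volatile (an assumption of this sketch) keeps the compiler
	 * from turning the loop back into a call to memset().
	 */
	volatile unsigned char *p = s;

	while (count--)
		*p++ = (unsigned char)c;

	return s;
}
```

Such a loop is slow, which is why it would only be suitable for the few
early call sites that run before the caches are enabled.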

Please note that checkpatch.pl complains about some issues with this
imported file: arch/arm/lib/asmdefs.h
Since it's imported, I deliberately did not make any changes here, to
make potential future syncing easier.

Traditionally, we grab the Linux kernel's optimized functions. Is
there not a set to grab there?

Yes, there is, and I tried that as well. Here is my comment from the
commit log of patch 4/5:

Note:
I also integrated and tested the Linux versions of these optimized
functions. They are similar to the ones now integrated, but these ARM
versions are still slightly faster.

Thanks,
Stefan
