On 20 May, this message from Rogério Brito echoed through cyberspace:
>       Anyway, one of the first things I see is that xine uses a
>       function called xine_fast_memcpy, which is an alternative
>       memcpy function possibly written in assembly (if available) or
>       the standard glibc, if no other version is available, as is
>       the case with PPC.
> 
>       I saw that the Linux kernel has an assembly implementation of
>       memcopy and decided to try that instead of the glibc version.

Hmm.. kernel....

>       After just a few adaptations and removals of unnecessary
>       functions, I ended up with a string.S file with only
>       cacheable_memcpy and memcpy, which seem to be the important
>       parts of the file for my purposes.
> 
>       According to my tests, cacheable_memcpy is approximately 40%
>       faster than the original glibc version, which is quite an
>       improvement: with my tests, the glibc version took approx. 69s
>       to run, while the cacheable_memcpy took only 42s (repeated
>       many times to avoid noise errors).

You may want to try floating-point loads and stores through the FPR
registers; that is typically faster than integer loads/stores, since on
32-bit PPC each one moves 8 bytes instead of 4. The performance gain may
depend on the cacheability of the source/destination memory, but it's
definitely worth a try.
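
To make the idea concrete, here is a minimal sketch in C rather than
assembly. The name fpr_copy is made up for illustration and comes from
neither xine nor the kernel. It assumes both buffers are 8-byte aligned,
do not overlap, and hold a whole number of 8-byte words. On 32-bit PPC
one would expect the compiler to turn the double accesses into lfd/stfd
pairs, but that is an assumption worth checking in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy n 8-byte words using double-typed loads
 * and stores, so that each access can go through a floating-point
 * register instead of a 4-byte integer register.
 *
 * Assumptions: dst and src are 8-byte aligned, do not overlap
 * (hence restrict), and the length is a whole number of doubles.
 */
static void fpr_copy(double *restrict dst, const double *restrict src,
                     size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    /* Unrolled by four: issue the four loads first, then the four
     * stores, which gives the compiler room to keep several FPRs
     * in flight. */
    for (; i + 4 <= n; i += 4) {
        double a = src[i];
        double b = src[i + 1];
        double c = src[i + 2];
        double d = src[i + 3];

        dst[i]     = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }

    /* Copy the remaining 0 to 3 words one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

A caller would pass the length in 8-byte words, e.g.
fpr_copy(dst, src, nbytes / 8), and handle any unaligned head or tail
with ordinary byte copies. Whether this beats the integer version
depends on the CPU and on the cacheability of the memory, so it needs
measuring against the same test.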

The kernel, btw, can't use that trick, since floating point is a big
no-no inside kernel code.

Cheers

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   [EMAIL PROTECTED]            |
http://www.cpu.lu/~mlan        |                     Learn Always. "

