On 20 May, this message from Rogério Brito echoed through cyberspace:
> Anyway, one of the first things I see is that xine uses a
> function called xine_fast_memcpy, which is an alternative
> memcpy function possibly written in assembly (if available) or
> the standard glibc, if no other version is available, as is
> the case with PPC.
>
> I saw that the Linux kernel has an assembly implementation of
> memcopy and decided to try that instead of the glibc version.

Hmm.. kernel....

> After just a few adaptations and removals of unnecessary
> functions, I ended up with a string.S file with only
> cacheable_memcpy and memcpy, which seem to be the important
> parts of the file for my purposes.
>
> According to my tests, cacheable_memcpy is approximately 40%
> faster than the original glibc version, which is quite an
> improvement: with my tests, the glibc version took approx. 69s
> to run, while the cacheable_memcpy took only 42s (repeated
> many times to avoid noise errors).

You may want to try floating-point loads and stores through the FPR
registers; that is typically faster than integer loads/stores. The
performance gain may depend on the cacheability of the
source/destination memory, but it's definitely worth a try.

The kernel, btw, can't use that, since floating point is a big no-no
inside kernel code.

Cheers

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   [EMAIL PROTECTED]      |
http://www.cpu.lu/~mlan        |                     Learn Always. "
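
For anyone who wants to try the FPR suggestion above without writing
assembly first, here is a rough C sketch. It is only an illustration,
not code from xine, glibc or the kernel: the names fpr_memcpy, copy_fn
and time_copy are made up for this example, and it assumes both buffers
are 8-byte aligned and do not overlap. A compiler is free to turn the
loop into something else (even a call to memcpy), so check the
generated assembly for lfd/stfd before trusting any timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes in 8-byte chunks through doubles,
 * so that the compiler can use floating-point loads/stores.
 * Assumption: dst and src are 8-byte aligned and do not overlap. */
static void *fpr_memcpy(void *dst, const void *src, size_t n)
{
    assert((uintptr_t)dst % sizeof(double) == 0);
    assert((uintptr_t)src % sizeof(double) == 0);

    double *d = dst;
    const double *s = src;
    size_t chunks = n / sizeof(double);

    for (size_t i = 0; i < chunks; i++)
        d[i] = s[i];

    /* Copy the remaining 0..7 bytes one at a time. */
    unsigned char *db = (unsigned char *)dst + chunks * sizeof(double);
    const unsigned char *sb =
        (const unsigned char *)src + chunks * sizeof(double);
    for (size_t i = 0; i < n % sizeof(double); i++)
        db[i] = sb[i];

    return dst;
}

/* Any function with memcpy's shape can be timed. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/* Hypothetical helper: run the copy reps times and return the CPU
 * time used, in seconds (repeating many times reduces noise). */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t n, int reps)
{
    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        fn(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To compare, call time_copy(memcpy, dst, src, n, reps) and
time_copy(fpr_memcpy, dst, src, n, reps) on the same buffers, with n
both smaller and larger than the cache, since cacheability may change
the result.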