On May 25 2002, Andrew Patrikalakis wrote:
> Hello all,

        Hi, Andrew.

> With all the recent talk of use of assembly on the PowerPC, I came
> up with a patch to use assembly versions of memcpy. It's about 35%
> faster. Here is a sample of the memcpy speed test (which also now
> works):

        Thanks for helping xine get better on PowerPC.

        And also thanks for looking into my earlier message and
        discovering which part of the code was giving the relocation
        error (I just read your patch).

> Benchmarking memcpy methods (smaller is better):
>       glibc memcpy() : 136
>       ppcasm_memcpy() : 137
>       ppcasm_cacheable_memcpy() : 88
> xine: using ppcasm_cacheable_memcpy()
> (The lower time resolution is because I'm using times(NULL) in rdtsc())
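
        For anyone who wants to repeat this kind of measurement
        outside xine, here is a minimal sketch of a times()-based
        timing loop. It is not the code from the patch: the name
        bench_copy and the rounds parameter are made up for
        illustration, and a struct tms buffer is passed because POSIX
        requires one, while times(NULL) is a Linux extension.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/times.h>

/* Signature shared by memcpy() and any replacement being compared. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Hypothetical helper (not part of xine): run `rounds` copies of `n`
 * bytes through `fn` and return the elapsed time in clock ticks
 * (smaller is better). times() counts in units of
 * 1/sysconf(_SC_CLK_TCK) seconds, so the resolution is coarse and
 * many rounds are needed to see a difference.
 */
static long bench_copy(copy_fn fn, void *dst, const void *src,
                       size_t n, int rounds)
{
    struct tms unused;   /* required by POSIX; only the return value is used */
    clock_t start, end;
    int i;

    assert(fn != NULL);

    start = times(&unused);
    for (i = 0; i < rounds; i++)
        fn(dst, src, n);
    end = times(&unused);

    return (long)(end - start);
}
```

        Calling bench_copy(memcpy, dst, src, size, rounds) and then
        the same with a replacement routine gives two tick counts
        that can be compared like the figures quoted above.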

        We can also look into getting an extra version of memcpy that
        makes the transfers with floating point registers as some
        people suggested on the Debian PowerPC mailing list.

        People there said that using floating point registers (which
        are 64 bits wide) instead of general purpose registers (32
        bits each) may improve things, since each load and store
        would then move twice as much data.
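
        To make the idea concrete, here is a rough C sketch of a copy
        loop that moves 8 bytes at a time through values of type
        double. The name fpr_copy is made up, and this is only an
        illustration: a compiler targeting 32-bit PowerPC is expected
        to use floating point loads and stores for it, but that is an
        assumption, and a real version would be written in assembly.
        It also assumes 8-byte aligned buffers that come from
        malloc() or are declared as double, and a target whose
        double loads and stores leave the bit pattern untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: copy n bytes from src to dst, 8 bytes at a
 * time through double-typed loads and stores, then finish the
 * remaining 0..7 bytes one at a time. Buffers must not overlap and
 * both pointers must be 8-byte aligned.
 */
static void *fpr_copy(void *dst, const void *src, size_t n)
{
    double *d = dst;
    const double *s = src;
    unsigned char *db;
    const unsigned char *sb;
    size_t chunks = n / sizeof(double);   /* number of whole 64-bit moves */
    size_t tail = n % sizeof(double);     /* leftover bytes */
    size_t i;

    assert((uintptr_t)dst % sizeof(double) == 0);
    assert((uintptr_t)src % sizeof(double) == 0);

    /* Main loop: one 64-bit load and one 64-bit store per iteration. */
    for (i = 0; i < chunks; i++)
        d[i] = s[i];

    /* Tail: copy what does not fill a whole double, byte by byte. */
    db = (unsigned char *)(d + chunks);
    sb = (const unsigned char *)(s + chunks);
    for (i = 0; i < tail; i++)
        db[i] = sb[i];

    return dst;
}
```

        Whether this beats the general purpose register version would
        have to be measured; nobody has posted numbers for it yet.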

> I'd like to know how much it helps PPC users, so keep this list up
> to date with the results. (Also, if my patch breaks other
> platforms...) It gave my Mac laptop the little boost it needed to
> play some media I have.

        Well, with the faster memcpy and with XFree86 4.2.0 (with DMA
        enabled), I can watch a DVD here with linearblend
        deinterlacing (coded in C) enabled, and about 15% of the
        frames are skipped, which, while still not perfect, is quite
        an improvement over the situation some weeks ago.

        BTW, I am using gcc-3.0 to compile xine-libs and I added some
        extra options to the configure script (-mfused-madd,
        -mcpu=750, -mtune=750, -O9).

        The next points of improvement (which may not be as immediate
        as the memcpy being discussed) may be coding the idct, motion
        compensation and deinterlacing in assembly as well.

        I guess that I'll have to learn a bit more before I can get
        to these, but with the help of other people, things could go
        faster.

> Just so you know, the methods I used are from the linux kernel
> version 2.4.18 (arch/ppc/lib/string.S)

        Yes, that's what I tried in my earlier message, but I wasn't
        as successful as you were.

        Your patch had a problem, though, and I had to apply part of
        it by hand. You might want to remake it and send it to the
        xine developers so that it can be included in xine release
        0.9.10, which should be near.

> Andrew Patrikalakis


        Thanks for your help, Roger...

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
  Rogério Brito - [EMAIL PROTECTED] - http://www.ime.usp.br/~rbrito/
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=


