Niels,

After a closer review of the code, I found that unaligned copies were a
lot slower than aligned ones. I've created another version of the routine
that takes care of that. Attached to this email, you will find a
simple program that I used to test this code. This program tests
both aligned and unaligned (src and dst) copies with the 3 different
implementations (libc memcpy, rev1 armasm memcpy, and rev2 armasm memcpy).
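
For readers without the attachment, the kind of test described above can be
sketched roughly as follows. This is not the attached memtest program: the
names time_copy and run_case, the iteration count, the fill pattern, and the
use of clock() are all assumptions made for illustration. The idea is only to
offset word-aligned buffers by 0 to 3 bytes and time each routine through the
same function-pointer interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Common signature so libc memcpy and any replacement can be swapped in. */
typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical helper: time `iterations` copies of `len` bytes.
 * Returns elapsed CPU seconds, or -1.0 if the copy produced wrong data. */
static double time_copy(memcpy_fn fn, unsigned char *dst,
                        const unsigned char *src, size_t len, long iterations)
{
    memcpy_fn volatile call = fn;   /* keep the calls indirect */
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        call(dst, src, len);
    clock_t end = clock();

    if (memcmp(dst, src, len) != 0)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: run one case with byte offsets (0..3) applied to
 * buffers that malloc returns suitably aligned for any object type. */
static double run_case(const char *name, memcpy_fn fn, size_t len,
                       size_t src_off, size_t dst_off, long iterations)
{
    unsigned char *src_base = malloc(len + 8);   /* slack for the offset */
    unsigned char *dst_base = malloc(len + 8);
    double secs = -1.0;

    if (src_base && dst_base) {
        for (size_t i = 0; i < len + 8; i++)
            src_base[i] = (unsigned char)(i * 31u + 7u);
        memset(dst_base, 0, len + 8);

        secs = time_copy(fn, dst_base + dst_off, src_base + src_off,
                         len, iterations);
        printf("Testing %s (%p <==> %p : %lu):\n%f sec\n", name,
               (void *)(src_base + src_off), (void *)(dst_base + dst_off),
               (unsigned long)len, secs);
    }
    free(src_base);
    free(dst_base);
    return secs;
}
```

Calling run_case("libc", memcpy, 500000, 1, 0, 1000) from main would time an
unaligned-source copy. The same call with offsets 0 and 0 gives the aligned
baseline, and equal non-zero offsets give the mutually aligned cases.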

Here is the output of the program running on an ARM9 AT91RM9200 using
uClibc-0.9.30 and gcc-4.2.4 (armasm is rev1, and armasm2 is rev2):

# ./memtest 500000  
32bit src/dst Aligned test:
Testing libc (0x4005a008 <==> 0x40243008 : 500000):
2.996949 sec
Testing armasm (0x4005a008 <==> 0x40243008 : 500000):
1.331787 sec
Testing armasm2 (0x4005a008 <==> 0x40243008 : 500000):
1.358246 sec
The faster routine is armasm

16bit src/dst Aligned test:
Testing libc (0x4005a00a <==> 0x4024300a : 500000):
2.983215 sec
Testing armasm (0x4005a00a <==> 0x4024300a : 500000):
1.332214 sec
Testing armasm2 (0x4005a00a <==> 0x4024300a : 500000):
1.358978 sec
The faster routine is armasm

8bit src/dst Aligned test:
Testing libc (0x4005a009 <==> 0x40243009 : 500000):
2.982209 sec
Testing armasm (0x4005a009 <==> 0x40243009 : 500000):
1.331054 sec
Testing armasm2 (0x4005a009 <==> 0x40243009 : 500000):
1.359162 sec
The faster routine is armasm

16bit src Aligned test:
Testing libc (0x4005a00a <==> 0x40243008 : 500000):
2.983734 sec
Testing armasm (0x4005a00a <==> 0x40243008 : 500000):
2.571228 sec
Testing armasm2 (0x4005a00a <==> 0x40243008 : 500000):
1.419556 sec
The faster routine is armasm2

8bit src Aligned test:
Testing libc (0x4005a009 <==> 0x40243008 : 500000):
2.984101 sec
Testing armasm (0x4005a009 <==> 0x40243008 : 500000):
2.570343 sec
Testing armasm2 (0x4005a009 <==> 0x40243008 : 500000):
1.419525 sec
The faster routine is armasm2

16bit dst Aligned test:
Testing libc (0x4005a008 <==> 0x4024300a : 500000):
2.983948 sec
Testing armasm (0x4005a008 <==> 0x4024300a : 500000):
2.571563 sec
Testing armasm2 (0x4005a008 <==> 0x4024300a : 500000):
1.418671 sec
The faster routine is armasm2

8bit dst Aligned test:
Testing libc (0x4005a008 <==> 0x40243009 : 500000):
2.983521 sec
Testing armasm (0x4005a008 <==> 0x40243009 : 500000):
2.571258 sec
Testing armasm2 (0x4005a008 <==> 0x40243009 : 500000):
1.418762 sec
The faster routine is armasm2


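As you can see, rev2 works a lot better with unaligned buffers (about
1.42 sec versus 2.57 sec for rev1), while being only slightly slower in
the aligned cases (about 1.36 sec versus 1.33 sec). I will update the
patch to DirectFB to include this new version of the routine.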


As for big-endian: this version will ONLY work on little-endian targets,
so a config directive will need to be set for the build to work on
big-endian targets. I will include that in the patch.
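
To illustrate why a routine like this can be byte-order dependent, here is a
minimal sketch. It assumes the unaligned case is handled by reading aligned
32-bit words and merging neighbours with shifts, which is a common technique
but is not confirmed to be what rev2 does. WORDS_BIGENDIAN is the macro that
autoconf's AC_C_BIGENDIAN defines. USE_ARMASM_MEMCPY, merge_le and
copy_shifted_le are hypothetical names used only here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical config guard: fall back to libc memcpy on big-endian. */
#ifdef WORDS_BIGENDIAN
#define USE_ARMASM_MEMCPY 0
#else
#define USE_ARMASM_MEMCPY 1
#endif

/* Runtime check: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Little-endian merge: build the 32-bit word that starts k bytes
 * (k = 1..3) into `lo` and continues into `hi`. On big-endian the
 * two shift directions would have to be swapped. */
static uint32_t merge_le(uint32_t lo, uint32_t hi, unsigned k)
{
    return (lo >> (8 * k)) | (hi << (32 - 8 * k));
}

/* Copy nwords words to an aligned dst from the address src_aligned + k
 * bytes, using only aligned word loads. Reads nwords + 1 source words. */
static void copy_shifted_le(uint32_t *dst, const uint32_t *src_aligned,
                            unsigned k, size_t nwords)
{
    uint32_t lo = src_aligned[0];
    for (size_t i = 0; i < nwords; i++) {
        uint32_t hi = src_aligned[i + 1];
        dst[i] = merge_le(lo, hi, k);
        lo = hi;
    }
}
```

On a little-endian host the result matches a plain byte copy from the
unaligned address. On a big-endian host the same shifts would assemble the
bytes in the wrong order, which is the kind of breakage the config directive
is meant to prevent.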

For now, it would be great if I could get some metrics from other people
to double-check my results.

Regards,

Vince




On Mon, 2009-03-23 at 16:36 +0100, Niels Roest wrote:
> Hi Vince,
> I'm happy to include the patch,
> I just have a few unclear points; I hope somebody can clear them up.
> 
> (1) memcpy is speed-tested with (I think) aligned accesses (based on 
> D_MALLOC addresses), but I think we'll see a lot of unaligned memcpys 
> too, and that side of the implementation looks kind of weak. Anyone care 
> to give some figures for unaligned copies? Have a look at 
> direct_find_best_memcpy() in lib/direct/memcpy.c, and fiddle a bit with 
> buf1 and buf2.
> (2) What happens on a big-endian ARM if I just include the patch? I'm 
> having trouble finding this dependency in the patch. We will need to fix 
> this, or put a showstopper somewhere for big-endian, so the patch doesn't 
> break something.
> 
> Greets
> Niels
> 
> vince wrote:
> > Hello,
> >
> > I've been working on trying to improve the performance of DirectFB 1.3.0
> > on the ARM platform. The attached patch replaces the default libc
> > memcpy with a faster implementation. I've tested this patch using an
> > AT91RM9200, but it should work on other ARM targets.
> >
> > Hope this will be useful to others.
> >
> > Regards,
> >
> > Vince 
> >   
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > directfb-dev mailing list
> > directfb-dev@directfb.org
> > http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev
> 
> 

Attachment: memtest.tar.bz2
Description: application/bzip-compressed-tar
