Hello Niels! Thanks so much for your reply!
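For reference, here is a minimal sketch of the kind of memcpy test referred to in the scenarios below. It is not the actual test code: the helper names, the buffer size, and the iteration count are assumptions for illustration, and the compiler barrier is GCC/Clang specific.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies `bytes` from src to dst `iterations` times
 * and returns the throughput in MB/s (1 MB = 1e6 bytes).
 * Returns -1.0 on invalid arguments. */
static double memcpy_throughput(void *dst, const void *src,
                                size_t bytes, int iterations)
{
    struct timespec t0, t1;

    if (!dst || !src || bytes == 0 || iterations <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        memcpy(dst, src, bytes);
        /* Compiler barrier (GCC/Clang) so the copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9; /* guard against a zero reading on coarse clocks */

    return ((double)bytes * (double)iterations) / 1e6 / secs;
}

/* Hypothetical Scenario A baseline: two plain malloc'ed buffers,
 * no DirectFB involved. Returns MB/s, or -1.0 on allocation failure. */
static double malloc_baseline_mb_per_sec(size_t bytes, int iterations)
{
    void *src = malloc(bytes);
    void *dst = malloc(bytes);
    double result = -1.0;

    if (src && dst) {
        /* Touch both buffers first so page faults are not timed. */
        memset(src, 0xA5, bytes);
        memset(dst, 0x00, bytes);
        result = memcpy_throughput(dst, src, bytes, iterations);
    }
    free(src);
    free(dst);
    return result;
}
```

For Scenarios B and C, the same memcpy_throughput() call would be run on the same two buffers after the surfaces have been created (B) or after they have been locked and written to (C), so that only the DirectFB step differs between runs.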
I've further narrowed the scope of this issue as follows:

Scenario A:
1. Create two local malloc'ed buffers
2. Run memcpy test copying between buffers
3. Relative performance level: 100%

Scenario B:
1. Create two local malloc'ed buffers
2. Create DirectFB surfaces from these buffers (but don't do any other DirectFB operations)
3. Run memcpy test copying between buffers
4. Relative performance level: 100%

Scenario C:
1. Create two local malloc'ed buffers
2. Create DirectFB surfaces from these buffers
3. Perform write operations on surface buffers while locked
4. Run memcpy test copying between buffers
5. Relative performance level: 25-50%

Essentially, the end result is that once the surface memory has been written to while locked, any future memory operations on that area of memory are much slower, almost as if caching in that area of memory were disabled.

Thanks,
-Robert

> From: Niels Roest <[EMAIL PROTECTED]>
>
> Hi Robert,
> sounds odd indeed.
>
> Can you provide the output of dfbdump -a -p ?
> This should output the pool allocation status, as well as out of which
> pool your surfaces are coming.
> Unfortunately, this is only interesting while your test app is running,
> which means to either build DirectFB with --enable-multi and using
> fusion.ko, or calling dfbdump statically from your app.
>
> Alternatively, you can do debug=Core/SurfacePool (and maybe
> debug=Core/SurfPoolLock) if you have debug enabled, this will also
> output this info more or less.
>
> Greets
> Niels
>
> Robert Hildinger wrote:
>> I'm seeing a curious issue where surface memory seems to be slower than
>> freshly malloc'ed memory. The platform I am working on is based on an
>> ARM1176 CPU (without L2 cache) and no graphics acceleration - everything is
>> software based. I am currently testing DirectFB 1.1.1.
>>
>> I created a simple straight blitter with no blending (i.e. a memcpy blitter)
>> to see if I could increase blitting performance using some special ARM
>> instructions.
>> What I found when doing this was that if I created two
>> surfaces with preallocated memory, and then ran my blitter to copy between
>> the two surfaces, the performance was roughly 38 megapixels/sec, roughly 4
>> times lower than the available memory bandwidth. If I ran the exact same
>> code without first creating a surface out of the preallocated buffers, the
>> performance jumped to around 128 megapixels/sec - much closer to the memory
>> bandwidth max.
>>
>> My question is, what is DirectFB doing to the preallocated memory buffers
>> that is causing the memcpy's between them to slow down so drastically? Are
>> the buffers being mmap'ed internally? Is there any way to improve this
>> situation?
>>
>> Thanks!
>> -Robert Hildinger

_______________________________________________
directfb-dev mailing list
directfb-dev@directfb.org
http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-dev