At 15:59 08/05/2012, you wrote:
Yep, you are correct. I did the same and it worked. When I have more than 3 MPI tasks, there is a lot of overhead on the GPU.

But on the CPU there is no overhead. All three machines have four quad-core processors and 3.8 GB of RAM.

Just wondering why there is no performance degradation on the CPU?

Your GPU is saturated: it has more work than it can handle, so its performance drops.

If your kernel code is the one you posted some days ago, you can reduce the number of threads and increase the work done by each one. That way you do the same total work (maybe faster) without using up the whole thread pool and wasting SM bandwidth.
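A minimal sketch of that idea, assuming CUDA (the "SM" wording suggests it; in OpenCL the same pattern uses get_global_id/get_global_size). Your kernel isn't in this thread, so scaleGridStride and scaleOnGpu are hypothetical stand-ins doing a simple element-wise scale. The technique is a grid-stride loop: launch a fixed, small grid (here 4 blocks per SM, an assumed starting point to tune) and let each thread step through the array by the total thread count, instead of launching one thread per element. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical element-wise kernel standing in for the real one.
// Each thread starts at its global index and advances by the total number
// of launched threads, so a small grid still covers all n elements.
__global__ void scaleGridStride(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        data[i] *= factor;
    }
}

// Hypothetical host helper: copies values to the GPU, scales them, copies back.
std::vector<float> scaleOnGpu(const std::vector<float> &values, float factor)
{
    int n = static_cast<int>(values.size());
    if (n == 0) return {};

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, values.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int device = 0, numSMs = 0;
    cudaGetDevice(&device);
    cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, device);

    // One thread per element would need (n + 255) / 256 blocks. Instead,
    // launch a few blocks per SM (assumed heuristic) and loop in the kernel.
    const int threadsPerBlock = 256;
    int needed = (n + threadsPerBlock - 1) / threadsPerBlock;
    int blocks = 4 * numSMs;
    if (blocks < 1) blocks = 1;
    if (blocks > needed) blocks = needed;
    scaleGridStride<<<blocks, threadsPerBlock>>>(d, n, factor);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<float> out(n);
    cudaMemcpy(out.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}

int main()
{
    std::vector<float> in(1 << 20);
    for (size_t i = 0; i < in.size(); ++i) in[i] = static_cast<float>(i);

    std::vector<float> out = scaleOnGpu(in, 2.0f);
    assert(out[12345] == 24690.0f);
    std::printf("out[12345] = %f\n", out[12345]);
    return 0;
}
```

Each thread now handles several elements, so the total work is unchanged but far fewer threads compete for the GPU, which matters when several MPI tasks share one card.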


HTH




