Well, for one, there is never any guarantee of linear scaling with the number 
of procs - that is very application-dependent. You can actually see performance 
decrease as you add procs if the application doesn't know how to exploit 
them.

One thing that stands out is your mapping and binding options. Mapping by socket 
means that you are putting neighboring ranks (i.e., ranks that differ by 1) on 
different sockets, which usually means different NUMA regions. That makes shared-memory 
communication between those procs run poorly. If the application does a lot of 
messaging between ranks that differ by 1, then you would see poor scaling.
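
One way to check where ranks actually land is to add --report-bindings to your 
existing command line; mpirun then prints each rank's binding at startup (the 
exact output format depends on the Open MPI version). A sketch, reusing your 
32-rank invocation:

  mpiexec -np 32 --bysocket --bind-to-socket --report-bindings \
    ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
    --time_timestep_loop --animation_freq -1 > screen.dump_bs30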

So one thing you could do is change --bysocket to --bycore. Then, if your 
application isn't threaded, you could add --bind-to-core for better performance; 
a sketch of the adjusted command follows.
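
For example (just a sketch, reusing your 32-rank run; the output file name is 
only illustrative, and you would repeat this for the other core counts):

  mpiexec -np 32 --bycore --bind-to-core \
    ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
    --time_timestep_loop --animation_freq -1 > screen.dump_bc30

Adding --report-bindings there as well will confirm that each rank really is 
pinned to its own core.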


On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaula...@att.net> wrote:

> Dear All:
> 
> I bought a 64-core workstation and installed NASA FUN3D with Open MPI 1.6.5. 
> Then I started test runs of FUN3D using 16, 32, and 48 cores. However, the 
> performance of the FUN3D runs is bad. I got the data below:
> 
> The run command is (this is the 32-core case, as an example):
> mpiexec -np 32 --bysocket --bind-to-socket 
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi --time_timestep_loop 
> --animation_freq -1 > screen.dump_bs30
> 
> Cores    Time (s)    Iterations    Time/iter (s)
> 60       678         30            22.61
> 48       702         30            23.40
> 32       734         30            24.50
> 16       894         30            29.80
> 
> You can see that using 60 cores, FUN3D completes 30 iterations in 678 
> seconds, roughly 22.61 seconds per iteration.
> 
> Using 16 cores, FUN3D completes 30 iterations in 894 seconds, 
> roughly 29.8 seconds per iteration.
> 
> The data above show that the FUN3D run using mpirun does not scale at all! I used to 
> run FUN3D with mpirun on an 8-core workstation, and it scaled well.
> The same job also scales well on a Linux cluster.
> 
> Would you all give me some advice on how to reduce this performance loss as I 
> use more cores, or on the proper mpirun options to get closer to linear scaling 
> when going from 16 to 32 to 48 cores?
> 
> Thank you.
> 
> Yuping
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/06/24654.php
