Hi Yuping,

For such a NUMA platform, using multiple threads within each socket and MPI between sockets may be a better choice. Threads can exploit the benefit of shared memory inside a socket, while MPI across sockets alleviates the cost of non-uniform memory access. A minimal sketch of that hybrid layout is below.
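The following is only an illustrative sketch of the hybrid MPI+OpenMP pattern, not FUN3D's actual structure (whether FUN3D itself supports a hybrid mode is a separate question); the file name hybrid.c and the rank/thread counts are just examples. Compile with something like "mpicc -fopenmp hybrid.c -o hybrid" and launch one rank per socket, e.g. "mpiexec -np 4 --npersocket 1 --bind-to-socket ./hybrid", with OMP_NUM_THREADS set to the number of cores per socket.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* FUNNELED is enough here: only the main thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One team of threads per rank; each thread should touch and
     * work on memory local to its own socket. */
    #pragma omp parallel
    {
        printf("rank %d: thread %d of %d\n",
               rank, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}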
Regards,
Zehan

On Tue, Jun 17, 2014 at 6:19 AM, Yuping Sun <yupingpaula...@att.net> wrote:
> Dear All:
>
> I bought a 64-core workstation and installed NASA FUN3D with Open MPI
> 1.6.5. Then I started test runs of FUN3D on 16, 32, 48, and 60 cores.
> However, the performance of the FUN3D runs is poor. I got the data
> below.
>
> The run command is (shown here for 32 cores as an example):
>
> mpiexec -np 32 --bysocket --bind-to-socket \
>     ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
>     --time_timestep_loop --animation_freq -1 > screen.dump_bs30
>
> cores  time  iterations  time/iteration
>   60   678s      30          22.61s
>   48   702s      30          23.40s
>   32   734s      30          24.50s
>   16   894s      30          29.80s
>
> As you can see, with 60 cores FUN3D completes 30 iterations in 678
> seconds, roughly 22.61 seconds per iteration. With 16 cores it
> completes 30 iterations in 894 seconds, roughly 29.8 seconds per
> iteration.
>
> The data above show that the FUN3D run under mpirun does not scale at
> all! I used to run FUN3D with mpirun on an 8-core workstation, and it
> scaled well. The same job also scales well on a Linux cluster.
>
> Could you give me some advice on how to recover the performance lost
> as I use more cores, or on the proper mpirun options to get linear
> scaling when going from 16 to 32 to 48 cores?
>
> Thank you.
>
> Yuping
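P.S. Before restructuring anything, it may be worth confirming that the binding options are actually taking effect. Assuming Open MPI 1.6.x syntax, adding --report-bindings to your existing command makes mpiexec print each rank's binding at launch:

mpiexec -np 32 --bysocket --bind-to-socket --report-bindings \
    ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi \
    --time_timestep_loop --animation_freq -1 > screen.dump_bs30

If the reported map shows the ranks packed onto a few sockets rather than spread round-robin, those sockets' memory controllers are probably saturated, which could explain the flat scaling beyond 16 cores.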