No, that isn't correct. It should be:

> mpirun -np 32 --bycore --bind-to-core
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1

Again, there is no guarantee this will improve performance - the options that affect performance for a given application are highly application-specific.

On Jun 16, 2014, at 8:23 PM, Yuping Sun <yupingpaula...@att.net> wrote:

> Hi Ralph:
>
> Is the following the correct command:
>
> mpirun -np 32 --bysocket --bycore
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1
>
> I ran the above command, and performance still does not improve. Would you
> give me a detailed command with options?
> Thank you.
>
> Best regards,
>
> Yuping
>
> --------------------------------------------
> On Tue, 6/17/14, Ralph Castain <r...@open-mpi.org> wrote:
>
> Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
> To: "Yuping Sun" <yupingpaula...@att.net>, "Open MPI Users" <us...@open-mpi.org>
> Date: Tuesday, June 17, 2014, 1:59 AM
>
> Well, for one, there is never any guarantee of linear scaling with the
> number of procs - that is very application dependent. You can actually
> see performance decrease with number of procs if the application doesn't
> know how to exploit them.
>
> One thing that stands out is your mapping and binding options. Mapping
> bysocket means that you are putting neighboring ranks (i.e., ranks that
> differ by 1) on different sockets, which usually means different NUMA
> regions. This makes shared memory between those procs run poorly. If the
> application does a lot of messaging between ranks that differ by 1, then
> you would see poor scaling.
>
> So one thing you could do is change --bysocket to --bycore. Then, if your
> application isn't threaded, you could --bind-to-core for better
> performance.
>
> On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaula...@att.net> wrote:
>
> Dear All:
>
> I bought a 64 core workstation and installed NASA fun3d with open mpi
> 1.6.5. Then I started to test run fun3d using 16, 32, 48 cores. However
> the performance of the fun3d run is bad.
> I got data below:
>
> the run command is (it is for 32 core as an example)
>
> mpiexec -np 32 --bysocket --bind-to-socket
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1 > screen.dump_bs30
>
> CPUs   time    iterations   time/it
> 60     678s    30it         22.61s
> 48     702s    30it         23.40s
> 32     734s    30it         24.50s
> 16     894s    30it         29.80s
>
> You can see using 60 cores, to run 30 iterations, FUN3D will complete in
> 678 seconds, roughly 22.61 seconds per iteration.
>
> Using 16 cores, to run 30 iterations, FUN3D will complete in 894 seconds,
> roughly 29.8 seconds per iteration.
>
> The data above shows FUN3D run using mpirun does not scale at all! I used
> to run fun3d with mpirun on an 8 core WS, and it scales well. The same job
> run on a linux cluster scales well.
>
> Would you all give me some advice to improve the performance loss when I
> increase the use of more cores, or how to run mpirun with proper options
> to get a linear scaling when using 16 to 32 to 48 cores?
>
> Thank you.
>
> Yuping
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24654.php
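
For anyone comparing before/after numbers when trying different mapping and binding options, it may help to put the timings in the quoted table into speedup and parallel-efficiency terms. The following is only a rough sketch in Python: the timings are copied from the table above, the 16-core run is assumed to be the baseline (no single-core time was posted), and the function name `scaling_report` is made up for illustration.

```python
# Wall-clock times (seconds) for 30 iterations, copied from the table in the
# original post. Keys are the number of cores used for the run.
timings = {16: 894.0, 32: 734.0, 48: 702.0, 60: 678.0}


def scaling_report(timings, baseline=None):
    """Return (cores, speedup, efficiency) tuples, sorted by core count.

    Assumption: speedup is measured relative to the smallest run (16 cores
    here), because no single-core timing was posted. Efficiency is the
    measured speedup divided by the ideal speedup (cores / baseline cores).
    """
    base = min(timings) if baseline is None else baseline
    base_time = timings[base]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        ideal = cores / base
        rows.append((cores, speedup, speedup / ideal))
    return rows


for cores, speedup, efficiency in scaling_report(timings):
    print(f"{cores:>3} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Relative to the 16-core run, this gives an efficiency of roughly 61% at 32 cores, 42% at 48 cores and 35% at 60 cores. Rerunning the same calculation on timings collected with --bycore --bind-to-core would show whether the change in mapping and binding makes any difference for this application.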