Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores

Ralph Castain Tue, 17 Jun 2014 00:03:44 -0400 (EDT)

No, that isn't correct. It should be:

> mpirun -np 32 --bycore  --bind-to-core 
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1



Again, there is no guarantee this will improve performance - the options that
affect performance are highly application-specific.
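
For reference, the timings quoted further down in this thread can be converted into speedup and parallel efficiency relative to the 16-core run. The following is a minimal Python sketch; the function and variable names are illustrative and not part of FUN3D or Open MPI, and only the timing values come from the original post.

```python
# Timings reported in the original post:
# number of cores -> wall-clock seconds for 30 iterations.
timings = {16: 894.0, 32: 734.0, 48: 702.0, 60: 678.0}


def scaling_table(timings, baseline_cores):
    """Return (cores, speedup, efficiency) rows relative to the baseline run."""
    base_time = timings[baseline_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]   # measured speedup vs. baseline
        ideal = cores / baseline_cores         # speedup under perfect linear scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows


rows = scaling_table(timings, baseline_cores=16)
for cores, speedup, efficiency in rows:
    print(f"{cores:>3} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers, 60 cores give roughly a 1.32x speedup over 16 cores against an ideal of 3.75x, which is about 35% parallel efficiency.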


On Jun 16, 2014, at 8:23 PM, Yuping Sun <yupingpaula...@att.net> wrote:

> Hi Ralph:
> 
> Is the following the correct command:
> 
> mpirun -np 32 --bysocket --bycore  
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1 
> 
> I ran the above command, but performance still did not improve. Would you
> give me a detailed command with options?
> Thank you.
> 
> Best regards,
> 
> Yuping
> 
> 
> --------------------------------------------
> On Tue, 6/17/14, Ralph Castain <r...@open-mpi.org> wrote:
> 
> Subject: Re: [OMPI users] how to get mpirun to scale from 16 to 64 cores
> To: "Yuping Sun" <yupingpaula...@att.net>, "Open MPI Users" 
> <us...@open-mpi.org>
> Date: Tuesday, June 17, 2014, 1:59 AM
> 
> Well, for one, there is never any guarantee of linear scaling with the
> number of procs - that is very application dependent. You can actually
> see performance decrease as the number of procs grows if the application
> doesn't know how to exploit them.
>
> One thing that stands out is your mapping and binding options. Mapping
> bysocket means that you are putting neighboring ranks (i.e., ranks that
> differ by 1) on different sockets, which usually means different NUMA
> regions. This makes shared memory between those procs run poorly. If the
> application does a lot of messaging between ranks that differ by 1, then
> you would see poor scaling.
>
> So one thing you could do is change --bysocket to --bycore. Then, if your
> application isn't threaded, you could --bind-to-core for better
> performance.
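
The effect of the mapping policy on neighboring ranks can be illustrated with a toy model. The sketch below is not Open MPI's mapper; it assumes a hypothetical layout of 4 sockets with 16 cores each (the thread does not state the actual topology) and simple round-robin placement, and counts how many rank pairs (r, r+1) end up on different sockets.

```python
def socket_of(rank, policy, n_sockets, cores_per_socket):
    """Socket index a rank lands on under a simplified placement model."""
    if policy == "bysocket":
        # Consecutive ranks are dealt out round-robin across sockets.
        return rank % n_sockets
    if policy == "bycore":
        # Consecutive ranks fill one socket's cores before moving to the next.
        return (rank // cores_per_socket) % n_sockets
    raise ValueError(f"unknown policy: {policy}")


def cross_socket_neighbors(n_ranks, policy, n_sockets=4, cores_per_socket=16):
    """Count neighbor pairs (r, r+1) placed on different sockets.

    The 4 x 16 default topology is an assumption made for illustration.
    """
    return sum(
        socket_of(r, policy, n_sockets, cores_per_socket)
        != socket_of(r + 1, policy, n_sockets, cores_per_socket)
        for r in range(n_ranks - 1)
    )


for policy in ("bysocket", "bycore"):
    print(policy, cross_socket_neighbors(32, policy))
```

Under these assumptions, all 31 neighbor pairs of a 32-rank job cross a socket boundary with by-socket placement, while only one pair does with by-core placement. Whether that matters depends on how much the application communicates between adjacent ranks.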
> 
> On Jun 16, 2014, at 3:19 PM, Yuping Sun <yupingpaula...@att.net>
> wrote:
> Dear All:
>
> I bought a 64 core workstation and installed NASA fun3d with open mpi
> 1.6.5. Then I started to test run fun3d using 16, 32, 48 cores. However
> the performance of the fun3d run is bad. I got the data below.
>
> The run command is (for 32 cores, as an example):
>
> mpiexec -np 32 --bysocket --bind-to-socket
> ~ysun/Codes/NASA/fun3d-12.3-66687/Mpi/FUN3D_90/nodet_mpi
> --time_timestep_loop --animation_freq -1 > screen.dump_bs30
> 
> CPUs    time    iterations    time/it
> 60      678s    30it          22.61s
> 48      702s    30it          23.40s
> 32      734s    30it          24.50s
> 16      894s    30it          29.80s
>
> You can see that using 60 cores, FUN3D completes 30 iterations in 678
> seconds, roughly 22.61 seconds per iteration. Using 16 cores, FUN3D
> completes 30 iterations in 894 seconds, roughly 29.8 seconds per
> iteration.
>
> The data above shows that FUN3D run using mpirun does not scale at all!
> I used to run fun3d with mpirun on an 8 core WS, and it scales well. The
> same job run on a linux cluster scales well.
>
> Would you all give me some advice on reducing the performance loss when
> I increase the number of cores, or on how to run mpirun with proper
> options to get linear scaling when going from 16 to 32 to 48 cores?
>
> Thank you.
> Yuping
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription:
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/06/24654.php
>
