Thank you very much Mi and Lenny for your detailed replies. I believe I can summarize the information needed for 'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self".
- Better to move to the OMPI 1.3 version ASAP.
- It is currently easier and more efficient to use numactl to control processor affinity on a QS22.

So far so good. One question remains: how could I monitor in detail the messages passing through IB (on one side) and through SM (on the other side) by means of MCA parameters, please? Additional info about the verbosity level of this monitoring would be highly appreciated. A lengthy travel through the list of such parameters provided by ompi_info did not enlighten me (there are so many xxx_sm_yyy type params that I don't know which could be the right one ;-)
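My best guess so far, from wandering through the ompi_info output, would be something along these lines (btl_base_verbose is a pure guess on my side, and ./foo stands for our executable):

    mpirun -np 2 -mca btl openib,sm,self -mca btl_base_verbose 30 ./foo
    ompi_info --param btl sm | grep -i verbose

but I have no idea whether this is the intended mechanism, nor which verbosity level would be relevant.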
Thanks in advance for your hints,
Best Regards, Gilbert.

On Thu, 23 Oct 2008, Mi Yan wrote:

> 1. MCA BTL parameters
>
> With "-mca btl openib,self", both messages between the two Cell processors on one QS22 and messages between two QS22s go through IB.
>
> With "-mca btl openib,sm,self", messages on one QS22 go through shared memory, and messages between QS22s go through IB.
>
> Depending on the message size and other MCA parameters, there is no guarantee that message passing over shared memory is faster than over IB. E.g., the bandwidth for a 64KB message is 959 MB/s on shared memory and 694 MB/s on IB, while the bandwidth for a 4MB message is 539 MB/s on shared memory and 1092 MB/s on IB. The bandwidth of a 4MB message on shared memory may be higher if you tune some MCA parameters.
>
> 2. mpi_paffinity_alone
>
> "mpi_paffinity_alone = 1" is not a good choice for a QS22. There are two sockets with two physical Cell/B.E. processors on one QS22. Each Cell/B.E. has two SMT threads, so there are four logical CPUs on one QS22. The CBE Linux kernel maps logical CPUs 0 and 1 to socket 1, and logical CPUs 2 and 3 to socket 2. If mpi_paffinity_alone is set to 1, the two MPI instances will be assigned to logical CPU 0 and CPU 1 on socket 1. I believe this is not what you want.
>
> A temporary solution to force the affinity on a QS22 is to use "numactl". E.g., assuming the hostname is "qs22" and the executable is "foo", the following command can be used:
>
>     mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo
>
> In the long run, I wish the CBE kernel exported the CPU topology in /sys, so that PLPA could be used to force the processor affinity.
>
> Best Regards,
> Mi
>
> On 10/23/2008 05:48 AM, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote to the Open MPI Users list (Re: [OMPI users] Working with a CellBlade cluster):
>
> Hi,
>
> If I understand you correctly, the most suitable way to do it is via the paffinity support that we have in Open MPI 1.3 and the trunk. However, the OS usually distributes processes evenly between the sockets by itself.
>
> There is still no formal FAQ, for multiple reasons, but you can read how to use it in the attached scratch (there were a few name changes of the params, so check with ompi_info).
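> As a minimal sketch (not taken from the scratch itself; the hostname "qs22" and the executable "foo" are placeholders, and the exact rankfile syntax should be double-checked against your 1.3 build), pinning one rank per socket could look like:
>
>     $ cat rankfile
>     rank 0=qs22 slot=0
>     rank 1=qs22 slot=1
>     $ mpirun -np 2 -rf rankfile ./foo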
> Shared memory is used between processes that share the same machine, and openib is used between different machines (hostnames); no special MCA params are needed.
>
> Best Regards,
> Lenny
>
> On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier <gro...@mail.cern.ch> wrote:
> > Working with a CellBlade cluster (QS22), the requirement is to have one instance of the executable running on each socket of the blade (there are 2 sockets). The application is of the 'domain decomposition' type, and each instance is required to often send/receive data to/from both the remote blades and the neighbor socket.
> >
> > The question is: which specification must be used for the mca btl component to force 1) shmem type messages when communicating with this neighbor socket, while 2) using openib to communicate with the remote blades? Is '-mca btl sm,openib,self' suitable for this?
> >
> > Also, which debug flags could be used to crosscheck that the messages are _actually_ going through the right channel in each case, please?
> >
> > We are currently using OpenMPI 1.2.5 as shipped with RHEL 5.2 (ppc64). Which version do you think is currently the most optimised for these processors and this problem type? Should we go to OpenMPI 1.2.8 instead? Or even try some OpenMPI 1.3 nightly build?
> >
> > Thanks in advance for your help, Gilbert.
>
> (See attached file: RANKS_FAQ.doc)

-- 
*---------------------------------------------------------------------*
  Gilbert Grosdidier                 gilbert.grosdid...@in2p3.fr
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*