OK, thanks to Mi and Jeff for their useful replies anyway. Gilbert.
On Fri, 31 Oct 2008, Jeff Squyres wrote:

AFAIK, there are no parameters available to monitor IB message passing. The majority of it is processed in hardware, and Linux is unaware of it. We have not added any extra instrumentation into the openib BTL to provide auditing information because, among other reasons, that is the performance-critical code path and we didn't want to add any latency there.

The best you may be able to do is use a PMPI-based library to audit MPI function call invocations.

On Oct 31, 2008, at 4:07 PM, Mi Yan wrote:

Gilbert,

I do not know of MCA parameters that can monitor the message passing. I have tried a few MCA verbose parameters and did not find any of them helpful.

One way to check whether a message goes via IB or via shared memory may be to check the counters in /sys/class/infiniband.

Regards,
Mi
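To illustrate Mi's suggestion, reading one of those counters before and after a run would look roughly like this (a sketch only: the device name "mlx4_0", the port number and the counter file are assumptions -- list /sys/class/infiniband on the QS22 to see the real names, or simply cat the files from the shell):

    /* Sketch: read one InfiniBand port counter from sysfs.  Run it (or just
     * "cat" the file) before and after the MPI job and compare the values.
     * The device name "mlx4_0", the port number and the counter file are
     * assumptions -- look under /sys/class/infiniband for the real names. */
    #include <stdio.h>

    static long long read_counter(const char *path)
    {
        long long value = -1;
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fscanf(f, "%lld", &value) != 1)
                value = -1;
            fclose(f);
        }
        return value;
    }

    int main(void)
    {
        const char *path =
            "/sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data";
        printf("%s = %lld\n", path, read_counter(path));
        return 0;
    }

If the IB counters stay flat while intra-blade traffic is known to be flowing, that is a hint the sm BTL is indeed carrying those messages.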
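As for the PMPI-based auditing Jeff suggests, a minimal interposition layer could look roughly like this (a sketch assuming the C bindings; it wraps only MPI_Send, and the counter names are illustrative):

    /* Sketch of a PMPI-based audit layer, assuming the C bindings: it wraps
     * only MPI_Send, counts calls and bytes, then falls through to the real
     * implementation via PMPI_Send.  The counter names are illustrative. */
    #include <mpi.h>
    #include <stdio.h>

    static long audit_send_calls = 0;
    static long audit_send_bytes = 0;

    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        PMPI_Type_size(type, &size);
        audit_send_calls++;
        audit_send_bytes += (long)count * size;
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "[rank %d] MPI_Send: %ld calls, %ld bytes\n",
                rank, audit_send_calls, audit_send_bytes);
        return PMPI_Finalize();
    }

Built as a library and linked (or LD_PRELOADed) ahead of the application, it counts every MPI_Send before falling through to PMPI_Send; the same pattern extends to any other MPI call worth auditing.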
On 10/29/2008 12:36 PM, Gilbert Grosdidier <gro...@mail.cern.ch> wrote:

Thank you very much Mi and Lenny for your detailed replies.

I believe I can summarize the information needed for 'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self"
- Better to move to the OMPI 1.3 version ASAP
- It is currently easier and more efficient to use numactl to control processor affinity on a QS22.

So far so good.

One question remains: how could I monitor in detail the message passing through IB (on one side) and through SM (on the other side) by means of MCA parameters, please? Additional information about the verbosity level of this monitoring would be highly appreciated. A lengthy trawl through the list of such parameters provided by ompi_info did not enlighten me (there are so many xxx_sm_yyy type parameters that I don't know which one could be the right one ;-)

Thanks in advance for your hints. Best Regards, Gilbert.

On Thu, 23 Oct 2008, Mi Yan wrote:

1. MCA BTL parameters

With "-mca btl openib,self", both messages between the two Cell processors on one QS22 and messages between two QS22s go through IB.

With "-mca btl openib,sm,self", messages within one QS22 go through shared memory, and messages between QS22s go through IB.

Depending on the message size and other MCA parameters, there is no guarantee that message passing over shared memory is faster than over IB. E.g. the bandwidth for a 64 KB message is 959 MB/s on shared memory and 694 MB/s on IB; the bandwidth for a 4 MB message is 539 MB/s on shared memory and 1092 MB/s on IB. The shared-memory bandwidth for a 4 MB message may be higher if you tune some MCA parameters.

2. mpi_paffinity_alone

"mpi_paffinity_alone = 1" is not a good choice for a QS22. There are two sockets, with two physical Cell/B.E. processors, on one QS22. Each Cell/B.E. has two SMT threads, so there are four logical CPUs on one QS22. The CBE Linux kernel maps logical CPUs 0 and 1 to socket 1 and logical CPUs 2 and 3 to socket 2. If mpi_paffinity_alone is set to 1, the two MPI instances will be assigned to logical CPUs 0 and 1 on socket 1, which I believe is not what you want.

A temporary solution to force the affinity on a QS22 is to use "numactl". E.g., assuming the hostname is "qs22" and the executable is "foo", the following command can be used:

    mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo

In the long run, I wish the CBE kernel exported the CPU topology in /sys so that PLPA could be used to force the processor affinity.

Best Regards,
Mi

On 10/23/2008 05:48 AM, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote:

Hi,

If I understand you correctly, the most suitable way to do it is via the paffinity support we have in Open MPI 1.3 and the trunk; however, the OS usually distributes processes evenly between the sockets by itself.

There is still no formal FAQ, for several reasons, but you can read how to use it in the attached draft (there were a few renamings of the parameters, so check with ompi_info); a hedged rankfile sketch also follows the quoted thread below.

Shared memory is used between processes that share the same machine, and openib is used between different machines (hostnames); no special MCA parameters are needed.

(See attached file: RANKS_FAQ.doc)

Best Regards,
Lenny

On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier <gro...@mail.cern.ch> wrote:

Working with a CellBlade cluster (QS22), the requirement is to have one instance of the executable running on each socket of the blade (there are 2 sockets). The application is of the 'domain decomposition' type, and each instance often needs to send/receive data both with the remote blades and with the neighbor socket.

The question is: which specification must be used for the mca btl component to force 1) shmem-type messages when communicating with this neighbor socket, while 2) using openib to communicate with the remote blades? Is '-mca btl sm,openib,self' suitable for this?

Also, which debug flags could be used to crosscheck that the messages are _actually_ going through the right channel, please?

We are currently using OpenMPI 1.2.5 as shipped with RHEL 5.2 (ppc64). Which version do you think is currently the most optimised for these processors and this problem type? Should we go towards OpenMPI 1.2.8 instead? Or even try some OpenMPI 1.3 nightly build?

Thanks in advance for your help, Gilbert.
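On the rank/affinity mechanism Lenny mentions above: in Open MPI 1.3, process placement can be driven by a rankfile passed to mpirun. The sketch below is an illustration under stated assumptions only -- the slot syntax, the option name and the QS22 logical-CPU numbering should all be checked against RANKS_FAQ.doc and ompi_info:

    # Hypothetical rankfile for one QS22 ("qs22"): one rank per Cell/B.E.
    # socket, assuming logical CPUs 0-1 sit on socket 1 and CPUs 2-3 on socket 2.
    rank 0=qs22 slot=0
    rank 1=qs22 slot=2

It would then be passed to mpirun with something like "-rf ./rankfile" alongside "-mca btl openib,sm,self", replacing the numactl workaround above; confirm the exact option name against the 1.3 documentation.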
--
*---------------------------------------------------------------------*
 Gilbert Grosdidier               gilbert.grosdid...@in2p3.fr
 LAL / IN2P3 / CNRS               Phone : +33 1 6446 8909
 Faculté des Sciences, Bat. 200   Fax   : +33 1 6446 8546
 B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*