OK, thanks to Mi and Jeff for their useful replies anyway. Gilbert.
On Fri, 31 Oct 2008, Jeff Squyres wrote:

AFAIK, there are no parameters available to monitor IB message passing. The majority of it is processed in hardware, and Linux is unaware of it. We have not added any extra instrumentation into the openib BTL to provide auditing information because, among other reasons, that is the performance-critical code path and we didn't want to add any latency there.

The best you may be able to do is use a PMPI-based library to audit MPI function call invocations.

On Oct 31, 2008, at 4:07 PM, Mi Yan wrote:

Gilbert,

I do not know of MCA parameters that can monitor the message passing. I have tried a few MCA verbose parameters and did not find any of them helpful.

One way to check whether a message goes via IB or via shared memory may be to check the counters in /sys/class/infiniband.

Regards,
Mi
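To illustrate Mi's suggestion, reading one of those counters before and after a run would look roughly like this (a sketch only: the device name "mlx4_0", the port number and the counter file are assumptions -- list /sys/class/infiniband on the QS22 to see the real names, or simply cat the files from the shell):

    /* Sketch: read one InfiniBand port counter from sysfs.  Run it (or just
     * "cat" the file) before and after the MPI job and compare the values.
     * The device name "mlx4_0", the port number and the counter file are
     * assumptions -- look under /sys/class/infiniband for the real names. */
    #include <stdio.h>

    static long long read_counter(const char *path)
    {
        long long value = -1;
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            if (fscanf(f, "%lld", &value) != 1)
                value = -1;
            fclose(f);
        }
        return value;
    }

    int main(void)
    {
        const char *path =
            "/sys/class/infiniband/mlx4_0/ports/1/counters/port_xmit_data";
        printf("%s = %lld\n", path, read_counter(path));
        return 0;
    }

If the IB counters stay flat while intra-blade traffic is known to be flowing, that is a hint the sm BTL is indeed carrying those messages.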
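As for the PMPI-based auditing Jeff suggests, a minimal interposition layer could look roughly like this (a sketch assuming the C bindings; it wraps only MPI_Send, and the counter names are illustrative):

    /* Sketch of a PMPI-based audit layer, assuming the C bindings: it wraps
     * only MPI_Send, counts calls and bytes, then falls through to the real
     * implementation via PMPI_Send.  The counter names are illustrative. */
    #include <mpi.h>
    #include <stdio.h>

    static long audit_send_calls = 0;
    static long audit_send_bytes = 0;

    int MPI_Send(void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        PMPI_Type_size(type, &size);
        audit_send_calls++;
        audit_send_bytes += (long)count * size;
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "[rank %d] MPI_Send: %ld calls, %ld bytes\n",
                rank, audit_send_calls, audit_send_bytes);
        return PMPI_Finalize();
    }

Built as a library and linked (or LD_PRELOADed) ahead of the application, it counts every MPI_Send before falling through to PMPI_Send; the same pattern extends to any other MPI call worth auditing.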
On 10/29/2008 12:36 PM, Gilbert Grosdidier <gro...@mail.cern.ch> wrote:

Thank you very much Mi and Lenny for your detailed replies.

I believe I can summarize the information needed for 'Working with a QS22 CellBlade cluster' like this:
- Yes, messages are efficiently handled with "-mca btl openib,sm,self"
- Better to move to the OMPI 1.3 version ASAP
- It is currently easier and more efficient to use numactl to control processor affinity on a QS22.

So far so good.

One question remains: how could I monitor in detail the message passing through IB (on one side) and through SM (on the other side) by means of MCA parameters, please? Additional information about the verbosity level of this monitoring would be highly appreciated. A lengthy trawl through the list of such parameters provided by ompi_info did not enlighten me (there are so many xxx_sm_yyy type parameters that I don't know which one could be the right one ;-)

Thanks in advance for your hints. Best Regards, Gilbert.

On Thu, 23 Oct 2008, Mi Yan wrote:

1. MCA BTL parameters

With "-mca btl openib,self", both messages between the two Cell processors on one QS22 and messages between two QS22s go through IB.

With "-mca btl openib,sm,self", messages within one QS22 go through shared memory, and messages between QS22s go through IB.

Depending on the message size and other MCA parameters, there is no guarantee that message passing over shared memory is faster than over IB. E.g. the bandwidth for a 64 KB message is 959 MB/s on shared memory and 694 MB/s on IB; the bandwidth for a 4 MB message is 539 MB/s on shared memory and 1092 MB/s on IB. The shared-memory bandwidth for a 4 MB message may be higher if you tune some MCA parameters.

2. mpi_paffinity_alone

"mpi_paffinity_alone = 1" is not a good choice for a QS22. There are two sockets, with two physical Cell/B.E. processors, on one QS22. Each Cell/B.E. has two SMT threads, so there are four logical CPUs on one QS22. The CBE Linux kernel maps logical CPUs 0 and 1 to socket 1 and logical CPUs 2 and 3 to socket 2. If mpi_paffinity_alone is set to 1, the two MPI instances will be assigned to logical CPUs 0 and 1 on socket 1, which I believe is not what you want.

A temporary solution to force the affinity on a QS22 is to use "numactl". E.g., assuming the hostname is "qs22" and the executable is "foo", the following command can be used:

    mpirun -np 1 -H qs22 numactl -c0 -m0 foo : -np 1 -H qs22 numactl -c1 -m1 foo

In the long run, I wish the CBE kernel exported the CPU topology in /sys so that PLPA could be used to force the processor affinity.

Best Regards,
Mi

On 10/23/2008 05:48 AM, Lenny Verkhovsky <lenny.verkhovsky@gmail.com> wrote:

Hi,

If I understand you correctly, the most suitable way to do it is via the paffinity support we have in Open MPI 1.3 and the trunk; however, the OS usually distributes processes evenly between the sockets by itself.

There is still no formal FAQ, for several reasons, but you can read how to use it in the attached draft (there were a few renamings of the parameters, so check with ompi_info); a hedged rankfile sketch also follows the quoted thread below.

Shared memory is used between processes that share the same machine, and openib is used between different machines (hostnames); no special MCA parameters are needed.

(See attached file: RANKS_FAQ.doc)

Best Regards,
Lenny

On Sun, Oct 19, 2008 at 10:32 AM, Gilbert Grosdidier <gro...@mail.cern.ch> wrote:

Working with a CellBlade cluster (QS22), the requirement is to have one instance of the executable running on each socket of the blade (there are 2 sockets). The application is of the 'domain decomposition' type, and each instance often needs to send/receive data both with the remote blades and with the neighbor socket.

The question is: which specification must be used for the mca btl component to force 1) shmem-type messages when communicating with this neighbor socket, while 2) using openib to communicate with the remote blades? Is '-mca btl sm,openib,self' suitable for this?

Also, which debug flags could be used to crosscheck that the messages are _actually_ going through the right channel, please?

We are currently using OpenMPI 1.2.5 as shipped with RHEL 5.2 (ppc64). Which version do you think is currently the most optimised for these processors and this problem type? Should we go towards OpenMPI 1.2.8 instead? Or even try some OpenMPI 1.3 nightly build?

Thanks in advance for your help, Gilbert.
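On the rank/affinity mechanism Lenny mentions above: in Open MPI 1.3, process placement can be driven by a rankfile passed to mpirun. The sketch below is an illustration under stated assumptions only -- the slot syntax, the option name and the QS22 logical-CPU numbering should all be checked against RANKS_FAQ.doc and ompi_info:

    # Hypothetical rankfile for one QS22 ("qs22"): one rank per Cell/B.E.
    # socket, assuming logical CPUs 0-1 sit on socket 1 and CPUs 2-3 on socket 2.
    rank 0=qs22 slot=0
    rank 1=qs22 slot=2

It would then be passed to mpirun with something like "-rf ./rankfile" alongside "-mca btl openib,sm,self", replacing the numactl workaround above; confirm the exact option name against the 1.3 documentation.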
--
*---------------------------------------------------------------------*
 Gilbert Grosdidier               gilbert.grosdid...@in2p3.fr
 LAL / IN2P3 / CNRS               Phone : +33 1 6446 8909
 Faculté des Sciences, Bat. 200   Fax   : +33 1 6446 8546
 B.P. 34, F-91898 Orsay Cedex (FRANCE)
*---------------------------------------------------------------------*