Re: [OMPI users] infiniband
A couple of other nice tools for IB monitoring.

1. perfquery (part of OFED). Example of a report:

# Port counters: Lid 12 port 1
PortSelect:......................1
CounterSelect:...................0x
SymbolErrors:....................7836
LinkRecovers:....................255
LinkDowned:......................0
RcvErrors:.......................24058
RcvRemotePhysErrors:.............6159
RcvSwRelayErrors:................0
XmtDiscards:.....................3176
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................1930
RcvData:.........................1708
XmtPkts:.........................114
RcvPkts:.........................114

2. collectl - http://collectl.sourceforge.net/. Example of a report:

#<---Memory--><--InfiniBand-->
#cpu sys inter ctxsw  free  buff  cach  inac  slab  map  KBin pktIn KBOut pktOut   Errs
   1   0   847  1273    1G  264M    3G  594M    1G 234M     2    29     2     29 123242
   2   1   851  2578    1G  264M    3G  594M    1G 234M     1     5     1      5 123391

Pavel Shamis (Pasha) wrote:
> SLIM H.A. wrote:
>> Is it possible to get information about the usage of hca ports similar to the result of the mx_endpoint_info command for Myrinet boards? The ibstat command gives information like this:
>>   Port 1:
>>     State: Active
>>     Physical state: LinkUp
>> but does not say whether a job is actually using an infiniband port or communicates through plain ethernet. I would be grateful for any advice.
>
> You have access to some counters in /sys/class/infiniband/mlx4_0/ports/1/counters/ (counters for hca mlx4_0, port 1).
>
> --
> Pavel Shamis (Pasha)
> Mellanox Technologies
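For a quick "is the job actually using the HCA?" check, something along these lines can be run from the shell while the job is active (a minimal sketch: it assumes the mlx4_0 / port 1 layout from the example above and the usual sysfs counter file names, so adjust for your hardware):

  # Sketch: sample the IB port traffic counters once per second.
  # mlx4_0 and port 1 are assumptions taken from the example above.
  CNT=/sys/class/infiniband/mlx4_0/ports/1/counters
  while true; do
      echo "$(date +%T)  xmit_data=$(cat $CNT/port_xmit_data)  rcv_data=$(cat $CNT/port_rcv_data)  rcv_errors=$(cat $CNT/port_rcv_errors)"
      sleep 1
  done

If the xmit/rcv counters climb while the MPI job runs, traffic is going over that IB port; if they stay flat, the job is most likely falling back to plain ethernet. Note that the counters are per port, not per job, so other traffic on the same HCA will also show up.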
[OMPI users] Enabling progress thread
In message http://www.open-mpi.org/community/lists/users/2007/03/2889.php I found this comment: "The only way to get any benefit from the MPI_Bsend is to have a progress thread which take care of the pending communications in the background. Such thread is not enabled by default in Open MPI." I understand this won't be portable, but how do you enable a sender progress thread in Open MPI?
Re: [OMPI users] Enabling progress thread
You can add --enable-progress-threads to the configure line. However, please consider this a beta feature: we know for sure there are some bugs in the current thread safety. Aurelien

On May 1, 2008, at 09:46, Alberto Giannetti wrote:
> In message http://www.open-mpi.org/community/lists/users/2007/03/2889.php I found this comment: "The only way to get any benefit from the MPI_Bsend is to have a progress thread which take care of the pending communications in the background. Such thread is not enabled by default in Open MPI." I understand this won't be portable, but how do you enable a sender progress thread in Open MPI?
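In practice that just means rebuilding Open MPI with the flag, roughly as follows (a sketch only; the prefix is an arbitrary example, and as noted above the resulting build should be treated as experimental):

  # Sketch: rebuild Open MPI with progress threads enabled (beta feature).
  # The --prefix path is an assumption; pick any install location you like.
  ./configure --enable-progress-threads --prefix=/opt/openmpi-progress
  make all install
  # then make sure mpicc/mpif77/mpirun from that prefix come first in PATH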
[OMPI users] Install BLACS and ScaLAPACK on Leopard
Dear all, I'm new to Open MPI. I'm now trying to use BLACS and ScaLAPACK on Leopard. Since Leopard has a built-in Open MPI, I didn't install any other versions. I followed the BLACS install guidance in the FAQ section, and the build failed with: "No rule to make target `/usr/include/mpif.h', needed by `mpif.h'. Stop." The problem is that I could not find "mpif.h" anywhere on my computer. Does this mean I should install another Open MPI version rather than use Leopard's built-in one? Thanks for the help! Best, Linwei
[OMPI users] Running Heterogeneous MPI Application Over Infiniband
Hello,

I am trying to run a simple Hello World MPI application in a heterogeneous environment. The machines include one x86 machine with a standard 1Gb ethernet connection and two ppc machines with standard 1Gb ethernet as well as a 10Gb ethernet (Infiniband) switch between the two. The Hello World program is the same hello_c.c that is included in the examples directory of the Open MPI installation.

The goal is to run heterogeneous applications across the three aforementioned machines in the following manner: the x86 machine will use tcp to communicate with the two ppc machines, while the ppc machines will communicate with one another via the 10GbE.

x86 <--tcp--> ppc_1
x86 <--tcp--> ppc_2
ppc_1 <--openib--> ppc_2

I am currently using a machfile set up as follows:

# cat machfile

In addition I am using an appfile set up as follows:

# cat appfile
-np 1 --hostfile machfile --host --mca btl sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host --mca btl sm,self,tcp,openib /path/to/ppc/openmpi-1.2.5/examples/hello
-np 1 --hostfile machfile --host --mca btl sm,self,tcp /path/to/x86/openmpi-1.2.5/examples/hello

I am running on the command line via

# mpirun --app appfile

I've also attached the output from 'ompi_info --all' from all machines. Any suggestions would be much appreciated.

Thanks,
Ryan

                Open MPI: 1.2.5
   Open MPI SVN revision: r16989
                Open RTE: 1.2.5
   Open RTE SVN revision: r16989
                    OPAL: 1.2.5
       OPAL SVN revision: r16989
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.5)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.5)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.5)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.5)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.5)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.5)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.5)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.5)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.5)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.5)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.5)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.5)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.5)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.5)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.5)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.5)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.5)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.5)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.5)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.5)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.5)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.5)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.5)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.5)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.5)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.5)
                 MCA pls:
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Linwei,

mpif.h is the include file Fortran programs use with Open MPI. The Apple version does not support Fortran. If you want to use Open MPI from Fortran you will need to install a version of Open MPI that supports Fortran; this will install mpif.h. I suggest you install the new version in a different directory than the Apple version (use --prefix in the Open MPI configure command). You will also need to remove the Apple version, or rename its Open MPI include and library files, so that the linker finds your new, Fortran-supporting version.

Doug Reeder

On May 1, 2008, at 8:42 AM, Linwei Wang wrote:
> Dear all, I'm new to Open MPI. I'm now trying to use BLACS and ScaLAPACK on Leopard. Since Leopard has a built-in Open MPI, I didn't install any other versions. I followed the BLACS install guidance in the FAQ section, and the build failed with: "No rule to make target `/usr/include/mpif.h', needed by `mpif.h'. Stop." The problem is that I could not find "mpif.h" anywhere on my computer. Does this mean I should install another Open MPI version rather than use Leopard's built-in one? Thanks for the help! Best, Linwei
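As a concrete illustration (not an official recipe; the prefix and the choice of gfortran are assumptions, and any Fortran compiler that Open MPI's configure accepts will do), the build might look like:

  # Sketch: build an Open MPI with Fortran support alongside Leopard's copy.
  # /usr/local/openmpi and gfortran are assumptions -- adjust to taste.
  ./configure --prefix=/usr/local/openmpi F77=gfortran FC=gfortran
  make all install
  # then point BLACS/ScaLAPACK at the new wrappers,
  # e.g. /usr/local/openmpi/bin/mpif77, and keep that bin dir first in PATH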
Re: [OMPI users] users Digest, Vol 885, Issue 2
The problem is that when running over InfiniBand the application hangs on the call to MPI_Init.

Thanks,
Ryan

On Thu, 2008-05-01 at 12:02 -0400, users-requ...@open-mpi.org wrote:
> [quoted digest of "Running Heterogeneous MPI Application Over Infiniband" snipped; see the original message above]
Re: [OMPI users] Install BLACS and ScaLAPACK on Leopard
Dear Doug,

Thanks very much. I installed the latest Open MPI and BLACS. For ScaLAPACK I have a problem related to the BLAS library: since on the Mac it lives in vecLib, I have no idea how to set BLASLIB in ScaLAPACK's SLmake.inc file.

Also, though compiling BLACS succeeded, I'm not able to build its testers; the build generates a large amount of output like the following. Do you have any idea what the problem is?

/usr/local/openmpi-1.2.6/bin/mpif77 -c blacstest.f

blacstest.f: In subroutine `runtests':
blacstest.f:150: warning: CALL RUNTESTS( MEM, MEMLEN, CMEM, CMEMSIZ, PREC, NPREC, OUTNUM,   1
blacstest.f:178: (continued): SUBROUTINE RUNTESTS( MEM, MEMLEN, CMEM, CMEMLEN, PREC, NPREC,   2
Argument #1 (named `mem') of `runtests' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `ssdrvtest':
blacstest.f:299: warning: CALL SSDRVTEST(OUTNUM, VERB, NSHAPE, CMEM(UPLOPTR),   1
blacstest.f:2545: (continued): SUBROUTINE SSDRVTEST( OUTNUM, VERB, NSHAPE, UPLO0, DIAG0,   2
Argument #21 (named `mem') of `ssdrvtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `dsdrvtest':
blacstest.f:311: warning: CALL DSDRVTEST(OUTNUM, VERB, NSHAPE, CMEM(UPLOPTR),   1
blacstest.f:2889: (continued): SUBROUTINE DSDRVTEST( OUTNUM, VERB, NSHAPE, UPLO0, DIAG0,   2
Argument #21 (named `mem') of `dsdrvtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `csdrvtest':
blacstest.f:323: warning: CALL CSDRVTEST(OUTNUM, VERB, NSHAPE, CMEM(UPLOPTR),   1
blacstest.f:3233: (continued): SUBROUTINE CSDRVTEST( OUTNUM, VERB, NSHAPE, UPLO0, DIAG0,   2
Argument #21 (named `mem') of `csdrvtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `zsdrvtest':
blacstest.f:335: warning: CALL ZSDRVTEST(OUTNUM, VERB, NSHAPE, CMEM(UPLOPTR),   1
blacstest.f:3577: (continued): SUBROUTINE ZSDRVTEST( OUTNUM, VERB, NSHAPE, UPLO0, DIAG0,   2
Argument #21 (named `mem') of `zsdrvtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `sbsbrtest':
blacstest.f:389: warning: CALL SBSBRTEST(OUTNUM, VERB, NSCOPE, CMEM(SCOPEPTR),   1
blacstest.f:4336: (continued): SUBROUTINE SBSBRTEST( OUTNUM, VERB, NSCOPE, SCOPE0, NTOP, TOP0,   2
Argument #23 (named `mem') of `sbsbrtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `dbsbrtest':
blacstest.f:401: warning: CALL DBSBRTEST(OUTNUM, VERB, NSCOPE, CMEM(SCOPEPTR),   1
blacstest.f:4751: (continued): SUBROUTINE DBSBRTEST( OUTNUM, VERB, NSCOPE, SCOPE0, NTOP, TOP0,   2
Argument #23 (named `mem') of `dbsbrtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `cbsbrtest':
blacstest.f:413: warning: CALL CBSBRTEST(OUTNUM, VERB, NSCOPE, CMEM(SCOPEPTR),   1
blacstest.f:5166: (continued): SUBROUTINE CBSBRTEST( OUTNUM, VERB, NSCOPE, SCOPE0, NTOP, TOP0,   2
Argument #23 (named `mem') of `cbsbrtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `zbsbrtest':
blacstest.f:425: warning: CALL ZBSBRTEST(OUTNUM, VERB, NSCOPE, CMEM(SCOPEPTR),   1
blacstest.f:5581: (continued): SUBROUTINE ZBSBRTEST( OUTNUM, VERB, NSCOPE, SCOPE0, NTOP, TOP0,   2
Argument #23 (named `mem') of `zbsbrtest' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f: In subroutine `sbtcheckin':
blacstest.f:120: warning: CALL BTRECV( 3, 2, ITMP, 0, IBTMSGID() )   1
blacstest.f:7429: (continued): CALL BTRECV(4, NERR2*2, SVAL, K, IBTMSGID()+51)   2
Argument #3 of `btrecv' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f:97: warning: CALL BTSEND( 3, 2, ITMP, -1, IBTMSGID() )   1
blacstest.f:7451: (continued): CALL BTSEND(4, NERR*2, SVAL, 0, IBTMSGID()+51)   2
Argument #3 of `btsend' is one type at (2) but is some other type at (1) [info -f g77 M GLOBALS]

blacstest.f:2824: warning: CALL SBTCHECKIN( 0, OUTNUM, MAXERR, NERR,   1
blacstest.f:7339: (continued): SUBROUTINE SBTCHECKIN( NFTESTS, OUTNUM, MAXERR, NER
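On the BLASLIB question, here is a hedged sketch of what the relevant SLmake.inc lines might look like when using Apple's vecLib framework for BLAS (the variable names should be checked against the SLmake.inc shipped with your ScaLAPACK release, and -framework Accelerate is a possible alternative on Leopard):

  # Hypothetical SLmake.inc fragment for OS X -- verify against your SLmake.inc.
  # Assumes vecLib supplies BLAS; the mpif77 path is the one used above.
  F77       = /usr/local/openmpi-1.2.6/bin/mpif77
  BLASLIB   = -framework vecLib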