Re: [OMPI users] coll_ml_priority in openmpi-1.7.5

2014-03-24 Thread tmishima
I ran our application using the final version of openmpi-1.7.5 again with coll_ml_priority = 90. Then, coll/ml was actually activated and I got these error messages as shown below: [manage][[11217,1],0][coll_ml_lmngr.c:265:mca_coll_ml_lmngr_alloc] COLL-ML List manager is empty. [manage][[11217,1
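
The snippet above suggests coll/ml only kicks in once its priority is raised; a minimal sketch of the two MCA knobs involved (the process count and ./a.out are placeholders, not taken from the thread):

  $ mpirun -np 8 --mca coll_ml_priority 90 ./a.out   # force coll/ml on, as in the report
  $ mpirun -np 8 --mca coll ^ml ./a.out              # rule the coll/ml component out entirely while debugging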

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-24 Thread Matt Thompson
Jeff, I ran these commands: $ make clean $ make distclean (wanted to be extra sure!) $ ./configure CC=gcc CXX=g++ F77=pgfortran FC=pgfortran CFLAGS='-m64' CXXFLAGS='-m64' LDFLAGS='-m64' FCFLAGS='-m64' FFLAGS='-m64' --prefix=/Users/fortran/AutomakeBug/autobug14 | & tee configure.log $ make V=1 i
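
For anyone reproducing this, the quoted commands amount to a clean rebuild along these lines (the |& tee redirection is csh/tcsh syntax as typed in the original; the make-step log names after the truncation point are illustrative guesses):

  $ make distclean
  $ ./configure CC=gcc CXX=g++ F77=pgfortran FC=pgfortran CFLAGS='-m64' CXXFLAGS='-m64' \
      LDFLAGS='-m64' FCFLAGS='-m64' FFLAGS='-m64' \
      --prefix=/Users/fortran/AutomakeBug/autobug14 |& tee configure.log
  $ make V=1 |& tee make.log
  $ make install |& tee makeinstall.log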

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-24 Thread Jeff Squyres (jsquyres)
On Mar 24, 2014, at 6:34 PM, Matt Thompson wrote: > Sorry for the late reply. The answer is: No, 1.14.1 has not fixed the problem > (and indeed, that's what my Mac is running): > > (28) $ make install | & tee makeinstall.log > Making install in src > ../config/install-sh -c -d '/Users/fortran/

Re: [OMPI users] Help building/installing a working Open MPI 1.7.4 on OS X 10.9.2 with Free PGI Fortran

2014-03-24 Thread Matt Thompson
Jeff, Sorry for the late reply. The answer is: No, 1.14.1 has not fixed the problem (and indeed, that's what my Mac is running): (28) $ make install | & tee makeinstall.log Making install in src ../config/install-sh -c -d '/Users/fortran/AutomakeBug/autobug14/lib' /bin/sh ../libtool --mode=in
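
Since the discussion turns on which Automake generated the build tree, a quick way to confirm the toolchain in use (stock GNU command names; on OS X, GNU libtool is frequently installed as glibtool to avoid clashing with Apple's own libtool):

  $ automake --version | head -1
  $ autoconf --version | head -1
  $ glibtool --version | head -1   # GNU libtool; Apple's /usr/bin/libtool is a different tool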

Re: [OMPI users] delays in Isend

2014-03-24 Thread Ross Boylan
On Mon, 2014-03-24 at 07:59 -0700, Ralph Castain wrote: > I suspect the root cause of the problem here lies in how MPI messages are > progressed. OMPI doesn't have an async progress method (yet), and so > messaging on both send and recv ends is only progressed when the app calls > the MPI librar

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-24 Thread Hamid Saeed
Hello Jeff, Thanks for your cooperation. --mca btl_tcp_if_include br0 worked out of the box. The problem was on the network administrator's side. The machines on the network side were halting the MPI... so cleaning and killing everything worked. :) Regards. On Mon, Mar 24, 2014 at 4:34 PM, Jef
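
For the archive, an invocation in the spirit of what reportedly worked (the host names and the ./scatterv binary come from the thread; the process count and the explicit tcp,sm,self BTL list are added here for illustration only):

  $ mpirun -np 8 --mca btl tcp,sm,self --mca btl_tcp_if_include br0 \
      --host karp,wirth ./scatterv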

Re: [OMPI users] usNIC point-to-point messaging module

2014-03-24 Thread Jeff Squyres (jsquyres)
No, this is not a configure issue -- the usnic BTL uses the verbs API. The usnic BTL should be disqualifying itself at runtime, though, if you don't have usNIC devices. Are you running on Cisco UCS servers with Cisco VICs, perchance? If not, could you send the output of "mpirun --mca btl_base_v
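
The request that gets cut off above is presumably for BTL selection verbosity; a sketch of the usual commands for that kind of check (process count and ./a.out are placeholders):

  $ mpirun -np 2 --mca btl_base_verbose 100 ./a.out   # shows which BTLs load and why they disqualify themselves
  $ ompi_info | grep usnic                            # confirms whether the usnic component was built at all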

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-24 Thread Jeff Squyres (jsquyres)
There is no "self" IP interface in the Linux kernel. Try using btl_tcp_if_include and list just the interface(s) that you want to use. From your prior email, I'm *guessing* it's just br2 (i.e., the 10.x address inside your cluster). Also, it looks like you didn't setup your SSH keys properly f

Re: [OMPI users] cleanup of round robin mappers

2014-03-24 Thread Ralph Castain
Looks good - thanks! On Mar 24, 2014, at 4:55 AM, tmish...@jcity.maeda.co.jp wrote: > > Hi Ralph, > > I tried to improve checking for mapping-too-low and fixed a minor > problem in the rmaps_rr.c file. Please see the attached patch file. > > 1) Regarding mapping-too-low, in the future we'll have a larger s

Re: [OMPI users] another corner case hangup in openmpi-1.7.5rc3

2014-03-24 Thread Ralph Castain
The "updated"field in the orte_job_t structure is only used to help reduce the size of the launch message sent to all the daemons. Basically, we only include info on jobs that have been changed - thus, it only gets used when the app calls comm_spawn. After every launch, we automatically change i

Re: [OMPI users] delays in Isend

2014-03-24 Thread Ralph Castain
I suspect the root cause of the problem here lies in how MPI messages are progressed. OMPI doesn't have an async progress method (yet), and so messaging on both send and recv ends is only progressed when the app calls the MPI library. It sounds like your app issues an isend or recv, and then spe

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-24 Thread Hamid Saeed
Hello, I added the "self" e.g hsaeed@karp:~/Task4_mpi/scatterv$ mpirun -np 8 --mca btl ^openib --mca btl_tcp_if_exclude sm,self,lo,br0,br1,ib0,br2 --host karp,wirth ./scatterv Enter passphrase for key '/home/hsaeed/.ssh/id_rsa': ---

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-24 Thread Jeff Squyres (jsquyres)
If you use btl_tcp_if_exclude, you also need to exclude the loopback interface. Loopback is excluded by the default value of btl_tcp_if_exclude, but if you overwrite that value, then you need to *also* include the loopback interface in the new value. On Mar 24, 2014, at 4:57 AM, Hamid Sa
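
Put concretely: if the default exclude list is replaced, lo must stay in it, and entries like sm and self are BTL component names rather than kernel interfaces, so they do not belong in btl_tcp_if_exclude at all. A corrected form of the earlier command might look like this (keeping br2 usable is a guess based on the earlier interface discussion):

  $ mpirun -np 8 --mca btl ^openib --mca btl_tcp_if_exclude lo,ib0,br0,br1 \
      --host karp,wirth ./scatterv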

[OMPI users] cleanup of round robin mappers

2014-03-24 Thread tmishima
Hi Ralph, I tried to improve checking for mapping-too-low and fixed a minor problem in the rmaps_rr.c file. Please see the attached patch file. 1) Regarding mapping-too-low, in the future we'll have a larger size of L1/L2/L3 cache or other architectures, and in that case, the need to map by a lower object leve
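
For readers less familiar with the mapping code being patched, the object-level round-robin mapping is what --map-by drives on the command line; a couple of hedged 1.7-series examples (process counts and ./a.out are placeholders, and the exact set of accepted object names depends on the build and hwloc):

  $ mpirun -np 8 --map-by l3cache ./a.out
  $ mpirun -np 8 --map-by socket --bind-to core ./a.out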

Re: [OMPI users] Fwd: problem for multiple clusters using mpirun

2014-03-24 Thread Hamid Saeed
Hello, Still I am facing problems. I checked that there is no firewall acting as a barrier to the MPI communication. I even used the execution line like hsaeed@karp:~/Task4_mpi/scatterv$ mpiexec -n 2 --mca btl_tcp_if_exclude br2 -host wirth,karp ./a.out Now the output hangs without displayi
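
When a launch hangs silently like this, raising the BTL verbosity on both sides usually reveals which interfaces and addresses each peer is trying to connect over; a minimal diagnostic run along those lines (the verbosity level is arbitrary):

  $ mpiexec -n 2 --mca btl_base_verbose 30 -host wirth,karp ./a.out
  # compare the interface/address lists printed for each host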