Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9; problems on heterogeneous cluster

2006-03-16 Thread Ravi Manumachu

Hi Brian,

I have installed OpenMPI-1.1a1r9260 on my SunOS machines. It has solved
the problems. However, there is one more issue that I found in my testing
and failed to report earlier. It concerns Linux machines too.

My host file is

hosts.txt
-
csultra06
csultra02
csultra05
csultra08

My app file is 

mpiinit_appfile
---
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit
-np 1 /home/cs/manredd/OpenMPI/openmpi-1.1a1r9260/MPITESTS/mpiinit

My application program is

mpiinit.c
-

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rc, me;
    char pname[MPI_MAX_PROCESSOR_NAME];
    int plen;

    MPI_Init(&argc, &argv);

    rc = MPI_Comm_rank(MPI_COMM_WORLD, &me);
    if (rc != MPI_SUCCESS)
    {
        return rc;
    }

    MPI_Get_processor_name(pname, &plen);

    printf("%s:Hello world from %d\n", pname, me);

    MPI_Finalize();

    return 0;
}

Compilation is successful

csultra06$ mpicc -o mpiinit mpiinit.c

However, mpirun prints just 6 statements instead of 8:

csultra06$ mpirun --hostfile hosts.txt --app mpiinit_appfile
csultra02:Hello world from 5
csultra06:Hello world from 0
csultra06:Hello world from 4
csultra02:Hello world from 1
csultra08:Hello world from 3
csultra05:Hello world from 2

The following two statements are not printed:

csultra05:Hello world from 6
csultra08:Hello world from 7

I observed this behavior on my Linux cluster too.

I have attached the log from the "-d" option for your debugging purposes.

Regards,
Ravi.

- Original Message -
From: Brian Barrett 
Date: Monday, March 13, 2006 7:56 pm
Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
problems on heterogeneous cluster
To: Open MPI Users 

> Hi Ravi -
> 
> With the help of another Open MPI user, I spent the weekend finding a
> couple of issues with Open MPI on Solaris.  I believe you are running
> into the same problems.  We're in the process of certifying the
> changes for release as part of 1.0.2, but it's Monday morning and the
> release manager hasn't gotten them into the release branch just yet.
> Could you give the nightly tarball from our development trunk a try
> and let us know if it solves your problems on Solaris?  You probably
> want last night's 1.1a1r9260 release.
> 
> http://www.open-mpi.org/nightly/trunk/
> 
> Thanks,
> 
> Brian
> 
> 
> On Mar 12, 2006, at 11:23 PM, Ravi Manumachu wrote:
> 
> >
> >  Hi Brian,
> >
> >  Thank you for your help. I have attached all the files you have
> >  asked for in a tar file.
> >
> >  Please find attached the 'config.log' and 'libmpi.la' for my
> >  Solaris installation.
> >
> >  The output from 'mpicc -showme' is
> >
> >  sunos$ mpicc -showme
> >  gcc -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include
> >  -I/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/include/openmpi/ompi
> >  -L/home/cs/manredd/OpenMPI/openmpi-1.0.1/OpenMPI-SunOS-5.9/lib -lmpi
> >  -lorte -lopal -lnsl -lsocket -lthread -laio -lm -lnsl -lsocket -lthread -ldl
> >
> >  There are serious issues when running on just solaris machines.
> >
> >  I am using the host file and app file shown below. Both the
> >  machines are
> >  SunOS and are similar.
> >
> >  hosts.txt
> >  -
> >  csultra01 slots=1
> >  csultra02 slots=1
> >
> >  mpiinit_appfile
> >  ---
> >  -np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
> >  -np 1 /home/cs/manredd/OpenMPI/openmpi-1.0.1/MPITESTS/mpiinit_sunos
> >
> >  Running mpirun without -d option hangs.
> >
> >  csultra01$ mpirun --hostfile hosts.txt --app mpiinit_appfile
> >  hangs
> >
> >  Running mpirun with -d option dumps core with output in the file
> >  "mpirun_output_d_option.txt", which is attached. The core is also
> >  attached.
> >  Running just on only one host is also not working. The output from
> >  mpirun using "-d" option for this scenario is attached in file
> >  "mpirun_output_d_option_one_host.txt".
> >
> >  I have also attached the list of packages installed on my solaris
> >  machine in "pkginfo.txt"
> >
> >  I hope these will help you to resolve the issue.
> >
> >  Regards,
> >  Ravi.
> >
> >> - Original Message -
> >> From: Brian Barrett 
> >> Date: Friday, March 10, 2006 7:09 pm
> >> Subject: Re: [OMPI users] problems with OpenMPI-1.0.1 on SunOS 5.9;
> >> problems on heterogeneous cluster
> >> To: Open MPI Users 
> >>
> >>> On Mar 10, 2006, at 12:09 AM, Ravi Manu

Re: [OMPI users] Memory allocation issue with OpenIB

2006-03-16 Thread Galen M. Shipman

Emanuel,

Thanks for the tip on this issue; we will be adding it to the FAQ
shortly.


- Galen

On Mar 15, 2006, at 4:29 PM, Emanuel Ziegler wrote:


Hi Davide!

You are using the -prefix option. I guess this is due to the fact that you
cannot set the paths appropriately. Most likely you are using rsh for
starting remote processes.

This causes some trouble, since the environment offered by rsh lacks many
things that a usual login environment offers (e.g. the path is hardcoded
and cannot be changed).

Checking with
  mpirun -np 2 -prefix /usr/local /bin/bash -c "ulimit -l"
may result in reporting plenty of memory (according to your settings), but
this is not reliable, since the new bash instance sets the limits
differently. Unfortunately
  mpirun -np 2 -prefix /usr/local ulimit -l
does not work, since mpirun expects an executable. So the only way to
check is to run rsh directly, like
  rsh remotenode ulimit -l
(where remotenode has to be replaced by the name of the remote host). This
may give a different result (e.g. 32, which is way too small). In my case
this problem was solved by adding
  session required pam_limits.so
at the end of the file "/etc/pam.d/rsh".

In case of ssh, check the file "/etc/pam.d/ssh" for a line similar to the
one above and add it if it does not yet exist.

Hope that helps,
Emanuel
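
A related check that goes one step further (this sketch is not from the
original post; it only relies on the standard getrlimit() call, and the
file name is arbitrary) is to print the locked-memory limit that each MPI
process itself ends up with, by launching a small program like this
through mpirun:

memlock_check.c
-
#include <stdio.h>
#include <sys/resource.h>
#include <mpi.h>

/* Print the locked-memory limit (RLIMIT_MEMLOCK) as seen by each rank.
   Diagnostic sketch only; launch it the same way as the real job. */
int main(int argc, char** argv)
{
    int me;
    struct rlimit rl;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == 0) {
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("rank %d: max locked memory: unlimited\n", me);
        else
            printf("rank %d: max locked memory: %lu bytes\n",
                   me, (unsigned long) rl.rlim_cur);
    }

    MPI_Finalize();
    return 0;
}

If the limits are set up correctly, "mpicc -o memlock_check memlock_check.c"
followed by "mpirun -np 2 memlock_check" should report the same value as
the direct rsh check described above.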




[OMPI users] Performance of ping-pong using OpenMPI over Infiniband

2006-03-16 Thread Jean Latour

Hello,

Testing the performance of OpenMPI over Infiniband, I have the following
results:


1) Hardware is: SilverStorm interface

2) Openmpi version is : (from ompi_info)
  Open MPI: 1.0.2a9r9159
  Open MPI SVN revision: r9159
   Open RTE: 1.0.2a9r9159
  Open RTE SVN revision: r9159
   OPAL: 1.0.2a9r9159
  OPAL SVN revision: r9159

3) Cluster with bi-processor Opteron 248 nodes at 2.2 GHz

Configure has been run with option --with-mvapi=path-to-mvapi

4) A C-coded ping-pong gives the following values:

LOOPS: 1000 BYTES: 4096 SECONDS: 0.085557  MBytes/sec: 95.749051
LOOPS: 1000 BYTES: 8192 SECONDS: 0.050657  MBytes/sec: 323.429912
LOOPS: 1000 BYTES: 16384 SECONDS: 0.084038  MBytes/sec: 389.918757
LOOPS: 1000 BYTES: 32768 SECONDS: 0.163161  MBytes/sec: 401.665104
LOOPS: 1000 BYTES: 65536 SECONDS: 0.306694  MBytes/sec: 427.370561
LOOPS: 1000 BYTES: 131072 SECONDS: 0.529589  MBytes/sec: 494.995011
LOOPS: 1000 BYTES: 262144 SECONDS: 0.952616  MBytes/sec: 550.366583
LOOPS: 1000 BYTES: 524288 SECONDS: 1.927987  MBytes/sec: 543.870859
LOOPS: 1000 BYTES: 1048576 SECONDS: 3.673732  MBytes/sec: 570.850562
LOOPS: 1000 BYTES: 2097152 SECONDS: 9.993185  MBytes/sec: 419.716435
LOOPS: 1000 BYTES: 4194304 SECONDS: 18.211958  MBytes/sec: 460.609893
LOOPS: 1000 BYTES: 8388608 SECONDS: 35.421490  MBytes/sec: 473.645124
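
The benchmark source was not posted; for reference, a blocking ping-pong
of the kind measured above might look roughly like the following sketch
(one message size per run; MBytes/sec computed with MB = 10^6 bytes, which
matches the numbers in the table):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

/* Illustrative sketch only, not the actual benchmark code:
   rank 0 sends a buffer to rank 1 and waits for it to come back,
   repeated "loops" times for a single message size. */
int main(int argc, char** argv)
{
    const int loops = 1000;
    const int bytes = 1048576;     /* one of the message sizes above */
    char* buf;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = (char*) malloc(bytes);
    memset(buf, 0, bytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < loops; i++) {
        if (rank == 0) {
            MPI_Send(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("LOOPS: %d BYTES: %d SECONDS: %f  MBytes/sec: %f\n",
               loops, bytes, t1 - t0,
               2.0 * bytes * loops / ((t1 - t0) * 1.0e6));

    free(buf);
    MPI_Finalize();
    return 0;
}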

My questions are:
a) Is OpenMPI doing TCP/IP over IB in this case? (I guess so)
b) Is it possible to significantly improve these values by changing the
defaults?

   I have used several mca btl parameters but without improving the
maximum bandwidth.

   For example:  --mca btl mvapi   --mca btl_mvapi_max_send_size 8388608

c) Is it possible that other IB hardware implementations have better
   performance with OpenMPI?

d) Is it possible to use specific IB drivers for optimal performance?
(should reach almost 800 MB/sec)


Thank you very much for your help
Best Regards,
Jean Latour


Re: [OMPI users] Performance of ping-pong using OpenMPI over Infiniband

2006-03-16 Thread Galen M. Shipman

Hi Jean,

Take a look here:
http://www.open-mpi.org/faq/?category=infiniband#ib-leave-pinned


This should improve performance for micro-benchmarks and some  
applications.


Please let me know if this doesn't solve the issue.

Thanks,
Galen




Re: [OMPI users] Performance of ping-pong using OpenMPI over Infiniband

2006-03-16 Thread George Bosilca

On Thu, 16 Mar 2006, Jean Latour wrote:

> My questions are :
> a)  Is OpenMPI doing in this case TCP/IP over IB ? (I guess so)

If the path to the mvapi library is correct, then Open MPI will use mvapi,
not TCP over IB. There is a simple way to check: "ompi_info --param btl
mvapi" will print all the parameters attached to the mvapi driver. If
there is no mvapi in the output, then mvapi was not correctly detected.
But I don't think that's the case, because if I remember well we have a
protection at configure time: if you specify one of the drivers and we're
not able to correctly use the libraries, configure will stop.


> b) Is it possible to improve significantly these values by changing the 
> defaults ?

By default we are using a very conservative approach: we never leave the
memory pinned down, and that decreases the performance for a ping-pong.
There are pros and cons, too long to be explained here, but in general we
see better performance for real-life applications with our default
approach, and that's our main goal.

Now, if you want to get better performance for the ping-pong test, please
read the FAQ at http://www.open-mpi.org/faq/?category=infiniband.

These are the 3 flags that affect the mvapi performance for the ping-pong 
case (add them in $(HOME)/.openmpi/mca-params.conf):
btl_mvapi_flags=6
mpi_leave_pinned=1
pml_ob1_leave_pinned_pipeline=1
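
For a single run, the same parameters can presumably also be passed on the
mpirun command line instead of the config file (here ./pingpong just
stands for whatever benchmark binary is being timed):

mpirun --mca btl_mvapi_flags 6 --mca mpi_leave_pinned 1 \
       --mca pml_ob1_leave_pinned_pipeline 1 -np 2 ./pingpong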

>
>   I have used several mca btl parameters but without improving the maximum 
> bandwith.
>  For example :  --mca btl mvapi   --mca btl_mvapi_max_send_size 8388608

It is difficult to improve the maximum bandwidth without the leave_pinned
behavior activated, but you can improve the bandwidth for medium-sized
messages. Play with btl_mvapi_eager_limit to set the limit between the
short and rendezvous protocols. "ompi_info --param btl mvapi" will give
you a full list of parameters as well as their descriptions.
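
For example, a larger eager limit can be tried for one run like this
(65536 is only an illustrative value, and ./pingpong again stands for the
benchmark binary):

mpirun --mca btl mvapi --mca btl_mvapi_eager_limit 65536 -np 2 ./pingpong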

>
> c) Is it possible that other IB hardware implementations  have better
>   performances with OpenMPI ?

The maximum bandwidth depends on several factors. One of the most
important is the maximum bandwidth of your node's bus. To reach 800 MB/s
and more you definitely need a PCI-X 16 ...

>
> d) Is it possible to use specific IB drivers  for optimal performance  ? 
> (should reach almost 800 MB/sec)

Once the 3 options are set, you should see an improvement in the
bandwidth.

Let me know if it does not solve your problems.

   george.

"We must accept finite disappointment, but we must never lose infinite
hope."
   Martin Luther King



Re: [OMPI users] Using Multiple Gigabit Ethernet Interface

2006-03-16 Thread Jayabrata Chakrabarty
Thanks Brian, Thanks Michael.
I wanted to benchmark the communication throughput and latency using multiple gigabit Ethernet controllers.
So here are the results, which I want to share with you all.
I used:
OpenMPI version 1.0.2a10r9275
Hpcbench
Two Dell Precision 650 workstations.
The Dell Precision 650 workstation has three separate PCI bus segments:
Segment 1 -> PCI Slot1,2 -> 32 bit, 33MHz, shared with integrated 1394
Segment 2 -> PCI Slot3,4 -> 64 bit, 100MHz, shared with the Gb Ethernet connection
Segment 3 -> PCI Slot 5 -> shared with integrated Ultra 320 controller
The workstation has an integrated PCI-X 64-bit Intel 10/100/1000 Gigabit Ethernet controller.
I added three D-Link DGE-530T 1000 Mbps Ethernet cards in Slot2, Slot4 and Slot5 respectively.
As I expected, the card in Slot5 performed better than the cards in the other slots. Here are the results.
(Using Slot2)
# MPI communication latency (roundtrip time) test -- Wed Mar 15 09:19:10 2006
# Hosts: DELL <> DELL2
# Blocking Communication (MPI_Send/MPI_Recv)
# Message size (Bytes): 40960
# Iteration: 7
# Test time (Seconds): 0.20
# RTT-time
#      Microseconds
1      25953.565
2      25569.439
3      22392.000
4      20876.578
5      21327.121
6      19597.156
7      21264.008
8      24109.568
9      23877.859
10     24064.575
# MPI RTT min/avg/max = 19597.156/22903.187/25953.565 usec
--
# MPI communication test -- Wed Mar 15 10:16:22 2006
# Test mode: Fixed-size stream (unidirectional) test
# Hosts: DELL <> DELL2
# Blocking communication (MPI_Send/MPI_Recv)
# Total data size of each test (Bytes): 524288000
# Message size (Bytes): 104857600
# Iteration: 5
# Test time: 5.00
# Test repetition: 10
#
#   Overall     Master-node   M-process  M-process  Slave-node    S-process  S-process
#   Throughput  Elapsed-time  User-mode  Sys-mode   Elapsed-time  User-mode  Sys-mode
#   Mbps        Seconds       Seconds    Seconds    Seconds       Seconds    Seconds
1   521.9423    8.04          1.42       6.62       8.04          0.93       7.10
2   551.5377    7.60          1.20       6.41       7.60          0.77       6.87
3   552.5600    7.59          1.27       6.32       7.59          0.82       6.81
4   552.6328    7.59          1.28       6.31       7.59          0.80       6.83
5   552.6334    7.59          1.24       6.35       7.59          0.86       6.77
6   552.7048    7.59          1.26       6.33       7.59          0.77       6.86
7   563.6736    7.44          1.22       6.22       7.44          0.78       6.70
8   552.2710    7.59          1.22       6.37       7.59          0.83       6.80
9   520.9938    8.05          1.37       6.68       8.05          0.93       7.16
10  535.0131    7.84          1.36       6.48       7.84          0.84       7.04
==
(Using Slot3)
# MPI communication latency (roundtrip time) test -- Thu Mar 16 10:15:58 2006
# Hosts: DELL <> DELL2
# Blocking Communication (MPI_Send/MPI_Recv)
# Message size (Bytes): 40960
# Iteration: 10
# Test time (Seconds): 0.20
# RTT-time
#      Microseconds
1      20094.204
2      14773.512
3      14846.015
4      17756.820
5      18419.290
6      23394.799
7      21840.596
8      17727.494
9      21822.095
10     17659.688
# MPI RTT min/avg/max = 14773.512/18833.451/23394.799 usec
--
# MPI communication test -- Wed Mar 15 09:17:54 2006
# Test mode: Fixed-size stream (unidirectional) test
# Hosts: DELL <> DELL2
# Blocking communication (MPI_Send/MPI_Recv)
# Total data size of each test (Bytes): 524288000
# Message size (Bytes): 104857600
# Iteration: 5
# Test time: 5.00
# Test repetition: 10
#
#   Overall     Master-node   M-process  M-process  Slave-node    S-process  S-process
#   Throughput  Elapsed-time  User-mode  Sys-mode   Elapsed-time  User-mode  Sys-mode
#   Mbps        Seconds       Seconds    Seconds    Seconds       Seconds    Seconds
1   794.9650    5.28          1.04       4.24       5.28          0.47       4.81
2   838.1621    5.00          0.91       4.09       5.00          0.39       4.65
3   898.3811    4.67          0.84       3.82       4.67          0.34       4.37
4   798.9575    5.25          1.03       4.22       5.25          0.40       4.89
5   829.7181    5.06          0.94       4.11       5.05          0.40       4.69
6   881.5526    4.76          0.86       3.90       4.76          0.28       4.52
7   827.9215    5.07          0.96       4.11       5.07          0.41       4.70
8   845.6428    4.96          0.87       4.09       4.96          0.38       4.62
9   845.6903    4.96          0.90       4.06       4.96

[OMPI users] mca_oob_tcp_peer_complete_connect: connection failed

2006-03-16 Thread Charles Wright
Hello,
I've just compiled Open MPI and tried to run my code, which just
measures bandwidth from one node to another. (The code compiles fine and
runs under other MPI implementations.)

When I did, I got this:

uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
c317-6
c317-5
[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connection failed (errno=110) - retrying (pid=24979)
[c317-5:24979] mca_oob_tcp_peer_timer_handler
[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
connection failed (errno=110) - retrying (pid=24997)
[c317-5:24997] mca_oob_tcp_peer_timer_handler

[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
connect() failed with errno=110


I compiled Open MPI with PBS Pro 5.4-4 and I'm guessing that has
something to do with it.

I've attached my config.log

Any help with this would be appreciated.

uahrcw@c275-6:~/mpi-benchmarks> ompi_info
Open MPI: 1.0.1r8453
   Open MPI SVN revision: r8453
Open RTE: 1.0.1r8453
   Open RTE SVN revision: r8453
OPAL: 1.0.1r8453
   OPAL SVN revision: r8453
  Prefix: /opt/asn/apps/openmpi-1.0.1
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: asnrcw
   Configured on: Fri Feb 24 15:19:37 CST 2006
  Configure host: c275-6
Built by: asnrcw
Built on: Fri Feb 24 15:40:09 CST 2006
  Built host: c275-6
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: no
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: g77
  Fortran77 compiler abs: /usr/bin/g77
  Fortran90 compiler: ifort
  Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: no
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: 1
  MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component
v1.0.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
   MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
   MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
  MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
 MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
 MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
 MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
 MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
 MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ras: slurm (MCA v1.0, API v1.0, Component v1.0.1)
 MCA ras: tm (MCA v1.0, API v1.0, Component v1.0.1)
 MCA rds: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
 MCA rds: resfile (MCA v1.0, API v1.0, Component v1.0.1)
   MCA rmaps: round_robin (MCA v1.0, API v1.0, Component v1.0.1)
MCA rmgr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
MCA rmgr: urm (MCA v1.0, API v1.0, Comp

Re: [OMPI users] mca_oob_tcp_peer_complete_connect: connection failed

2006-03-16 Thread George Bosilca
I see only 2 possibilities:
1. You're trying to run Open MPI on nodes having multiple IP addresses.
2. Your nodes are behind firewalls and Open MPI is unable to pass through.

Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full 
answer to your question.

   Thanks,
 george.

On Thu, 16 Mar 2006, Charles Wright wrote:

> Hello,
>I'm just compiled open-mpi and tried to run my code which just
> measures bandwidth from one node to another.   (Code compile fine and
> runs under other mpi implementations)
>
> When I did I got this.
>
> uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
> c317-6
> c317-5
> [c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
> connection failed (errno=110) - retrying (pid=24979)
> [c317-5:24979] mca_oob_tcp_peer_timer_handler
> [c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
> connection failed (errno=110) - retrying (pid=24997)
> [c317-5:24997] mca_oob_tcp_peer_timer_handler
>
> [0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
> connect() failed with errno=110
>
>
> I compiled open-mpi with Pbspro 5.4-4 and I'm guessing that has
> something to do with it.
>
> I've attached my config.log
>
> Any help with this would be appreciated.
>
> uahrcw@c275-6:~/mpi-benchmarks> ompi_info
>Open MPI: 1.0.1r8453
>   Open MPI SVN revision: r8453
>Open RTE: 1.0.1r8453
>   Open RTE SVN revision: r8453
>OPAL: 1.0.1r8453
>   OPAL SVN revision: r8453
>  Prefix: /opt/asn/apps/openmpi-1.0.1
> Configured architecture: x86_64-unknown-linux-gnu
>   Configured by: asnrcw
>   Configured on: Fri Feb 24 15:19:37 CST 2006
>  Configure host: c275-6
>Built by: asnrcw
>Built on: Fri Feb 24 15:40:09 CST 2006
>  Built host: c275-6
>  C bindings: yes
>C++ bindings: yes
>  Fortran77 bindings: yes (all)
>  Fortran90 bindings: no
>  C compiler: gcc
> C compiler absolute: /usr/bin/gcc
>C++ compiler: g++
>   C++ compiler absolute: /usr/bin/g++
>  Fortran77 compiler: g77
>  Fortran77 compiler abs: /usr/bin/g77
>  Fortran90 compiler: ifort
>  Fortran90 compiler abs: /opt/asn/intel/fce/9.0/bin/ifort
> C profiling: yes
>   C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: no
>  C++ exceptions: no
>  Thread support: posix (mpi: no, progress: no)
>  Internal debug support: no
> MPI parameter check: runtime
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: 1
>  MCA memory: malloc_hooks (MCA v1.0, API v1.0, Component
> v1.0.1)
>   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.0.1)
>   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.0.1)
>   MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.0.1)
>   MCA timer: linux (MCA v1.0, API v1.0, Component v1.0.1)
>   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
>   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
>MCA coll: basic (MCA v1.0, API v1.0, Component v1.0.1)
>MCA coll: self (MCA v1.0, API v1.0, Component v1.0.1)
>MCA coll: sm (MCA v1.0, API v1.0, Component v1.0.1)
>  MCA io: romio (MCA v1.0, API v1.0, Component v1.0.1)
>   MCA mpool: sm (MCA v1.0, API v1.0, Component v1.0.1)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.0.1)
> MCA pml: teg (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ptl: self (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ptl: sm (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ptl: tcp (MCA v1.0, API v1.0, Component v1.0.1)
> MCA btl: self (MCA v1.0, API v1.0, Component v1.0.1)
> MCA btl: sm (MCA v1.0, API v1.0, Component v1.0.1)
> MCA btl: tcp (MCA v1.0, API v1.0, Component v1.0)
>MCA topo: unity (MCA v1.0, API v1.0, Component v1.0.1)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.0.1)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.0.1)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.0.1)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.0.1)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.0.1)
>  MCA ns: proxy (MCA v1.0, API v1.0, Component v1.0.1)
>  MCA ns: replica (MCA v1.0, API v1.0, Component v1.0.1)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ras: hostfile (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ras: localhost (MCA v1.0, API v1.0, Component v1.0.1)
> MCA ra

Re: [OMPI users] mca_oob_tcp_peer_complete_connect: connection failed

2006-03-16 Thread Charles Wright
Thanks for the tip.

I see that both number 1 and 2 are true.
Open MPI is insisting on using my eth0 (I know this by watching the
firewall log on the node it is trying to reach).

This is despite the fact that I have the first DNS entry go to eth1;
normally that is all PBS would need to do the right thing and use the
network I prefer.

Ok so I see there are some options to in/exclude interfaces.

However, mpiexec is ignoring my requests.
I tried it two ways. Neither worked. The firewall rejects traffic coming
into the 1.0.x.x network in both cases.

/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
-n 2 $XD1LAUNCHER ./mpimeasure
/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
-n 2 $XD1LAUNCHER ./mpimeasure

(see dns works... not over eth0)
uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
eth0  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
  inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
  TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
  Interrupt:16

eth1  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
  inet addr:1.128.21.134  Mask:255.128.0.0
  UP RUNNING NOARP  MTU:1500  Metric:1
  RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
  Interrupt:25

eth2  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
  TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
  Base address:0x2000 Memory:fea8-feaa

eth2:2Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
  inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  Base address:0x2000 Memory:fea8-feaa

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
  TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)

uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64
time=0.067 ms
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64
time=0.037 ms
64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64
time=0.022 ms

--- c344-5.x.asc.edu ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms



George Bosilca wrote:
>I see only 2 possibilities:
>1. your trying to run Open MPI on nodes having multiple IP 
>addresses.
>2. your nodes are behind firewalls and Open MPI is unable to pass through.
>
>Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full 
>answer to your question.
>
>   Thanks,
> george.
>
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>  
>>Hello,
>>   I'm just compiled open-mpi and tried to run my code which just
>>measures bandwidth from one node to another.   (Code compile fine and
>>runs under other mpi implementations)
>>
>>When I did I got this.
>>
>>uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>c317-6
>>c317-5
>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>connection failed (errno=110) - retrying (pid=24979)
>>[c317-5:24979] mca_oob_tcp_peer_timer_handler
>>[c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>connection failed (errno=110) - retrying (pid=24997)
>>[c317-5:24997] mca_oob_tcp_peer_timer_handler
>>
>>[0,1,1][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect]
>>connect() failed with errno=110
>>
>>
>>I compiled open-mpi with Pbspro 5.4-4 and I'm guessing that has
>>something to do with it.
>>
>>I've attached my config.log
>>
>>Any help with this would be appreciated.
>>
>>uahrcw@c275-6:~/mpi-benchmarks> ompi_info
>>   Open MPI: 1.0.1r8453
>>  Open MPI SVN revision: r8453
>>   Open RTE: 1.0.1r8453
>>  Open RTE SVN revision: r8453
>>   OPAL: 1.0.1r8453
>>  OPAL SVN revision: r8453
>> Prefix: /opt/asn/apps/openmpi-1.0.1
>>Configured architecture: x86_64-unknown-linux-gnu
>>  Configured by: asnrcw
>>  Configured on: Fri Feb 24 15:19:37 CST 20

Re: [OMPI users] mca_oob_tcp_peer_complete_connect: connection failed

2006-03-16 Thread George Bosilca
Sorry I wasn't clear enough in my previous post. The error messages that
you get are coming from the OOB, which is the framework we're using to set
up the MPI run. The options that you used (btl_tcp_if_include) are only
used for MPI communications. Please add "--mca oob_tcp_include eth1" to
force the OOB framework to use eth1. To avoid having to type all these
options every time, you can add them to the
$(HOME)/.openmpi/mca-params.conf file. A file containing:

oob_tcp_include=eth1
btl_tcp_if_include=eth1

should solve your problems, if the firewall is opened on eth1 between 
these nodes.
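
For a quick test before creating that file, the same two parameters can
presumably also be given directly on the command line you used earlier,
e.g.:

/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --mca oob_tcp_include eth1 \
    --mca btl_tcp_if_include eth1 -n 2 $XD1LAUNCHER ./mpimeasure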

   Thanks,
 george.

On Thu, 16 Mar 2006, Charles Wright wrote:

> Thanks for the tip.
>
> I see that both number 1 and 2 are true.
> Openmpi is insisting on using my eth0 (I know this by watching the
> firewall log on the node it is trying to go to)
>
> This is despite the fact that I have the first dns entry go to eth1,
> normally that is all pbs would need to do the right thing and use the
> network I prefer.
>
> Ok so I see there are some options to in/exclude interfaces.
>
> however mpiexec is igorning my requests.
> I tried it two ways.  Neither worked.   Firewall rejects traffic coming
> into 1.0.x.x. network in both cases.
>
> /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
> -n 2 $XD1LAUNCHER ./mpimeasure
> /opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
> -n 2 $XD1LAUNCHER ./mpimeasure
>
> (see dns works... not over eth0)
> uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
> eth0  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
>  inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
>  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>  RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
>  Interrupt:16
>
> eth1  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
>  inet addr:1.128.21.134  Mask:255.128.0.0
>  UP RUNNING NOARP  MTU:1500  Metric:1
>  RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
>  Interrupt:25
>
> eth2  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>  RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:1000
>  RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
>  Base address:0x2000 Memory:fea8-feaa
>
> eth2:2Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>  inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
>  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>  Base address:0x2000 Memory:fea8-feaa
>
> loLink encap:Local Loopback
>  inet addr:127.0.0.1  Mask:255.0.0.0
>  UP LOOPBACK RUNNING  MTU:16436  Metric:1
>  RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>  TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>  collisions:0 txqueuelen:0
>  RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)
>
> uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
> PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64
> time=0.067 ms
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64
> time=0.037 ms
> 64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64
> time=0.022 ms
>
> --- c344-5.x.asc.edu ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 1999ms
> rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>
>
>
> George Bosilca wrote:
>> I see only 2 possibilities:
>> 1. your trying to run Open MPI on nodes having multiple IP
>> addresses.
>> 2. your nodes are behind firewalls and Open MPI is unable to pass through.
>>
>> Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full
>> answer to your question.
>>
>>   Thanks,
>> george.
>>
>> On Thu, 16 Mar 2006, Charles Wright wrote:
>>
>>
>>> Hello,
>>>   I'm just compiled open-mpi and tried to run my code which just
>>> measures bandwidth from one node to another.   (Code compile fine and
>>> runs under other mpi implementations)
>>>
>>> When I did I got this.
>>>
>>> uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>> c317-6
>>> c317-5
>>> [c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>> connection failed (errno=110) - retrying (pid=24979)
>>> [c317-5:24979] mca_oob_tcp_peer_timer_handler
>>> [c317-5:24997] [0,1,1]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>> connectio

Re: [OMPI users] mca_oob_tcp_peer_complete_connect: connection failed

2006-03-16 Thread Charles Wright
That works!!
Thanks!!

George Bosilca wrote:
>Sorry I wasn't clear enough on my previous post. The error messages that 
>you get are comming from the OOB which is the framework we're using to 
>setup the MPI run. The options that you use (btl_tcp_if_include) are only 
>used for MPI communications. Please add "--mca oob_tcp_include eth0" to 
>force the OOB framework to use eth0. In order to don't have to type all 
>these options all the time you can add them in the 
>$(HOME).openmpi/mca-params.conf file. A file containing:
>
>oob_tcp_include=eth1
>btl_tcp_if_include=eth1
>
>should solve your problems, if the firewall is opened on eth1 between 
>these nodes.
>
>   Thanks,
> george.
>
>On Thu, 16 Mar 2006, Charles Wright wrote:
>
>  
>>Thanks for the tip.
>>
>>I see that both number 1 and 2 are true.
>>Openmpi is insisting on using my eth0 (I know this by watching the
>>firewall log on the node it is trying to go to)
>>
>>This is despite the fact that I have the first dns entry go to eth1,
>>normally that is all pbs would need to do the right thing and use the
>>network I prefer.
>>
>>Ok so I see there are some options to in/exclude interfaces.
>>
>>however mpiexec is igorning my requests.
>>I tried it two ways.  Neither worked.   Firewall rejects traffic coming
>>into 1.0.x.x. network in both cases.
>>
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_include eth1
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>/opt/asn/apps/openmpi-1.0.1/bin/mpiexec --gmca btl_tcp_if_exclude eth0
>>-n 2 $XD1LAUNCHER ./mpimeasure
>>
>>(see dns works... not over eth0)
>>uahrcw@c344-6:~/mpi-benchmarks> /sbin/ifconfig
>>eth0  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:60
>> inet addr:1.0.21.134  Bcast:1.127.255.255  Mask:255.128.0.0
>> UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> RX packets:6596091 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:316165 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:560395541 (534.4 Mb)  TX bytes:34367848 (32.7 Mb)
>> Interrupt:16
>>
>>eth1  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:61
>> inet addr:1.128.21.134  Mask:255.128.0.0
>> UP RUNNING NOARP  MTU:1500  Metric:1
>> RX packets:5600487 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:4863441 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:6203028277 (5915.6 Mb)  TX bytes:566471561 (540.2 Mb)
>> Interrupt:25
>>
>>eth2  Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>> UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> RX packets:829064 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:181572 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:1000
>> RX bytes:61216408 (58.3 Mb)  TX bytes:19079579 (18.1 Mb)
>> Base address:0x2000 Memory:fea8-feaa
>>
>>eth2:2Link encap:Ethernet  HWaddr 00:0E:AB:01:58:62
>> inet addr:129.66.9.146  Bcast:129.66.9.255  Mask:255.255.255.0
>> UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>> Base address:0x2000 Memory:fea8-feaa
>>
>>loLink encap:Local Loopback
>> inet addr:127.0.0.1  Mask:255.0.0.0
>> UP LOOPBACK RUNNING  MTU:16436  Metric:1
>> RX packets:14259 errors:0 dropped:0 overruns:0 frame:0
>> TX packets:14259 errors:0 dropped:0 overruns:0 carrier:0
>> collisions:0 txqueuelen:0
>> RX bytes:879631 (859.0 Kb)  TX bytes:879631 (859.0 Kb)
>>
>>uahrcw@c344-6:~/mpi-benchmarks> ping c344-5
>>PING c344-5.x.asc.edu (1.128.21.133) 56(84) bytes of data.
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=1 ttl=64
>>time=0.067 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=2 ttl=64
>>time=0.037 ms
>>64 bytes from c344-5.x.asc.edu (1.128.21.133): icmp_seq=3 ttl=64
>>time=0.022 ms
>>
>>--- c344-5.x.asc.edu ping statistics ---
>>3 packets transmitted, 3 received, 0% packet loss, time 1999ms
>>rtt min/avg/max/mdev = 0.022/0.042/0.067/0.018 ms
>>
>>
>>
>>George Bosilca wrote:
>>
>>>I see only 2 possibilities:
>>>1. your trying to run Open MPI on nodes having multiple IP
>>>addresses.
>>>2. your nodes are behind firewalls and Open MPI is unable to pass through.
>>>
>>>Please check the FAQ on http://www.open-mpi.org/faq/ to find out the full
>>>answer to your question.
>>>
>>>  Thanks,
>>>george.
>>>
>>>On Thu, 16 Mar 2006, Charles Wright wrote:
>>>
>>>
>>>  
>>>>Hello,
>>>>   I'm just compiled open-mpi and tried to run my code which just
>>>>measures bandwidth from one node to another.   (Code compile fine and
>>>>runs under other mpi implementations)
>>>>
>>>>When I did I got this.
>>>>
>>>>uahrcw@c275-6:~/mpi-benchmarks> cat openmpitcp.o15380
>>>>c317-6
>>>>c317-5
>>>>[c317-5:24979] [0,0,2]-[0,0,0] mca_oob_tcp_peer_complete_connect:
>>>>connection failed (errno=110) - retrying (pid=24979)
>>>>[c317-5:24979] mca_oob