Re: [OMPI users] ORTE errors

2006-04-11 Thread Michael Kluskens

On Apr 10, 2006, at 6:31 PM, Ralph Castain wrote:

Was this the only output you received? If so, then it looks like  
your parent process never gets to spawn and bcast - you should have  
seen your write statements first, yes?


Ralph


I only listed the ORTE errors; I do get the correct output. The complete output follows:


parent:  0  of  1
parent: How many processes total?
2
parent: Calling MPI_Comm_spawn to start  1  subprocesses.
parent: Calling MPI_BCAST with btest =  17 .  child =  3
child 0 of 1:  Parent 3
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[host:00258] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276

child 0 of 1:  Receiving   17 from parent
Maximum user memory allocated: 0

Michael




Michael Kluskens wrote:
The ORTE errors again; these are new and different errors, seen as of OpenMPI 1.1a1r9596.


[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276
[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/soh_base_get_proc_soh.c at line 80
[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/oob_base_xcast.c at line 108
[host:10198] [0,0,0] ORTE_ERROR_LOG: Not found in file base/rmgr_base_stage_gate.c at line 276


This test was run using OpenMPI 1.1 built on OS X 10.4.6 with g95 from 4/9/06.  Past experience has been that the ORTE errors are independent of OS and compiler.  The attached sample codes generate these errors; they use MPI_COMM_SPAWN and MPI_BCAST (most vendors' MPI implementations can't run this test case).
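
For readers without the attachments, the pattern the test exercises is roughly the following.  This is a minimal C sketch only; the actual attached tests are Fortran, and the child executable name and process count here are illustrative.

  /* parent.c -- sketch: spawn one child and broadcast an integer to it
     over the resulting intercommunicator (mirrors the output above).
     The child executable name "./child" is illustrative. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, btest = 17;
      MPI_Comm children;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      printf("parent: %d of %d\n", rank, size);

      /* spawn 1 child process running ./child */
      MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                     MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

      /* intercommunicator broadcast: the root in the parent group passes
         MPI_ROOT, any other parent processes pass MPI_PROC_NULL */
      MPI_Bcast(&btest, 1, MPI_INT,
                (rank == 0) ? MPI_ROOT : MPI_PROC_NULL, children);

      MPI_Finalize();
      return 0;
  }

  /* child.c -- receives the broadcast from rank 0 of the parent group */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int btest;
      MPI_Comm parent;

      MPI_Init(&argc, &argv);
      MPI_Comm_get_parent(&parent);
      MPI_Bcast(&btest, 1, MPI_INT, 0, parent);  /* 0 = root's rank in the parent group */
      printf("child: received %d from parent\n", btest);
      MPI_Finalize();
      return 0;
  }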


[OMPI users] Problem running code with OpenMPI-1.0.1

2006-04-11 Thread Jeffrey B. Layton

Good morning,

  I'm trying to run one of the NAS Parallel Benchmarks (bt) with
OpenMPI-1.0.1 that was built with PGI 6.0. The code never seems to
start (at least I don't see any output), so eventually I kill it. Then
I get the following message:

[0,1,2][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
[0,1,4][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
[0,1,8][btl_tcp_endpoint.c:559:mca_btl_tcp_endpoint_complete_connect] connect() failed with errno=113
mpirun: killing job...


Any ideas on this one?

Thanks!

Jeff
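
For reference, errno 113 on Linux/glibc is EHOSTUNREACH ("No route to host"), which usually points at a routing or interface-selection problem rather than at the application itself.  A trivial way to check what an errno value means on a given node (assuming a Linux/glibc system):

  /* errno113.c -- print the local C library's description of errno 113;
     on Linux/glibc this is EHOSTUNREACH ("No route to host"). */
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      printf("errno 113: %s\n", strerror(113));
      return 0;
  }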


Re: [OMPI users] ORTE errors

2006-04-11 Thread Ralph Castain




Thanks Michael - we're looking into it and will get back to you shortly.

Ralph






Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread David Daniel
I suspect that to get this to work for bproc, we will have to build mpirun as 64-bit and the library as 32-bit.  That's because a 32-bit-compiled mpirun calls functions in the 32-bit /usr/lib/libbproc.so which don't appear to work when the system is booted 64-bit.


Of course that would mean we need heterogeneous support to run on a  
single homogeneous system!  Will this work on the 1.0 branch?


An alternative worth thinking about is to bypass the library calls and start processes using a system() call to invoke the bpsh command.  bpsh is a 64-bit executable linked with /usr/lib64/libbproc.so, and it successfully launches both 32- and 64-bit executables.
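
A rough sketch of that workaround (the node number, executable path, and error handling here are placeholders, not working code from LA-MPI or Open MPI):

  /* launch_bpsh.c -- sketch: start a process on a BProc node by shelling
     out to bpsh instead of calling libbproc directly.  Node number and
     executable path are placeholders. */
  #include <stdio.h>
  #include <stdlib.h>

  static int launch_with_bpsh(int node, const char *exe)
  {
      char cmd[1024];
      snprintf(cmd, sizeof(cmd), "bpsh %d %s", node, exe);
      return system(cmd);   /* -1 if the shell itself could not be run */
  }

  int main(void)
  {
      return launch_with_bpsh(0, "./a.out");
  }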


I'm currently trying to solve the same issue for LA-MPI :(

David


On Apr 10, 2006, at 9:18 AM, Brian Barrett wrote:


On Apr 10, 2006, at 11:07 AM, David Gunter wrote:


(flashc 105%) mpiexec -n 4 ./send4
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: /lib/libc.so.6: version `GLIBC_2.3.4' not found (required by /net/scratch1/dog/flash64/openmpi/openmpi-1.0.2-32b/lib/openmpi/mca_paffinity_linux.so) (ignored)
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: libbproc.so.4: cannot open shared object file: No such file or directory (ignored)
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: libbproc.so.4: cannot open shared object file: No such file or directory (ignored)
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: libbproc.so.4: cannot open shared object file: No such file or directory (ignored)
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: libbproc.so.4: cannot open shared object file: No such file or directory (ignored)
[flashc.lanl.gov:09921] mca: base: component_find: unable to open: libbproc.so.4: cannot open shared object file: No such file or directory (ignored)
mpiexec: relocation error: /net/scratch1/dog/flash64/openmpi/openmpi-1.0.2-32b/lib/openmpi/mca_soh_bproc.so: undefined symbol: bproc_nodelist

The problem now looks like /lib/libc.so.6 is no longer available.
Indeed, it is available on the compiler nodes but cannot be found
on the backend nodes - whoops!


Well, that's interesting.  Is this on a bproc platform?  If so, you
might be best off configuring with either --enable-static or
--disable-dlopen.  Either one will prevent components from being
loaded dynamically, which doesn't always seem to work well.

Also, it looks like at least one of the components is linked against
a different libc than the others.  This makes me think that perhaps
you have some old components from a previous build in your tree.
You might want to completely remove your installation prefix (or
lib/openmpi in your installation prefix) and run make install again.


Are you no longer seeing the errors about epoll?


The other problem is that the -m32 flag didn't make it into mpicc for
some reason.


This is expected behavior.  There are going to be more and more cases
where Open MPI provides one wrapper compiler that has to do the right
thing whether the user passes -m32 or -m64 (or any of the vendor-specific
options that do the same thing).  So it will be increasingly impossible
for us to know what to add automatically (but, as Jeff said, you can tell
configure to always add -m32 to the wrapper compilers if you want).
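
For example, if your version's configure supports the wrapper-flag options (--with-wrapper-cflags and friends), something along these lines bakes -m32 into both the build and the wrapper compilers; the prefix path and the exact flag set are illustrative:

  # illustrative: assumes configure accepts the --with-wrapper-* options
  ./configure CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 \
      --with-wrapper-cflags=-m32 --with-wrapper-fflags=-m32 \
      --prefix=/opt/openmpi-1.0.2-32b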

Brian




Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread Ralph Castain




Heterogeneous operations are not supported on 1.0 - they are, however,
on the new 1.1.  :-) 

Also, remember that you must configure for static operation for bproc -
use the configuration options "--enable-static --disable-shared". Our
current bproc launcher *really* dislikes shared libraries  ;-)
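
For reference, a hedged sketch of what such a configure line might look like (the installation prefix is illustrative):

  # static-only build, as suggested above; prefix is illustrative
  ./configure --enable-static --disable-shared --prefix=/opt/openmpi-static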

Ralph







Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread David Gunter
Unfortunately, static-only builds will create binaries that overwhelm
our machines.  This is not a realistic option.


-david

On Apr 11, 2006, at 1:04 PM, Ralph Castain wrote:

Also, remember that you must configure for static operation for
bproc - use the configuration options "--enable-static --disable-shared".
Our current bproc launcher *really* dislikes shared libraries ;-)




[OMPI users] Intel EM64T Compiler error on Opteron

2006-04-11 Thread Hugh Merz

I am trying to build OpenMPI v1.0.2 (stable) on an Opteron using the v8.1 Intel 
EM64T compilers:

Intel(R) C Compiler for Intel(R) EM64T-based applications, Version 8.1, Build 20041123, Package ID: l_cce_pc_8.1.024
Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 8.1, Build 20041123, Package ID: l_fce_pc_8.1.024

The compiler core dumps during make with:

 icc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include 
-DOMPI_PKGDATADIR=\"/scratch/merz//share/openmpi\" -I../../include -I../.. 
-I../.. -I../../include -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG 
-fno-strict-aliasing -pthread -MT cmd_line.lo -MD -MP -MF .deps/cmd_line.Tpo -c 
cmd_line.c  -fPIC -DPIC -o .libs/cmd_line.o
icc: error: /opt/intel_cce_80/bin/mcpcom: core dumped
icc: error: Fatal error in /opt/intel_cce_80/bin/mcpcom, terminated by unknown 
signal(139)

I couldn't find any other threads in the mailing list concerning use of the Intel EM64T compilers - has anyone successfully compiled OpenMPI with this combination?  The crash also occurs on Athlon 64 processors.  Logs attached.

Thanks,

Hugh

openmpi_1.0.2_logs.tar.bz2
Description: BZip2 compressed data
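
One standard way to narrow down an internal compiler error like this (a general suggestion, not something from the logs) is to re-run the failing compile command by hand with a lower optimization level in place of -O3.  If the crash disappears, overriding Open MPI's default flags at configure time is a common workaround, for example:

  # hedged workaround sketch: lower the optimization level if -O3 triggers the ICE
  ./configure CC=icc CXX=icpc F77=ifort FC=ifort CFLAGS=-O2 FFLAGS=-O2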


Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread Ralph Castain




Unfortunately, that's all that is available at the moment. Future
releases (post 1.1) may get around this problem.

The issue is that the bproc launcher actually does a binary memory
image of the process, then replicates that across all the nodes. This
is how we were told to implement it originally by the BProc folks.
However, that means that shared libraries have problems, for obvious
reasons.

We have to reimplement the bproc launcher using a different approach -
will take a little time.

Ralph







Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread Tim S. Woodall

Ralph/all,


The current launcher does work with shared libraries, provided they are available
on the backend nodes.  So linking statically is more convenient, but it is not a
requirement.

Tim





Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread David Gunter

Thanks Ralph.

Was there a reason this functionality wasn't in from the start, then?
LA-MPI works under bproc using shared libraries.


I know the BProc folks would like to kill off the notion of shared libs,
but they are a fact of life we can't do without.


Just my $0.02.

-david





Re: [OMPI users] Building 32-bit OpenMPI package for 64-bit Opteron platform

2006-04-11 Thread Ralph Castain




Nothing nefarious - just some bad advice. Fortunately, as my other note
indicated, Tim and company already fixed this by revising the launcher.

Sorry for the confusion
Ralph








Re: [OMPI users] Intel EM64T Compiler error on Opteron

2006-04-11 Thread Troy Telford
On Tue, 11 Apr 2006 13:19:43 -0600, Hugh Merz wrote:


I couldn't find any other threads in the mailing list concerning usage  
of the Intel EM64T compilers - has anyone successfully compiled OpenMPI  
using this combination?  It also occurs on the Athlon 64 processor.   
Logs attached.


Thanks,

Hugh


I have compiled Open MPI (on an Opteron) with the Intel 9 EM64T compilers;  
It's been a while since I've used the 8.1 series, but I'll give it a shot  
with Intel 8.1 and tell you what happens.


Re: [OMPI users] Intel EM64T Compiler error on Opteron

2006-04-11 Thread Troy Telford
On Tue, 11 Apr 2006 13:48:43 -0600, Troy Telford wrote:


I have compiled Open MPI (on an Opteron) with the Intel 9 EM64T  
compilers;

It's been a while since I've used the 8.1 series, but I'll give it a shot
with Intel 8.1 and tell you what happens.


I can confirm that I'm able to compile Open MPI 1.0.2 on my systems.

Other info:
* Opteron 244 CPUs
* SLES 9 SP3 x86_64
* Intel(R) C Compiler for Intel(R) EM64T-based applications, Version 8.1, Build 20050628
* Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 8.1, Build 20050517

--
Troy Telford
Linux Networx
ttelf...@linuxnetworx.com
(801) 649-1356


Re: [OMPI users] Problem running code with OpenMPI-1.0.1

2006-04-11 Thread Jeff Squyres (jsquyres)
Do you, perchance, have multiple TCP interfaces on at least one of the
nodes you're running on?

We had a mistake in the TCP network matching code during startup -- this
is fixed in v1.0.2.  Can you give that a whirl?
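
In the meantime, if your build has the TCP interface-selection MCA parameters, restricting the TCP BTL to one known-good interface sometimes works around multi-interface confusion.  The interface and executable names below are illustrative:

  # btl_tcp_if_include limits the TCP BTL to the listed interface(s); names are illustrative
  mpirun -np 16 --mca btl_tcp_if_include eth0 ./bt.A.16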





Re: [OMPI users] Problem running code with OpenMPI-1.0.1

2006-04-11 Thread Jeffrey B. Layton

Well, yes, these nodes do have multiple TCP interfaces.
I'll give 1.0.2 a whirl. :)

Thanks!

Jeff





