[OMPI users] SIGSEGV in mpiexec

2007-05-08 Thread Luis Kornblueh
Hi everybody,

we've got some problems on our cluster with openmpi versions 1.2 and 
upward.

The following setup does work: 

openmpi-1.2b3: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1 

The following two setups give a SIGSEGV in mpiexec (stack trace below):

openmpi-1.2:   SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1 
openmpi-1.2.1: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1 

All have been compiled with

export F77=pgf95
export FC=pgf95

./configure --prefix=/sw/sles9-x64/voltaire/openmpi-1.2b3-pgi \
--enable-pretty-print-stacktrace \
--with-libnuma=/usr \
--with-mvapi=/usr \
--with-mvapi-libdir=/usr/lib64

(with changing prefix, of course)

The stack trace:

Starting program: 
/scratch/work/system/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/bin/mpiexec -host 
tornado1 --prefix=$MPIROOT -v -np 8 `pwd`/osu_bw
[Thread debugging using libthread_db enabled]
[New Thread 182906198784 (LWP 30805)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182906198784 (LWP 30805)]
0x002a957f1b5b in _int_free () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
(gdb) where
#0  0x002a957f1b5b in _int_free () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#1  0x002a957f1e7d in free () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#2  0x002a95563b72 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#3  0x002a95fb51ec in __libc_dl_error_tsd () from /lib64/tls/libc.so.6
#4  0x002a95dba6ec in __pthread_initialize_minimal_internal () from 
/lib64/tls/libpthread.so.0
#5  0x002a95dba419 in call_initialize_minimal () from 
/lib64/tls/libpthread.so.0
#6  0x002a95ec9000 in ?? ()
#7  0x002a95db9fe9 in _init () from /lib64/tls/libpthread.so.0
#8  0x007fbfffe7c0 in ?? ()
#9  0x002a9556168d in call_init () from /lib64/ld-linux-x86-64.so.2
#10 0x002a9556179b in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#11 0x002a95fb39ac in dl_open_worker () from /lib64/tls/libc.so.6
#12 0x002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x002a95fb3160 in _dl_open () from /lib64/tls/libc.so.6
#14 0x002a959413b5 in dlopen_doit () from /lib64/libdl.so.2
#15 0x002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#16 0x002a959416fa in _dlerror_run () from /lib64/libdl.so.2
#17 0x002a95941362 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#18 0x002a957db2ee in vm_open () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#19 0x002a957d9645 in tryall_dlopen () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#20 0x002a957d981e in tryall_dlopen_module () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#21 0x002a957daab1 in try_dlopen () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#22 0x002a957dacd6 in lt_dlopenext () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#23 0x002a957e04f5 in open_component () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#24 0x002a957e0f60 in mca_base_component_find () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#25 0x002a957e189c in mca_base_components_open () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#26 0x002a956a6119 in orte_rds_base_open () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#27 0x002a95681d18 in orte_init_stage1 () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#28 0x002a95684eba in orte_system_init () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#29 0x002a9568179d in orte_init () from 
/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#30 0x00402a3a in orterun (argc=8, argv=0x7fbfffe778) at orterun.c:374
#31 0x004028d3 in main (argc=8, argv=0x7fbfffe778) at main.c:13
(gdb) quit
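
Since the crash is in free() inside libopen-pal while an MCA component
is being dlopened, would a rebuild along these lines be a sensible
experiment? (Untested on our side; the option names are from memory,
so please correct us if they do not exist in 1.2.)

# untested sketch: rebuild without the bundled ptmalloc2 memory hooks
# (and/or without dlopen-ed components), keeping everything else the same
export F77=pgf95
export FC=pgf95

./configure --prefix=/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi \
--enable-pretty-print-stacktrace \
--with-libnuma=/usr \
--with-mvapi=/usr \
--with-mvapi-libdir=/usr/lib64 \
--without-memory-manager        # and/or --disable-dlopen
make && make install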

In case access to our cluster could help, we would be happy to 
provide an account.

Cheerio,
Luis
-- 
 \\
 (-0^0-)
--oOO--(_)--OOo-

 Luis Kornblueh   Tel. : +49-40-41173289
 Max-Planck-Institute for Meteorology Fax. : +49-40-41173298
 Bundesstr. 53  
 D-20146 Hamburg   Email: luis.kornbl...@zmaw.de
 Federal Republic of Germany   


Re: [OMPI users] openMPI over uDAPL doesn't work

2007-05-08 Thread Boris Bierbaum
Hi,

we (my colleague Andreas and I) are still trying to solve this problem.
I have compiled some additional information; maybe somebody has an idea
about what's going on.

OS: Debian GNU/Linux 4.0, Kernel 2.6.18, x86, 32-Bit
IB software: OFED 1.1
SM: OpenSM from OFED 1.1
uDAPL: DAPL reference implementation version gamma 3.02 (using DAPL from
OFED 1.1 doesn't change anything, I suppose it's the same code, at least
roughly)
Test program: Intel MPI Benchmarks Version 2.3
OpenMPI version: 1.2.1

Running OpenMPI directly over IB verbs (mpirun --mca btl self,sm,openib
...) works. Here's the output of ibv_devinfo and ifconfig for the two
nodes on which we tried to run the benchmark (ulimit -l is unlimited on
both machines):

 1st node ---

boris@pd-04:/work/boris/IMB_2.3/src$ /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
fw_ver: 1.2.0
node_guid:  0002:c902:0020:b528
sys_image_guid: 0002:c902:0020:b52b
vendor_id:  0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id:   MT_023001
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   9
port_lmc:   0x00

boris@pd-04:/work/boris/IMB_2.3/src$ /sbin/ifconfig

...

ib0   Protokoll:UNSPEC  Hardware Adresse
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet Adresse:192.168.0.14  Bcast:192.168.0.255
Maske:255.255.255.0
  inet6 Adresse: fe80::202:c902:20:b529/64
Gültigkeitsbereich:Verbindung
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:67 errors:0 dropped:0 overruns:0 frame:0
  TX packets:16 errors:0 dropped:2 overruns:0 carrier:0
  Kollisionen:0 Sendewarteschlangenlänge:128
  RX bytes:3752 (3.6 KiB)  TX bytes:968 (968.0 b)

...

 2nd node ---

boris@pd-05:~$  /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
fw_ver: 1.2.0
node_guid:  0002:c902:0020:b4f4
sys_image_guid: 0002:c902:0020:b4f7
vendor_id:  0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id:   MT_023001
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   10
port_lmc:   0x00

boris@pd-05:~$ /sbin/ifconfig

...

ib0   Protokoll:UNSPEC  Hardware Adresse
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet Adresse:192.168.0.15  Bcast:192.168.0.255
Maske:255.255.255.0
  inet6 Adresse: fe80::202:c902:20:b4f5/64
Gültigkeitsbereich:Verbindung
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:67 errors:0 dropped:0 overruns:0 frame:0
  TX packets:18 errors:0 dropped:2 overruns:0 carrier:0
  Kollisionen:0 Sendewarteschlangenlänge:128
  RX bytes:3752 (3.6 KiB)  TX bytes:1088 (1.0 KiB)


...

-


Here's the output from the failed run, with every DAT and DAPL debug
output enabled:



boris@pd-04:/work/boris/IMB_2.3/src$ mpirun -np 2 -x DAT_DBG_TYPE -x
DAPL_DBG_TYPE -x DAT_OVERRIDE --mca btl self,sm,udapl --host pd-04,pd-05
/work/boris/IMB_2.3/src/IMB-MPI1 pingpong
DAT Registry: Started (dat_init)
DAT Registry: static registry file


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value



DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value <>


DAT Registry: token
 type  eor
 value <>


DAT Registry: entry
 ia_name OpenIB-cma
 api_version
 type 0x0
 major.minor 1.2
 is_thread_safe 0
 is_default 1
 lib_path
/home/boris/dapl_on_dope_gamma3.2/dapl/udapl/Target/i686/libdapl_openib_cma.so
 provider_version
 id mv_dapl
 major.minor 1.2
 ia_params ib0 0

DAT Registry: loading provider for OpenIB-cma

DAT Registry: token
 type  eof
 value <>

DAT Registry: dat_registry_list_providers () called
DAT Registry: dat_ia_openv (OpenIB-cma,1:2,0) called
D

[OMPI users] AlphaServers & OpenMPI

2007-05-08 Thread Rob

Hi,

What is the problem with supporting AlphaServers in
OpenMPI?

As alternatives, MPICH1 (very old) supports
AlphaServers, and MPICH2 (new) appears to work on
AlphaServers too (but setting up MPICH2 with the
mpd ring is just too complicated).

Hence, I would prefer OpenMPI instead.
Is there a way to get OpenMPI to work on my Alpha systems?

Thanks,
Rob.







Re: [OMPI users] mpirun: "-wd" deprecated?

2007-05-08 Thread Jeff Squyres

Oops -- looks like a typo in the man page.  The real flag is "-wdir".

Let me see how we want to fix that: I'm not sure if there's an OMPI  
member who wants to have "-wd" for backward compatibility.  I'm  
guessing that we'll either:


1. s/-wd/-wdir/g in the man page
2. Add the flag "-wd" which will be a synonym for "-wdir"
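
In the meantime, this spelling works (the directory and program here are
just an example):

  mpirun -wdir /tmp/my-run-dir -np 4 ./a.out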

Thanks for bringing it to our attention!



On May 7, 2007, at 12:04 AM, Rob wrote:



Hi,

In the man page of mpirun it says:

  -wd <dirname>  Change to the directory <dirname> before
the user's program executes


When I do a 'mpirun --help', there's no mention
of the -wd flag. Also, when I try using this flag,
I get errors and mpirun doesn't execute anything.

So what about this -wd flag?

Rob.





--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Alpha system & OpenMPI 1.2.1 does not work...

2007-05-08 Thread Jeff Squyres

On May 1, 2007, at 11:28 PM, Rob wrote:


I'm now trying the nightly build from SVN
(version 1.3a1r14551), but I'm afraid that Alpha
support is still not there. If that's the case,
is there any chance to fix openmpi for Alpha?


Indeed this fails with the same error as the
compilation of 1.2.1 with "--enable-static".
Output files of this 1.3/SVN are at
  http://www.lahaye.dds.nl/openmpi/


I tried to go here and got a 404 (probably because we took so long to  
reply -- sorry...).  Can you re-post these files?



My OS is CentOS 4.4
(the equivalent of RedHat Enterprise Edition 4).
Hence, my packages are not so up-to-date versions:

autoconf-2.59-5
automake-1.9.2-3
libtool-1.5.6-4.EL4.1.c4.2
libtool-libs-1.5.6-4.EL4.1.c4.2
flex-2.5.4a-33
(what else is essential to build OpenMpi?)


By the way, I don't think the above packages are
required for building OpenMPI from the 1.2.1
source tarball, or are they?


Correct.  The OMPI downloadable tarballs (including the nightly  
snapshots) are self-contained; you don't need the above-listed tools  
to compile them.  Those tools are really only necessary for developer  
builds of Open MPI (e.g., a Subversion checkout).


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Alpha system & OpenMPI 1.2.1 does not work...

2007-05-08 Thread Jeff Squyres

On May 1, 2007, at 9:11 PM, Rob wrote:


A few emails back I reported that I could build
openmpi on an Alpha system (except for the static libraries).
However, it seems that the built result is unusable.
With every simple program (even non-MPI) I compile,
I get:

  $ mpicc myprog.c --showme:version
 mpicc: Open MPI 1.2.1 (Language: C)

  $ mpicc myprog.c
 gcc: dummy: No such file or directory
 gcc: ranlib: No such file or directory

  $ mpicc myprog.c --showme
 /opt/gcc/bin/gcc -I/opt/openmpi/include/openmpi
 -I/opt/openmpi/include -pthread -mfp-trap-mode=su
 myprog.c -L/opt/openmpi/lib -lmpi -lopen-rte
 -lopen-pal -ldl dummy ranlib

(Note: the "-mfp-trap-mode=su" prevents a runtime
SIGSEGV crash with GNU compiler on Alpha system)

  $ mpicc myprog.c --showme:link
 -pthread -mfp-trap-mode=su myprog.c
 -L/opt/openmpi/lib -lmpi -lopen-rte -lopen-pal
 -ldl dummy ranlib

What is the "dummy" and "ranlib" doing here?


This specific problem may be due to a bug that Brian just found/fixed  
in the configure script last night (due to a bug report from Paul Van  
Allsburg).  Could you try any nightly trunk tarball after r14600 (the  
fix hasn't made its way over to the 1.2 release branch yet; I assume  
it will soon)?



I'm now trying the nightly build from SVN
(version 1.3a1r14551), but I'm afraid that Alpha
support is still not there. If that's the case,
is there any chance to fix openmpi for Alpha?


So I think you're having 2 issues (right?):

1. The opal missing symbol when you compile dynamically
2. The dummy/ranlib arguments in mpicc and friends

#2 may be fixed; #1 will require a closer look (per my previous mail).
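
If you want to see where the bogus tokens are coming from without
rebuilding, they should show up in the wrapper data file that mpicc
reads (path is relative to your install prefix; the exact file name and
field names may differ a bit between versions):

  grep -nE 'dummy|ranlib' /opt/openmpi/share/openmpi/mpicc-wrapper-data.txt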


My OS is CentOS 4.4
(the equivalent of RedHat Enterprise Edition 4).
Hence, my packages are not so up-to-date versions:

autoconf-2.59-5
automake15-1.5-13
automake-1.9.2-3
automake14-1.4p6-12
automake17-1.7.9-5
automake16-1.6.3-5
libtool-1.5.6-4.EL4.1.c4.2
libtool-libs-1.5.6-4.EL4.1.c4.2
flex-2.5.4a-33
(what else is essential to build OpenMpi?)


Building from SVN will require more recent versions of these tools
(libtool in particular) -- see http://www.open-mpi.org/svn/building.php.
The HACKING file has good instructions on how to get recent versions of
the tools without hosing your system:
http://svn.open-mpi.org/svn/ompi/trunk/HACKING
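
A quick way to check what you have at the moment (all of the GNU tools
accept --version):

  autoconf --version | head -1
  automake --version | head -1
  libtool --version | head -1
  m4 --version | head -1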


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] AlphaServers & OpenMPI

2007-05-08 Thread Jeff Squyres
Sorry for the delay in replying -- per the other thread, let's see if  
the mpicc problem was fixed last night, and let's see the configure  
output files to try to get an idea about what the problem was in  
regards to the opal missing symbol.


To be honest, however, none of the current Open MPI members support  
the Alpha platform.  Proper development and maintenance may therefore  
be somewhat difficult (indeed, I have no customers who use Alpha, so  
it's hard for me to justify spending time on Alpha-specific issues).


That being said, Open MPI is an open source project and we welcome  
the contributions of others!  :-)



On May 8, 2007, at 6:05 AM, Rob wrote:



Hi,

What is the problem with supporting AlphaServers in
OpenMPI?

As alternatives, MPICH1 (very old) supports
AlphaServers, and MPICH2 (new) appears to work on
AlphaServers too (but setting up MPICH2 with the
mpd ring is just too complicated).

Hence, I would prefer OpenMPI instead.
Is there a way to get OpenMPI to work on my Alpha systems?

Thanks,
Rob.







--
Jeff Squyres
Cisco Systems



Re: [OMPI users] openMPI over uDAPL doesn't work

2007-05-08 Thread Jeff Squyres
I'm forwarding this to the OpenFabrics general list -- as it just
came up the other day, we know that Open MPI's UDAPL support works on
Solaris, but we have done little/no testing of it on OFED (I
personally know almost nothing about uDAPL).


Can the UDAPL OFED wizards shed any light on the error messages that
are listed below?  In particular, these seem to be worrisome:



 setup_listener Permission denied
 setup_listener Address already in use

and

 create_qp Address already in use


Thanks...


On May 8, 2007, at 5:37 AM, Boris Bierbaum wrote:


Hi,

we (my colleague Andreas and I) are still trying to solve this problem.
I have compiled some additional information; maybe somebody has an idea
about what's going on.

OS: Debian GNU/Linux 4.0, Kernel 2.6.18, x86, 32-Bit
IB software: OFED 1.1
SM: OpenSM from OFED 1.1
uDAPL: DAPL reference implementation version gamma 3.02 (using DAPL from
OFED 1.1 doesn't change anything, I suppose it's the same code, at least
roughly)
Test program: Intel MPI Benchmarks Version 2.3
OpenMPI version: 1.2.1

Running OpenMPI directly over IB verbs (mpirun --mca btl self,sm,openib
...) works. Here's the output of ibv_devinfo and ifconfig for the two
nodes on which we tried to run the benchmark (ulimit -l is unlimited on
both machines):

 1st node ---

boris@pd-04:/work/boris/IMB_2.3/src$ /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
fw_ver: 1.2.0
node_guid:  0002:c902:0020:b528
sys_image_guid: 0002:c902:0020:b52b
vendor_id:  0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id:   MT_023001
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   9
port_lmc:   0x00

boris@pd-04:/work/boris/IMB_2.3/src$ /sbin/ifconfig

...

ib0   Protokoll:UNSPEC  Hardware Adresse
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet Adresse:192.168.0.14  Bcast:192.168.0.255
Maske:255.255.255.0
  inet6 Adresse: fe80::202:c902:20:b529/64
Gültigkeitsbereich:Verbindung
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:67 errors:0 dropped:0 overruns:0 frame:0
  TX packets:16 errors:0 dropped:2 overruns:0 carrier:0
  Kollisionen:0 Sendewarteschlangenlänge:128
  RX bytes:3752 (3.6 KiB)  TX bytes:968 (968.0 b)

...

 2nd node ---

boris@pd-05:~$  /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
fw_ver: 1.2.0
node_guid:  0002:c902:0020:b4f4
sys_image_guid: 0002:c902:0020:b4f7
vendor_id:  0x02c9
vendor_part_id: 25204
hw_ver: 0xA0
board_id:   MT_023001
phys_port_cnt:  1
port:   1
state:  PORT_ACTIVE (4)
max_mtu:2048 (4)
active_mtu: 2048 (4)
sm_lid: 1
port_lid:   10
port_lmc:   0x00

boris@pd-05:~$ /sbin/ifconfig

...

ib0   Protokoll:UNSPEC  Hardware Adresse
00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
  inet Adresse:192.168.0.15  Bcast:192.168.0.255
Maske:255.255.255.0
  inet6 Adresse: fe80::202:c902:20:b4f5/64
Gültigkeitsbereich:Verbindung
  UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
  RX packets:67 errors:0 dropped:0 overruns:0 frame:0
  TX packets:18 errors:0 dropped:2 overruns:0 carrier:0
  Kollisionen:0 Sendewarteschlangenlänge:128
  RX bytes:3752 (3.6 KiB)  TX bytes:1088 (1.0 KiB)


...

-- 
---



Here's the output from the failed run, with every DAT and DAPL debug
output enabled:



boris@pd-04:/work/boris/IMB_2.3/src$ mpirun -np 2 -x DAT_DBG_TYPE -x
DAPL_DBG_TYPE -x DAT_OVERRIDE --mca btl self,sm,udapl --host pd-04,pd-05
/work/boris/IMB_2.3/src/IMB-MPI1 pingpong
DAT Registry: Started (dat_init)
DAT Registry: static registry file


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value 


DAT Registry: token
 type  string
 value
libdapl_openib_cma.so>



DAT Registry: token
 type  string
 value 


DAT

[OMPI users] Fwd: [ofa-general] Re: openMPI over uDAPL doesn't work

2007-05-08 Thread Jeff Squyres
Re-forwarding to OMPI list; because of the OMPI list anti-spam  
checks, Arlin's post didn't make it through to our list when he  
originally posted.




Begin forwarded message:


From: Arlin Davis 
Date: May 8, 2007 3:09:02 PM EDT
To: Jeff Squyres 
Cc: Open MPI Users , OpenFabrics General  

Subject: Re: [ofa-general] Re: [OMPI users] openMPI over uDAPL  
doesn't work


Jeff Squyres wrote:

I'm forwarding this to the OpenFabrics general list -- as it just
came up the other day, we know that Open MPI's UDAPL support works
on Solaris, but we have done little/no testing of it on OFED (I
personally know almost nothing about uDAPL).


Can the UDAPL OFED wizards shed any light on the error messages
that are listed below?  In particular, these seem to be worrisome:



 setup_listener Permission denied


 setup_listener Address already in use


These failures are from rdma_cm_bind indicating the port is already  
bound to this IA address. How are you creating the service point?
dat_psp_create or dat_psp_create_any? If it is psp_create_any then  
you will see some failures until it  gets to a free port. That is  
normal. Just make sure your create call returns DAT_SUCCESS.



 create_qp Address already in use


This is a real problem with the bind: the port is already in use. Not
sure why this would fail, since the current version of OFED uDAPL
uses a wildcard port when binding and uses the address from the
open; I remember an issue a while back with rdma_cm and wildcard
ports. What version of OFED are you using?
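
If the ofed_info script got installed with your stack, its first line
should say which release you have, e.g.:

  ofed_info | head -1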


-arlin



--
Jeff Squyres
Cisco Systems



[OMPI users] Newbie question. Please help.

2007-05-08 Thread Steven Truong

Hi, all.  I am new to OpenMPI and after initial setup I tried to run
my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
on signal 60 (Real-time signal 26).

/usr/local/openmpi-1.2.1/bin/ompi_info
   Open MPI: 1.2.1
  Open MPI SVN revision: r14481
   Open RTE: 1.2.1
  Open RTE SVN revision: r14481
   OPAL: 1.2.1
  OPAL SVN revision: r14481
 Prefix: /usr/local/openmpi-1.2.1
Configured architecture: x86_64-unknown-linux-gnu
  Configured by: root
  Configured on: Mon May  7 18:32:56 PDT 2007
 Configure host: neptune.nanostellar.com
   Built by: root
   Built on: Mon May  7 18:40:28 PDT 2007
 Built host: neptune.my.com
 C bindings: yes
   C++ bindings: yes
 Fortran77 bindings: yes (all)
 Fortran90 bindings: yes
Fortran90 bindings size: small
 C compiler: gcc
C compiler absolute: /usr/bin/gcc
   C++ compiler: g++
  C++ compiler absolute: /usr/bin/g++
 Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
 Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
 Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
 Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
C profiling: yes
  C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
 C++ exceptions: no
 Thread support: posix (mpi: no, progress: no)
 Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
  Heterogeneous support: yes
mpirun default --prefix: yes
  MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
 MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
  MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
  MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
  MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
  MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
  MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
  MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
   MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
   MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
   MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
   MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
 MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
  MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
  MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
 MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
   MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
 MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
 MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
 MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
MCA iof: svc (MCA 

Re: [OMPI users] Newbie question. Please help.

2007-05-08 Thread Brock Palen

Steven,
We run VASP on both Linux (PGI compilers) and Mac OS X (xlf).  I am sad
to report that VASP did not work with Open MPI the last time I tried
(1.1.1), and the errors you reported are the same ones I saw.  For the
time being, VASP (version 4) works only with LAM and MPICH-1.


If you have any pull with the VASP devs, letting them know that both
of those projects are unmaintained would be a huge help for us all!


Again, though: if any Fortran guru has had time to figure out how to
make VASP work with OMPI, please contact us all right away!
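
One thing worth ruling out first: check that a trivial MPI program built
with the same wrappers runs cleanly. If it does, an MPI_ERR_COMM on
MPI_COMM_WORLD right at MPI_Comm_rank often means the application picked
up a different MPI's headers at compile time (a stale mpif.h in the VASP
build, for example). Something like this, with the prefix taken from your
ompi_info output, should be enough:

cat > hello.c <<'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    /* minimal check: init, query MPI_COMM_WORLD, finalize */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
/usr/local/openmpi-1.2.1/bin/mpicc hello.c -o hello
/usr/local/openmpi-1.2.1/bin/mpiexec -np 4 ./hello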


Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985


On May 8, 2007, at 10:18 PM, Steven Truong wrote:


Hi, all.  I am new to OpenMPI and after initial setup I tried to run
my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07 exited
on signal 60 (Real-time signal 26).

 /usr/local/openmpi-1.2.1/bin/ompi_info
Open MPI: 1.2.1
   Open MPI SVN revision: r14481
Open RTE: 1.2.1
   Open RTE SVN revision: r14481
OPAL: 1.2.1
   OPAL SVN revision: r14481
  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
   Configured by: root
   Configured on: Mon May  7 18:32:56 PDT 2007
  Configure host: neptune.nanostellar.com
Built by: root
Built on: Mon May  7 18:40:28 PDT 2007
  Built host: neptune.my.com
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: yes (all)
  Fortran90 bindings: yes
 Fortran90 bindings size: small
  C compiler: gcc
 C compiler absolute: /usr/bin/gcc
C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
  Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: yes
 Fortran90 profiling: yes
  C++ exceptions: no
  Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
 MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
 libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
   MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
      MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
   MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
   MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
   MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
       MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
 MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
 MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
   MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
   MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
        MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
        MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
        MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
        MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
          MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
       MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
       MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
         MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
         MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
         MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
      MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
         MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
         MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)