[OMPI users] SIGSEGV in mpiexec
Hi everybody,

we've got some problems on our cluster with Open MPI versions 1.2 and upward.

The following setup does work:

  openmpi-1.2b3: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1

The following two setups give a SIGSEGV in mpiexec (stack trace below):

  openmpi-1.2:   SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1
  openmpi-1.2.1: SLES 9 SP3 with gcc/g++ 4.1.1 and PGI f95 6.1-1

All have been compiled with:

  export F77=pgf95
  export FC=pgf95
  ./configure --prefix=/sw/sles9-x64/voltaire/openmpi-1.2b3-pgi \
      --enable-pretty-print-stacktrace \
      --with-libnuma=/usr \
      --with-mvapi=/usr \
      --with-mvapi-libdir=/usr/lib64

(with a changed prefix for each version, of course)

The stack trace:

Starting program: /scratch/work/system/sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/bin/mpiexec -host tornado1 --prefix=$MPIROOT -v -np 8 `pwd`/osu_bw
[Thread debugging using libthread_db enabled]
[New Thread 182906198784 (LWP 30805)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 182906198784 (LWP 30805)]
0x002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
(gdb) where
#0  0x002a957f1b5b in _int_free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#1  0x002a957f1e7d in free () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#2  0x002a95563b72 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#3  0x002a95fb51ec in __libc_dl_error_tsd () from /lib64/tls/libc.so.6
#4  0x002a95dba6ec in __pthread_initialize_minimal_internal () from /lib64/tls/libpthread.so.0
#5  0x002a95dba419 in call_initialize_minimal () from /lib64/tls/libpthread.so.0
#6  0x002a95ec9000 in ?? ()
#7  0x002a95db9fe9 in _init () from /lib64/tls/libpthread.so.0
#8  0x007fbfffe7c0 in ?? ()
#9  0x002a9556168d in call_init () from /lib64/ld-linux-x86-64.so.2
#10 0x002a9556179b in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#11 0x002a95fb39ac in dl_open_worker () from /lib64/tls/libc.so.6
#12 0x002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#13 0x002a95fb3160 in _dl_open () from /lib64/tls/libc.so.6
#14 0x002a959413b5 in dlopen_doit () from /lib64/libdl.so.2
#15 0x002a955612de in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#16 0x002a959416fa in _dlerror_run () from /lib64/libdl.so.2
#17 0x002a95941362 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#18 0x002a957db2ee in vm_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#19 0x002a957d9645 in tryall_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#20 0x002a957d981e in tryall_dlopen_module () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#21 0x002a957daab1 in try_dlopen () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#22 0x002a957dacd6 in lt_dlopenext () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#23 0x002a957e04f5 in open_component () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#24 0x002a957e0f60 in mca_base_component_find () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#25 0x002a957e189c in mca_base_components_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-pal.so.0
#26 0x002a956a6119 in orte_rds_base_open () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#27 0x002a95681d18 in orte_init_stage1 () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#28 0x002a95684eba in orte_system_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#29 0x002a9568179d in orte_init () from /sw/sles9-x64/voltaire/openmpi-1.2.1-pgi/lib/libopen-rte.so.0
#30 0x00402a3a in orterun (argc=8, argv=0x7fbfffe778) at orterun.c:374
#31 0x004028d3 in main (argc=8, argv=0x7fbfffe778) at main.c:13
(gdb) quit

In case access to our cluster could help, we would be happy to provide an account.

Cheerio,
Luis

--
\\ (-0^0-) --oOO--(_)--OOo-
Luis Kornblueh                        Tel. : +49-40-41173289
Max-Planck-Institute for Meteorology  Fax. : +49-40-41173298
Bundesstr. 53                         Email: luis.kornbl...@zmaw.de
D-20146 Hamburg
Federal Republic of Germany
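One detail worth noting from the trace above: the faulting free() lives in libopen-pal, i.e. inside Open MPI's bundled ptmalloc2 memory manager, and it is reached while the runtime dlopen()s its MCA components during orte_init. A way to test whether the bundled allocator or the component-dlopen path is implicated -- offered purely as a diagnostic sketch, not a confirmed fix; both options are believed to exist in 1.2-era configure, so check ./configure --help on your tree first -- is to rebuild with one of them disabled:

  # leave out the bundled ptmalloc2 memory manager
  ./configure --without-memory-manager [...other options as above...]

  # or: build MCA components into the libraries instead of dlopen()ing them
  ./configure --disable-dlopen [...other options as above...]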
Re: [OMPI users] openMPI over uDAPL doesn't work
Hi,

we (my colleague Andreas and I) are still trying to solve this problem. I have compiled some additional information; maybe somebody has an idea about what's going on.

OS: Debian GNU/Linux 4.0, Kernel 2.6.18, x86, 32-bit
IB software: OFED 1.1
SM: OpenSM from OFED 1.1
uDAPL: DAPL reference implementation version gamma 3.02 (using DAPL from OFED 1.1 doesn't change anything; I suppose it's the same code, at least roughly)
Test program: Intel MPI Benchmarks Version 2.3
OpenMPI version: 1.2.1

Running OpenMPI directly over IB verbs (mpirun --mca btl self,sm,openib ...) works.

Here's the output of ibv_devinfo and ifconfig for the two nodes on which we tried to run the benchmark (ulimit -l is unlimited on both machines):

1st node
---
boris@pd-04:/work/boris/IMB_2.3/src$ /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
        fw_ver:             1.2.0
        node_guid:          0002:c902:0020:b528
        sys_image_guid:     0002:c902:0020:b52b
        vendor_id:          0x02c9
        vendor_part_id:     25204
        hw_ver:             0xA0
        board_id:           MT_023001
        phys_port_cnt:      1
        port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    2048 (4)
                active_mtu: 2048 (4)
                sm_lid:     1
                port_lid:   9
                port_lmc:   0x00

boris@pd-04:/work/boris/IMB_2.3/src$ /sbin/ifconfig
...
ib0   Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
      inet addr:192.168.0.14  Bcast:192.168.0.255  Mask:255.255.255.0
      inet6 addr: fe80::202:c902:20:b529/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
      RX packets:67 errors:0 dropped:0 overruns:0 frame:0
      TX packets:16 errors:0 dropped:2 overruns:0 carrier:0
      collisions:0 txqueuelen:128
      RX bytes:3752 (3.6 KiB)  TX bytes:968 (968.0 b)
...

2nd node
---
boris@pd-05:~$ /opt/infiniband/bin/ibv_devinfo
hca_id: mthca0
        fw_ver:             1.2.0
        node_guid:          0002:c902:0020:b4f4
        sys_image_guid:     0002:c902:0020:b4f7
        vendor_id:          0x02c9
        vendor_part_id:     25204
        hw_ver:             0xA0
        board_id:           MT_023001
        phys_port_cnt:      1
        port: 1
                state:      PORT_ACTIVE (4)
                max_mtu:    2048 (4)
                active_mtu: 2048 (4)
                sm_lid:     1
                port_lid:   10
                port_lmc:   0x00

boris@pd-05:~$ /sbin/ifconfig
...
ib0   Link encap:UNSPEC  HWaddr 00-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
      inet addr:192.168.0.15  Bcast:192.168.0.255  Mask:255.255.255.0
      inet6 addr: fe80::202:c902:20:b4f5/64 Scope:Link
      UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
      RX packets:67 errors:0 dropped:0 overruns:0 frame:0
      TX packets:18 errors:0 dropped:2 overruns:0 carrier:0
      collisions:0 txqueuelen:128
      RX bytes:3752 (3.6 KiB)  TX bytes:1088 (1.0 KiB)
...
---
Here's the output from the failed run, with every DAT and DAPL debug output enabled:

boris@pd-04:/work/boris/IMB_2.3/src$ mpirun -np 2 -x DAT_DBG_TYPE -x DAPL_DBG_TYPE -x DAT_OVERRIDE --mca btl self,sm,udapl --host pd-04,pd-05 /work/boris/IMB_2.3/src/IMB-MPI1 pingpong
DAT Registry: Started (dat_init)
DAT Registry: static registry file
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value
DAT Registry: token type string value <>
DAT Registry: token type eor value <>
DAT Registry: entry ia_name OpenIB-cma api_version type 0x0 major.minor 1.2 is_thread_safe 0 is_default 1 lib_path /home/boris/dapl_on_dope_gamma3.2/dapl/udapl/Target/i686/libdapl_openib_cma.so provider_version id mv_dapl major.minor 1.2 ia_params ib0 0
DAT Registry: loading provider for OpenIB-cma
DAT Registry: token type eof value <>
DAT Registry: dat_registry_list_providers () called
DAT Registry: dat_ia_openv (OpenIB-cma,1:2,0) called
D
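For reference, the registry entry being parsed in the debug output above corresponds to one line of the static registry file that DAT_OVERRIDE points at. Reconstructed from the values shown -- the exact file location and quoting are assumptions based on the usual OFED-style dat.conf format, not something taken from the original post -- the line would look roughly like:

  OpenIB-cma u1.2 nonthreadsafe default /home/boris/dapl_on_dope_gamma3.2/dapl/udapl/Target/i686/libdapl_openib_cma.so mv_dapl.1.2 "ib0 0" ""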
[OMPI users] AlphaServers & OpenMPI
Hi,

What is the problem with supporting AlphaServers in OpenMPI? Of the alternatives, MPICH1 (very old) supports AlphaServers, and MPICH2 (new) appears to work on AlphaServers too (but setting up MPICH2 with the mpd ring is just too complicated). Hence, I would prefer OpenMPI instead. Is there a way to get OpenMPI to work on my Alpha systems?

Thanks,
Rob.
Re: [OMPI users] mpirun: "-wd" deprecated?
Oops -- looks like a typo in the man page. The real flag is "-wdir". Let me see how we want to fix that; I'm not sure if there's an OMPI member who wants to keep "-wd" for backward compatibility. I'm guessing that we'll either:

1. s/-wd/-wdir/g in the man page, or
2. Add the flag "-wd" as a synonym for "-wdir".

Thanks for bringing it to our attention!

On May 7, 2007, at 12:04 AM, Rob wrote:

> Hi,
>
> In the man page of mpirun it says:
>
>   -wd    Change to the directory before the user's program executes
>
> When I do a 'mpirun --help', there's no mention of the -wd flag. Also, when I try using this flag, I get errors without mpi executing anything.
>
> So what about this -wd flag?
>
> Rob.

--
Jeff Squyres
Cisco Systems
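For anyone finding this thread later, the spelling that works with 1.2-era mpirun is "-wdir"; for example (the process count, directory, and program name here are made up for illustration):

  mpirun -np 4 -wdir /scratch/myrun ./a.out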
Re: [OMPI users] Alpha system & OpenMPI 1.2.1 does not work...
On May 1, 2007, at 11:28 PM, Rob wrote:

> I'm now trying the nightly build from SVN (version 1.3a1r14551), but I'm afraid that Alpha support is still not there. If that's the case, is there any chance to fix openmpi for Alpha? Indeed this fails with the same error as the compilation of 1.2.1 with "--enable-static". Output files of this 1.3/SVN are at http://www.lahaye.dds.nl/openmpi/

I tried to go here and got a 404 (probably because we took so long to reply -- sorry...). Can you re-post these files?

> My OS is CentOS 4.4 (the equivalent of RedHat Enterprise Edition 4). Hence, my packages are not so up-to-date versions:
>
> autoconf-2.59-5
> automake-1.9.2-3
> libtool-1.5.6-4.EL4.1.c4.2
> libtool-libs-1.5.6-4.EL4.1.c4.2
> flex-2.5.4a-33
>
> (what else is essential to build OpenMpi?) By the way, I don't think the above packages are required for building OpenMPI from the 1.2.1 source tarball, or are they?

Correct. The OMPI downloadable tarballs (including the nightly snapshots) are self-contained; you don't need the above-listed tools to compile them. Those tools are really only necessary for developer builds of Open MPI (e.g., a Subversion checkout).

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Alpha system & OpenMPI 1.2.1 does not work...
On May 1, 2007, at 9:11 PM, Rob wrote:

> A few emails back I reported that I could build openmpi on an Alpha system (except the static libraries). However, it seems that the built result is unusable. With every simple program (even non-mpi) I compile, I get:
>
> $ mpicc myprog.c --showme:version
> mpicc: Open MPI 1.2.1 (Language: C)
>
> $ mpicc myprog.c
> gcc: dummy: No such file or directory
> gcc: ranlib: No such file or directory
>
> $ mpicc myprog.c --showme
> /opt/gcc/bin/gcc -I/opt/openmpi/include/openmpi -I/opt/openmpi/include -pthread -mfp-trap-mode=su myprog.c -L/opt/openmpi/lib -lmpi -lopen-rte -lopen-pal -ldl dummy ranlib
>
> (Note: the "-mfp-trap-mode=su" prevents a runtime SIGSEGV crash with the GNU compiler on an Alpha system)
>
> $ mpicc myprog.c --showme:link
> -pthread -mfp-trap-mode=su myprog.c -L/opt/openmpi/lib -lmpi -lopen-rte -lopen-pal -ldl dummy ranlib
>
> What are the "dummy" and "ranlib" doing here?

This specific problem may be due to a bug that Brian just found/fixed in the configure script last night (due to a bug report from Paul Van Allsburg). Could you try any nightly trunk tarball after r14600 (the fix hasn't made its way over to the 1.2 release branch yet; I assume it will soon)?

> I'm now trying the nightly build from SVN (version 1.3a1r14551), but I'm afraid that Alpha support is still not there. If that's the case, is there any chance to fix openmpi for Alpha?

So I think you're having two issues (right?):

1. The opal missing symbol when you compile dynamically
2. The dummy/ranlib arguments in mpicc and friends

#2 may be fixed; #1 will require a closer look (per my previous mail).

> My OS is CentOS 4.4 (the equivalent of RedHat Enterprise Edition 4). Hence, my packages are not so up-to-date versions:
>
> autoconf-2.59-5
> automake15-1.5-13
> automake-1.9.2-3
> automake14-1.4p6-12
> automake17-1.7.9-5
> automake16-1.6.3-5
> libtool-1.5.6-4.EL4.1.c4.2
> libtool-libs-1.5.6-4.EL4.1.c4.2
> flex-2.5.4a-33
>
> (what else is essential to build OpenMpi?)

Building from SVN will require more recent versions of these tools (libtool in particular) -- see: http://www.open-mpi.org/svn/building.php. The HACKING file has good instructions on how to get recent versions of the tools without hosing your system: http://svn.open-mpi.org/svn/ompi/trunk/HACKING.

--
Jeff Squyres
Cisco Systems
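If you want to see where stray tokens like "dummy ranlib" end up on an affected install: the 1.2-series wrapper compilers read their flags from plain-text data files under the installation prefix, so the bogus arguments should be visible there. The excerpt below is a hypothetical reconstruction of what an affected file might contain, not output from Rob's machine:

  $ grep '^libs=' /opt/openmpi/share/openmpi/mpicc-wrapper-data.txt
  libs=-lmpi -lopen-rte -lopen-pal -ldl dummy ranlib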
Re: [OMPI users] AlphaServers & OpenMPI
Sorry for the delay in replying -- per the other thread, let's see if the mpicc problem was fixed last night, and let's see the configure output files to try to get an idea of what the problem was with the missing opal symbol.

To be honest, however, none of the current Open MPI members support the Alpha platform. Proper development and maintenance may therefore be somewhat difficult (indeed, I have no customers who use Alpha, so it's hard for me to justify spending time on Alpha-specific issues). That being said, Open MPI is an open source project and we welcome the contributions of others! :-)

On May 8, 2007, at 6:05 AM, Rob wrote:

> Hi,
>
> What is the problem with supporting AlphaServers in OpenMPI? Of the alternatives, MPICH1 (very old) supports AlphaServers, and MPICH2 (new) appears to work on AlphaServers too (but setting up MPICH2 with the mpd ring is just too complicated). Hence, I would prefer OpenMPI instead. Is there a way to get OpenMPI to work on my Alpha systems?
>
> Thanks,
> Rob.

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] openMPI over uDAPL doesn't work
I'm forwarding this to the OpenFabrics general list -- as it just came up the other day, we know that Open MPI's uDAPL support works on Solaris, but we have done little/no testing of it on OFED (I personally know almost nothing about uDAPL). Can the uDAPL OFED wizards shed any light on the error messages that are listed below? In particular, these seem worrisome:

  setup_listener Permission denied
  setup_listener Address already in use

and

  create_qp Address already in use

Thanks...

On May 8, 2007, at 5:37 AM, Boris Bierbaum wrote:

> Hi,
>
> we (my colleague Andreas and I) are still trying to solve this problem. I have compiled some additional information; maybe somebody has an idea about what's going on.
>
> [...]
[OMPI users] Fwd: [ofa-general] Re: openMPI over uDAPL doesn't work
Re-forwarding to the OMPI list; because of the OMPI list anti-spam checks, Arlin's post didn't make it through to our list when he originally posted.

Begin forwarded message:

From: Arlin Davis
Date: May 8, 2007 3:09:02 PM EDT
To: Jeff Squyres
Cc: Open MPI Users, OpenFabrics General
Subject: Re: [ofa-general] Re: [OMPI users] openMPI over uDAPL doesn't work

Jeff Squyres wrote:

> I'm forwarding this to the OpenFabrics general list -- as it just came up the other day, we know that Open MPI's uDAPL support works on Solaris, but we have done little/no testing of it on OFED (I personally know almost nothing about uDAPL). Can the uDAPL OFED wizards shed any light on the error messages that are listed below? In particular, these seem worrisome:
>
>   setup_listener Permission denied
>   setup_listener Address already in use

These failures are from rdma_cm_bind, indicating the port is already bound to this IA address. How are you creating the service point: dat_psp_create or dat_psp_create_any? If it is dat_psp_create_any, then you will see some failures until it gets to a free port. That is normal. Just make sure your create call returns DAT_SUCCESS.

>   create_qp Address already in use

This is a real problem with the bind; the port is already in use. Not sure why this would fail, since the current version of OFED uDAPL uses a wildcard port when binding and uses the address from the open; I remember an issue a while back with rdma_cm and wildcard ports. What version of OFED are you using?

-arlin

--
Jeff Squyres
Cisco Systems
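To make Arlin's point concrete, here is a minimal sketch of a DAT consumer creating its listening service point with the uDAPL 1.2 API. This is illustrative code written for this digest, not an excerpt from Open MPI's udapl BTL; the flag choice and error handling are assumptions:

/* Sketch: create a listening (public) service point on an already-open
 * IA.  With dat_psp_create_any() the provider hunts for a free
 * connection qualifier itself, so intermediate "Address already in use"
 * complaints in debug output can be harmless -- only the final return
 * code matters. */
#include <stdio.h>
#include <dat/udat.h>

static int make_listener(DAT_IA_HANDLE ia, DAT_EVD_HANDLE cr_evd)
{
    DAT_CONN_QUAL conn_qual = 0;   /* filled in by the provider */
    DAT_PSP_HANDLE psp;
    DAT_RETURN rc;

    rc = dat_psp_create_any(ia, &conn_qual, cr_evd,
                            DAT_PSP_CONSUMER_FLAG, &psp);
    if (rc != DAT_SUCCESS) {       /* a real error, unlike the probing */
        fprintf(stderr, "dat_psp_create_any failed: 0x%x\n", (int) rc);
        return -1;
    }
    printf("listening on conn_qual %llu\n",
           (unsigned long long) conn_qual);
    return 0;

    /* The fixed-port variant fails immediately if the qualifier is
     * already taken:
     *   rc = dat_psp_create(ia, 12345, cr_evd,
     *                       DAT_PSP_CONSUMER_FLAG, &psp);
     */
}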
[OMPI users] Newbie question. Please help.
Hi, all. I am new to OpenMPI, and after the initial setup I tried to run my app but got the following errors:

[node07.my.com:16673] *** An error occurred in MPI_Comm_rank
[node07.my.com:16673] *** on communicator MPI_COMM_WORLD
[node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16674] *** An error occurred in MPI_Comm_rank
[node07.my.com:16674] *** on communicator MPI_COMM_WORLD
[node07.my.com:16674] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16674] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16675] *** An error occurred in MPI_Comm_rank
[node07.my.com:16675] *** on communicator MPI_COMM_WORLD
[node07.my.com:16675] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16675] *** MPI_ERRORS_ARE_FATAL (goodbye)
[node07.my.com:16676] *** An error occurred in MPI_Comm_rank
[node07.my.com:16676] *** on communicator MPI_COMM_WORLD
[node07.my.com:16676] *** MPI_ERR_COMM: invalid communicator
[node07.my.com:16676] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 2 with PID 16675 on node node07 exited on signal 60 (Real-time signal 26).

/usr/local/openmpi-1.2.1/bin/ompi_info
                Open MPI: 1.2.1
   Open MPI SVN revision: r14481
                Open RTE: 1.2.1
   Open RTE SVN revision: r14481
                    OPAL: 1.2.1
       OPAL SVN revision: r14481
                  Prefix: /usr/local/openmpi-1.2.1
 Configured architecture: x86_64-unknown-linux-gnu
           Configured by: root
           Configured on: Mon May 7 18:32:56 PDT 2007
          Configure host: neptune.nanostellar.com
                Built by: root
                Built on: Mon May 7 18:40:28 PDT 2007
              Built host: neptune.my.com
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran77 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
      Fortran90 compiler: /opt/intel/fce/9.1.043/bin/ifort
  Fortran90 compiler abs: /opt/intel/fce/9.1.043/bin/ifort
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: no, progress: no)
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
           MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.1)
              MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.1)
         MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.1)
         MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA
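A standard first step with errors like these -- MPI_ERR_COMM on MPI_COMM_WORLD commonly points at an application compiled against one MPI implementation's headers (e.g. a stale mpif.h in the source tree) but run against another's library -- is to verify the installation with a trivial program built by the same wrappers. The test below is a generic sanity check written for this digest, not code from the original thread:

/* hello_mpi.c -- checks that this Open MPI install handles
 * MPI_COMM_WORLD correctly */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* the call that fails above */
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Build and run it with the same installation:

  /usr/local/openmpi-1.2.1/bin/mpicc hello_mpi.c -o hello_mpi
  /usr/local/openmpi-1.2.1/bin/mpirun -np 4 ./hello_mpi

If this works while the application fails, the suspect is the application's build rather than the Open MPI install itself.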
Re: [OMPI users] Newbie question. Please help.
Steven,

We run VASP on both Linux (PGI compilers) and Mac OS X (xlf). I am sad to report that VASP did not work with Open MPI the last time I tried (1.1.1), and the errors you report are the same ones I saw. For the time being, VASP (version 4) works only with LAM and MPICH-1. If you have any sway with the VASP devs, letting them know that both of those projects are unmaintained would be a huge help for us all!

Again, though: if any Fortran guru has had time to find out how to make VASP work with OMPI, please contact us all right away!

Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734) 936-1985

On May 8, 2007, at 10:18 PM, Steven Truong wrote:

> Hi, all. I am new to OpenMPI, and after the initial setup I tried to run my app but got the following errors:
>
> [node07.my.com:16673] *** An error occurred in MPI_Comm_rank
> [node07.my.com:16673] *** on communicator MPI_COMM_WORLD
> [node07.my.com:16673] *** MPI_ERR_COMM: invalid communicator
> [node07.my.com:16673] *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> [...]