Hi Gilles,
The minimal configuration to reproduce an error with spawn_master
is two Sparc machines.
tyr spawn 124 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
OPAL repo revision: v1.10.2-176-g9d45e07
C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
tyr spawn 125 ssh ruester ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
OPAL repo revision: v1.10.2-176-g9d45e07
C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
tyr spawn 126 uname -a
SunOS tyr.informatik.hs-fulda.de 5.10 Generic_150400-11 sun4u sparc SUNW,A70
Solaris
tyr spawn 127 ssh ruester uname -a
SunOS ruester.informatik.hs-fulda.de 5.10 Generic_150400-04 sun4u sparc
SUNW,SPARC-Enterprise Solaris
tyr spawn 128 mpiexec -np 1 --host tyr,tyr,tyr,tyr,tyr spawn_master
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 4 slave processes
Parent process 0: tasks in MPI_COMM_WORLD: 1
tasks in COMM_CHILD_PROCESSES local group: 1
tasks in COMM_CHILD_PROCESSES remote group: 4
Slave process 1 of 4 running on tyr.informatik.hs-fulda.de
Slave process 0 of 4 running on tyr.informatik.hs-fulda.de
Slave process 3 of 4 running on tyr.informatik.hs-fulda.de
Slave process 2 of 4 running on tyr.informatik.hs-fulda.de
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 0: argv[0]: spawn_slave
spawn_slave 3: argv[0]: spawn_slave
spawn_slave 1: argv[0]: spawn_slave
tyr spawn 129 mpiexec -np 1 --host tyr,tyr,tyr,tyr,ruester spawn_master
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 4 slave processes
Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *)
(proc_pointer))->obj_magic_id, file
../../openmpi-v1.10.2-176-g9d45e07/ompi/group/group_init.c, line 215, function
ompi_group_increment_proc_count
[ruester:23592] *** Process received signal ***
[ruester:23592] Signal: Abort (6)
[ruester:23592] Signal code: (-1)
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x2c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libopen-pal.so.13.0.2:0xc2c0c
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0xf0
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x6638
/usr/local/openmpi-1.10.3_64_gcc/lib64/openmpi/mca_dpm_orte.so:0x948c
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x1978
/usr/local/openmpi-1.10.3_64_gcc/lib64/libmpi.so.12.0.3:MPI_Init+0x2a8
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x10
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x7c
[ruester:23592] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 3 with PID 0 on node ruester exited on signal
6 (Abort).
--------------------------------------------------------------------------
tyr spawn 130
A minimal configuration to reproduce an error with spawn_intra_comm
is a single machine, for both openmpi-2.x and openmpi-master. On Linux
I don't get an error message; the program just hangs after displaying
its messages.
tyr spawn 114 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
OPAL repo revision: dev-4010-g6c9d65c
C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
tyr spawn 115 mpiexec -np 1 --host tyr,tyr,tyr spawn_intra_comm
Parent process 0: I create 2 slave processes
Child process 0 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 1
Child process 1 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 2
Parent process 0 running on tyr.informatik.hs-fulda.de
MPI_COMM_WORLD ntasks: 1
COMM_CHILD_PROCESSES ntasks_local: 1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 0
[[48188,1],0][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect]
from tyr to: tyr Unable to connect to the peer 193.174.24.39 on port 1026:
Connection refused
[tyr.informatik.hs-fulda.de:06684]
../../../../../openmpi-dev-4010-g6c9d65c/ompi/mca/pml/ob1/pml_ob1_sendreq.c:237
FATAL
tyr spawn 116
sunpc1 fd1026 102 ompi_info | grep -e "OPAL repo revision" -e "C compiler absolute"
OPAL repo revision: dev-4010-g6c9d65c
C compiler absolute: /usr/local/gcc-5.1.0/bin/gcc
sunpc1 fd1026 103 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1 spawn_intra_comm
Parent process 0: I create 2 slave processes
Parent process 0 running on sunpc1
MPI_COMM_WORLD ntasks: 1
COMM_CHILD_PROCESSES ntasks_local: 1
COMM_CHILD_PROCESSES ntasks_remote: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 0
Child process 0 running on sunpc1
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 1
Child process 1 running on sunpc1
MPI_COMM_WORLD ntasks: 2
COMM_ALL_PROCESSES ntasks: 3
mytid in COMM_ALL_PROCESSES: 2
[[15953,2],0][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect]
from sunpc1 to: sunpc1 Unable to connect to the peer 193.174.26.210 on port
1024: Connection refused
[[15953,2],1][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect]
from sunpc1 to: sunpc1 Unable to connect to the peer 193.174.26.210 on port
1024: Connection refused
[sunpc1:15813]
../../../../../openmpi-dev-4010-g6c9d65c/ompi/mca/pml/ob1/pml_ob1_sendreq.c:237
FATAL
[[15953,2],0][../../../../../openmpi-dev-4010-g6c9d65c/opal/mca/btl/tcp/btl_tcp_endpoint.c:755:mca_btl_tcp_endpoint_start_connect]
from sunpc1 to: sunpc1 Unable to connect to the peer 193.174.26.210 on port
1024: Connection refused
[sunpc1:15811]
../../../../../openmpi-dev-4010-g6c9d65c/ompi/mca/pml/ob1/pml_ob1_sendreq.c:237
FATAL
sunpc1 fd1026 104
Kind regards
Siegmar
On 05/06/16 10:36, Gilles Gouaillardet wrote:
Siegmar,
I was unable to reproduce the issue with one Solaris 11 x86_64 VM and one
Linux x86_64 VM.
What is the minimal configuration you need to reproduce the issue?
Are you able to reproduce the issue with only x86_64 nodes?
I was under the impression that Solaris vs. Linux is the issue, but is it
big vs. little endian instead?
Cheers,
Gilles
On 5/5/2016 9:13 PM, Siegmar Gross wrote:
Hi Gilles,
is the following output helpful for finding the error? Below the gdb
output I've put another run, which shows that things are a little bit
"random" if I use only 3+2 or 4+1 Sparc machines.
tyr spawn 127 /usr/local/gdb-7.6.1_64_gcc/bin/gdb mpiexec
GNU gdb (GDB) 7.6.1
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from
/export2/prog/SunOS_sparc/openmpi-1.10.3_64_cc/bin/orterun...done.
(gdb) set args -np 1 --host tyr,sunpc1,linpc1,ruester spawn_multiple_master
(gdb) run
Starting program: /usr/local/openmpi-1.10.3_64_cc/bin/mpiexec -np 1 --host
tyr,sunpc1,linpc1,ruester spawn_multiple_master
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
[New LWP 2 ]
Parent process 0 running on tyr.informatik.hs-fulda.de
I create 3 slave processes.
Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *)
(proc_pointer))->obj_magic_id, file
../../openmpi-v1.10.2-163-g42da15d/ompi/group/group_init.c, line 215, function
ompi_group_increment_proc_count
[ruester:17809] *** Process received signal ***
[ruester:17809] Signal: Abort (6)
[ruester:17809] Signal code: (-1)
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
/lib/sparcv9/libc.so.1:0xd8c28
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0x113d4
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_mpi_init+0x188c
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Init+0x26c
/home/fd1026/SunOS/sparc/bin/spawn_slave:main+0x18
/home/fd1026/SunOS/sparc/bin/spawn_slave:_start+0x108
[ruester:17809] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 2 with PID 0 on node ruester exited on
signal 6 (Abort).
--------------------------------------------------------------------------
[LWP 2 exited]
[New Thread 2 ]
[Switching to Thread 1 (LWP 1)]
sol_thread_fetch_registers: td_ta_map_id2thr: no thread can be found to
satisfy query
(gdb) bt
#0 0xffffffff7f6173d0 in rtld_db_dlactivity () from /usr/lib/sparcv9/ld.so.1
#1 0xffffffff7f6175a8 in rd_event () from /usr/lib/sparcv9/ld.so.1
#2 0xffffffff7f618950 in lm_delete () from /usr/lib/sparcv9/ld.so.1
#3 0xffffffff7f6226bc in remove_so () from /usr/lib/sparcv9/ld.so.1
#4 0xffffffff7f624574 in remove_hdl () from /usr/lib/sparcv9/ld.so.1
#5 0xffffffff7f61d97c in dlclose_core () from /usr/lib/sparcv9/ld.so.1
#6 0xffffffff7f61d9d4 in dlclose_intn () from /usr/lib/sparcv9/ld.so.1
#7 0xffffffff7f61db0c in dlclose () from /usr/lib/sparcv9/ld.so.1
#8 0xffffffff7e5f9718 in dlopen_close (handle=0x100)
at
../../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/dl/dlopen/dl_dlopen_module.c:144
#9 0xffffffff7e5f364c in opal_dl_close (handle=0xffffff7d700200ff)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/dl/base/dl_base_fns.c:53
#10 0xffffffff7e546714 in ri_destructor (obj=0x1200)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_component_repository.c:357
#11 0xffffffff7e543840 in opal_obj_run_destructors (object=0xffffff7f607a6cff)
at ../../../../openmpi-v1.10.2-163-g42da15d/opal/class/opal_object.h:451
#12 0xffffffff7e545f54 in mca_base_component_repository_release
(component=0xffffff7c801df0ff)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_component_repository.c:223
#13 0xffffffff7e54d0d8 in mca_base_component_unload
(component=0xffffff7d00003000, output_id=-1610596097)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:47
#14 0xffffffff7e54d17c in mca_base_component_close (component=0x100,
output_id=-1878702080)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:60
#15 0xffffffff7e54d28c in mca_base_components_close (output_id=1942099968,
components=0xff,
skip=0xffffff7f61c5a800)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:86
#16 0xffffffff7e54d1cc in mca_base_framework_components_close
(framework=0x1000000ff, skip=0x10018ebb000)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_components_close.c:68
#17 0xffffffff7ee4db88 in orte_oob_base_close ()
at
../../../../openmpi-v1.10.2-163-g42da15d/orte/mca/oob/base/oob_base_frame.c:94
#18 0xffffffff7e580054 in mca_base_framework_close
(framework=0xffffff0000004fff)
at
../../../../openmpi-v1.10.2-163-g42da15d/opal/mca/base/mca_base_framework.c:198
#19 0xffffffff7c514cdc in rte_finalize ()
at
../../../../../openmpi-v1.10.2-163-g42da15d/orte/mca/ess/hnp/ess_hnp_module.c:882
#20 0xffffffff7ec5c414 in orte_finalize () at
../../openmpi-v1.10.2-163-g42da15d/orte/runtime/orte_finalize.c:65
#21 0x000000010000eb24 in orterun (argc=1423033599, argv=0xffffff7fffce41ff)
at
../../../../openmpi-v1.10.2-163-g42da15d/orte/tools/orterun/orterun.c:1151
#22 0x0000000100004d4c in main (argc=416477439, argv=0xffffff7fffd7f000)
at ../../../../openmpi-v1.10.2-163-g42da15d/orte/tools/orterun/main.c:13
(gdb)
tyr spawn 145 mpiexec -np 1 --host ruester,ruester,ruester,tyr,tyr
spawn_multiple_master
Parent process 0 running on ruester.informatik.hs-fulda.de
I create 3 slave processes.
Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *)
(proc_pointer))->obj_magic_id, file
../../openmpi-v1.10.2-163-g42da15d/ompi/group/group_init.c, line 215, function
ompi_group_increment_proc_count
[ruester:18238] *** Process received signal ***
[ruester:18238] Signal: Abort (6)
[ruester:18238] Signal code: (-1)
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:opal_backtrace_print+0x1c
/usr/local/openmpi-1.10.3_64_cc/lib64/libopen-pal.so.13.0.2:0x1b10f0
/lib/sparcv9/libc.so.1:0xd8c28
------------------------------------------------------------
A process or daemon was unable to complete a TCP connection
to another process:
Local host: ruester
Remote host: ruester
This is usually caused by a firewall on the remote host. Please
check that any firewall (e.g., iptables) has been disabled and
try again.
------------------------------------------------------------
/lib/sparcv9/libc.so.1:0xcc79c
/lib/sparcv9/libc.so.1:0xcc9a8
/lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 2091943080 (?)]
/lib/sparcv9/libc.so.1:abort+0xd0
/lib/sparcv9/libc.so.1:_assert_c99+0x78
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:ompi_group_increment_proc_count+0x10c
/usr/local/openmpi-1.10.3_64_cc/lib64/openmpi/mca_dpm_orte.so:0xe758
/usr/local/openmpi-1.10.3_64_cc/lib64/libmpi.so.12.0.3:MPI_Comm_spawn_multiple+0x8f4
/home/fd1026/SunOS/sparc/bin/spawn_multiple_master:main+0x188
/home/fd1026/SunOS/sparc/bin/spawn_multiple_master:_start+0x108
[ruester:18238] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 0 on node ruester exited on
signal 6 (Abort).
--------------------------------------------------------------------------
tyr spawn 146 mpiexec -np 1 --host ruester,ruester,ruester,ruester,tyr
spawn_multiple_master
Parent process 0 running on ruester.informatik.hs-fulda.de
I create 3 slave processes.
Parent process 0: tasks in MPI_COMM_WORLD: 1
tasks in COMM_CHILD_PROCESSES local group: 1
tasks in COMM_CHILD_PROCESSES remote group: 3
Slave process 2 of 3 running on ruester.informatik.hs-fulda.de
Slave process 0 of 3 running on ruester.informatik.hs-fulda.de
spawn_slave 0: argv[0]: spawn_slave
Slave process 1 of 3 running on ruester.informatik.hs-fulda.de
spawn_slave 1: argv[0]: spawn_slave
spawn_slave 1: argv[1]: program type 2
spawn_slave 1: argv[2]: another parameter
spawn_slave 2: argv[0]: spawn_slave
spawn_slave 2: argv[1]: program type 2
spawn_slave 2: argv[2]: another parameter
spawn_slave 0: argv[1]: program type 1
tyr spawn 147
Hopefully you can sort these things out. I have no idea what happens, or
why I get different outputs when I use different sets of the same machines.
Kind regards
Siegmar
On 05.05.2016 at 11:13, Gilles Gouaillardet wrote:
Siegmar,
is this Solaris 10 specific (i.e., does Solaris 11 work fine)?
(I only have an x86_64 VM with Solaris 11 and Sun compilers ...)
Cheers,
Gilles
On Thursday, May 5, 2016, Siegmar Gross
<siegmar.gr...@informatik.hs-fulda.de> wrote:
Hi Ralph and Gilles,
On 04.05.2016 at 20:02, rhc54 wrote:
@ggouaillardet <https://github.com/ggouaillardet> Where does this stand?
<https://github.com/open-mpi/ompi/issues/1569#issuecomment-216950103>
With my last installed version of openmpi-v1.10.x, all of my
spawn programs fail on Solaris Sparc and x86_64 with the same
error for both compilers (gcc-5.1.0 and Sun C 5.13). Everything
works as expected on Linux. Tomorrow I'm back in my office and
can try to build and test the latest version.
sunpc1 fd1026 108 ompi_info | grep -e "OPAL repo" -e "C compiler absolute"
OPAL repo revision: v1.10.2-163-g42da15d
C compiler absolute: /opt/solstudio12.4/bin/cc
sunpc1 fd1026 114 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
spawn_master
[sunpc1:00957] *** Process received signal ***
[sunpc1:00957] Signal: Segmentation Fault (11)
[sunpc1:00957] Signal code: Address not mapped (1)
[sunpc1:00957] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_master:main+0x21
[sunpc1:00957] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 957 on node sunpc1 exited on
signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
sunpc1 fd1026 115 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
spawn_multiple_master
[sunpc1:00960] *** Process received signal ***
[sunpc1:00960] Signal: Segmentation Fault (11)
[sunpc1:00960] Signal code: Address not mapped (1)
[sunpc1:00960] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_multiple_master:main+0x5d
[sunpc1:00960] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 960 on node sunpc1 exited on
signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
sunpc1 fd1026 116 mpiexec -np 1 --host sunpc1,sunpc1,sunpc1,sunpc1,sunpc1
spawn_intra_comm
[sunpc1:00963] *** Process received signal ***
[sunpc1:00963] Signal: Segmentation Fault (11)
[sunpc1:00963] Signal code: Address not mapped (1)
[sunpc1:00963] Failing at address: 0
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_backtrace_print+0x2d
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:0x2383c
/lib/amd64/libc.so.1:0xdd6b6
/lib/amd64/libc.so.1:0xd1f82
/lib/amd64/libc.so.1:strlen+0x30 [ Signal 11 (SEGV)]
/lib/amd64/libc.so.1:vsnprintf+0x51
/lib/amd64/libc.so.1:vasprintf+0x49
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-pal.so.20.0.0:opal_show_help_vstring+0x83
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libopen-rte.so.20.0.0:orte_show_help+0xd6
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:ompi_mpi_init+0x1010
/export2/prog/SunOS_x86_64/openmpi-2.0.0_64_cc/lib64/libmpi.so.20.0.0:PMPI_Init+0x9d
/home/fd1026/SunOS/x86_64/bin/spawn_intra_comm:main+0x23
[sunpc1:00963] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 963 on node sunpc1 exited on
signal 11 (Segmentation Fault).
--------------------------------------------------------------------------
sunpc1 fd1026 117
Kind regards
Siegmar