On 09/10/12 11:37, Ralph Castain wrote:
On Sep 10, 2012, at 8:12 AM, Aleksey Senin <aleks...@dev.mellanox.co.il> wrote:
On 10/09/2012 15:41, Siegmar Gross wrote:
Hi,
I have built openmpi-1.6.2rc1 and get the following error.
tyr small_prog 123 mpicc -showme
cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp -lsocket -lnsl -lrt -lm
tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize
Hello!
Hello!
tyr small_prog 125 mpiexec -np 2 -host sunpc4 init_finalize
key_from_blob: remaining bytes in key blob 81
Hello!
Hello!
tyr small_prog 126 mpiexec -np 2 -host tyr,sunpc4 init_finalize
[tyr:23956] *** Process received signal ***
[tyr:23956] Signal: Segmentation Fault (11)
[tyr:23956] Signal code: Address not mapped (1)
[tyr:23956] Failing at address: 18
/.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:0x15434c
/lib/libc.so.1:0xcad04
/lib/libc.so.1:0xbf3b4
/lib/libc.so.1:0xbf59c
/.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc
[ Signal 11 (SEGV)]
/.../openmpi-1.6.2_32_cc/lib/openmpi/mca_rmaps_round_robin.so:0x1ec8
/.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
/.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
/.../openmpi-1.6.2_32_cc/lib/openmpi/mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
/.../openmpi-1.6.2_32_cc/bin/orterun:orterun+0x16a8
/.../openmpi-1.6.2_32_cc/bin/orterun:main+0x24
/.../openmpi-1.6.2_32_cc/bin/orterun:_start+0xd8
[tyr:23956] *** End of error message ***
Segmentation fault
Do you have any ideas or suggestions? As I wrote in my email
yesterday, I had to add "#include <math.h>" to the file
openmpi-1.6.2rc1/ompi/contrib/vt/vt/extlib/otf/tools/otfaux/otfaux.cpp
to get a prototype for the function "rint" used in line 834. Thank you
very much in advance for any help.
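The missing-prototype problem can be illustrated with a minimal sketch. It assumes only that rint() is declared in <math.h>, as the C standard specifies; the helper name round_to_nearest is hypothetical and does not appear in otfaux.cpp.

```cpp
// Without this header, a C++ compiler sees no declaration for rint()
// and rejects the call below (the situation reported for otfaux.cpp).
#include <math.h>

// Hypothetical helper, for illustration only; not part of Open MPI or OTF.
// rint() rounds to an integral value using the current rounding mode.
double round_to_nearest(double x) {
    return rint(x);
}
```

With the include in place, round_to_nearest(2.6) yields 3.0 under the default rounding mode.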
Really? That shouldn't happen - I'll take a look at that one.
Yes, Oracle MTT testing shows 1.6.2rc2r27272 DOA:
% mpirun --host burl-ct-x2200-2 -np 2 hostname
burl-ct-x2200-2
burl-ct-x2200-2
% mpirun --host burl-ct-x2200-3 -np 2 hostname
burl-ct-x2200-3
burl-ct-x2200-3
% mpirun --host burl-ct-x2200-2,burl-ct-x2200-3 -np 2 hostname
[burl-ct-x2200-2:26019] *** Process received signal ***
[burl-ct-x2200-2:26019] Signal: Segmentation fault (11)
[burl-ct-x2200-2:26019] Signal code: Address not mapped (1)
[burl-ct-x2200-2:26019] Failing at address: 0x18
[burl-ct-x2200-2:26019] [ 0] [0xffffe600]
[burl-ct-x2200-2:26019] [ 1] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_get_target_nodes+0x432) [0xf7e6d482]
[burl-ct-x2200-2:26019] [ 2] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_rmaps_round_robin.so [0xf7dcd8e5]
[burl-ct-x2200-2:26019] [ 3] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_map_job+0x46) [0xf7e6c4d6]
[burl-ct-x2200-2:26019] [ 4] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_plm_base_setup_job+0x9c) [0xf7e647ec]
[burl-ct-x2200-2:26019] [ 5] /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_launch+0x244) [0xf7dfb634]
[burl-ct-x2200-2:26019] [ 6] mpirun(orterun+0xf5e) [0x804b868]
[burl-ct-x2200-2:26019] [ 7] mpirun(main+0x22) [0x804a8f6]
[burl-ct-x2200-2:26019] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0xb10dec]
[burl-ct-x2200-2:26019] [ 9] mpirun [0x804a851]
[burl-ct-x2200-2:26019] *** End of error message ***
Segmentation fault