Wow - okay, I'll have to investigate. Be aware, though, that you just described a completely different failure. Oracle isn't using Slurm, last I heard - you were using rsh/qrsh. And you aren't running from a backend node, but from the same frontend - you just have two hosts listed in your -host entry.
I'll look at both issues. Thx!

On Sep 10, 2012, at 9:12 AM, Eugene Loh <eugene....@oracle.com> wrote:

> On 09/10/12 11:37, Ralph Castain wrote:
>> On Sep 10, 2012, at 8:12 AM, Aleksey Senin<aleks...@dev.mellanox.co.il>
>> wrote:
>>
>>> On 10/09/2012 15:41, Siegmar Gross wrote:
>>>> Hi,
>>>>
>>>> I have built openmpi-1.6.2rc1 and get the following error.
>>>>
>>>> tyr small_prog 123 mpicc -showme
>>>> cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt
>>>> -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp
>>>> -lsocket -lnsl -lrt -lm
>>>> tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize
>>>>
>>>> Hello!
>>>> Hello!
>>>>
>>>> tyr small_prog 125 mpiexec -np 2 -host sunpc4 init_finalize
>>>> key_from_blob: remaining bytes in key blob 81
>>>>
>>>> Hello!
>>>> Hello!
>>>>
>>>> tyr small_prog 126 mpiexec -np 2 -host tyr,sunpc4 init_finalize
>>>> [tyr:23956] *** Process received signal ***
>>>> [tyr:23956] Signal: Segmentation Fault (11)
>>>> [tyr:23956] Signal code: Address not mapped (1)
>>>> [tyr:23956] Failing at address: 18
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:0x15434c
>>>> /lib/libc.so.1:0xcad04
>>>> /lib/libc.so.1:0xbf3b4
>>>> /lib/libc.so.1:0xbf59c
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc
>>>> [ Signal 11 (SEGV)]
>>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_rmaps_round_robin.so:0x1ec8
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
>>>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
>>>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:orterun+0x16a8
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:main+0x24
>>>> /.../openmpi-1.6.2_32_cc/bin/orterun:_start+0xd8
>>>> [tyr:23956] *** End of error message ***
>>>> Segmentation fault
>>>>
>>>> Do you have any ideas or suggestions?
>>>> As I wrote in my email from
>>>> yesterday, I had to add "#include<math.h>" into file
>>>> openmpi-1.6.2rc1/ompi/contrib/vt/vt/extlib/otf/tools/otfaux/otfaux.cpp
>>>> to have a prototype for function "rint" in line 834. Thank you very
>>>> much for any help in advance.
>> Really? That shouldn't happen - I'll take a look at that one.
> Yes, Oracle MTT testing shows 1.6.2rc2r27272 DOA:
>
> % mpirun --host burl-ct-x2200-2 -np 2 hostname
> burl-ct-x2200-2
> burl-ct-x2200-2
> % mpirun --host burl-ct-x2200-3 -np 2 hostname
> burl-ct-x2200-3
> burl-ct-x2200-3
> % mpirun --host burl-ct-x2200-2,burl-ct-x2200-3 -np 2 hostname
> [burl-ct-x2200-2:26019] *** Process received signal ***
> [burl-ct-x2200-2:26019] Signal: Segmentation fault (11)
> [burl-ct-x2200-2:26019] Signal code: Address not mapped (1)
> [burl-ct-x2200-2:26019] Failing at address: 0x18
> [burl-ct-x2200-2:26019] [ 0] [0xffffe600]
> [burl-ct-x2200-2:26019] [ 1]
> /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_get_target_nodes+0x432)
> [0xf7e6d482]
> [burl-ct-x2200-2:26019] [ 2]
> /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_rmaps_round_robin.so
> [0xf7dcd8e5]
> [burl-ct-x2200-2:26019] [ 3]
> /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_rmaps_base_map_job+0x46)
> [0xf7e6c4d6]
> [burl-ct-x2200-2:26019] [ 4]
> /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/libopen-rte.so.4(orte_plm_base_setup_job+0x9c)
> [0xf7e647ec]
> [burl-ct-x2200-2:26019] [ 5]
> /workspace/euloh/hpc/mtt-scratch/burl-ct-x2200-2/ompi-tarball-testing/installs/kBc6/install/lib/openmpi/mca_plm_rsh.so(orte_plm_rsh_launch+0x244)
> [0xf7dfb634]
> [burl-ct-x2200-2:26019] [ 6] mpirun(orterun+0xf5e) [0x804b868]
> [burl-ct-x2200-2:26019] [ 7] mpirun(main+0x22) [0x804a8f6]
> [burl-ct-x2200-2:26019] [ 8] /lib/libc.so.6(__libc_start_main+0xdc) [0xb10dec]
> [burl-ct-x2200-2:26019] [ 9] mpirun [0x804a851]
> [burl-ct-x2200-2:26019] *** End of error message ***
> Segmentation fault