Following up on the comment about running from a backend node while under SLURM: I just tested this (using the patched 1.6 branch) and it works fine. Note, however, that you will only be able to execute on that local node, because we cannot detect the full allocation anywhere except on the node where the allocation was granted.
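For anyone who wants to check what a given shell can actually see, here is a minimal sketch. It assumes the allocation is discovered through the environment variables SLURM exports into the allocation shell; the variable names below are standard SLURM ones, but which of them Open MPI consults is an assumption on my part, and the helper name `visible_slurm_allocation` is purely illustrative.

```python
import os

# Variables SLURM exports inside an allocation (newer and older spellings).
# Which of these Open MPI actually reads is an assumption, not from this thread.
SLURM_VARS = ("SLURM_JOB_ID", "SLURM_JOBID", "SLURM_JOB_NODELIST", "SLURM_NODELIST")


def visible_slurm_allocation(env=None):
    """Return the SLURM allocation variables present in env (default: os.environ)."""
    env = os.environ if env is None else env
    return {name: env[name] for name in SLURM_VARS if name in env}


if __name__ == "__main__":
    found = visible_slurm_allocation()
    if found:
        for name, value in found.items():
            print(f"{name}={value}")
    else:
        print("No SLURM allocation variables visible in this shell.")
```

Running it in the console that holds the allocation and again in a second console should show whether the second one has any allocation information at all.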

On Sep 10, 2012, at 8:12 AM, Aleksey Senin <aleks...@dev.mellanox.co.il> wrote:

> On 10/09/2012 15:41, Siegmar Gross wrote:
>> Hi,
>>
>> I have built openmpi-1.6.2rc1 and get the following error.
>>
>> tyr small_prog 123 mpicc -showme
>> cc -I/usr/local/openmpi-1.6.2_32_cc/include -mt
>> -L/usr/local/openmpi-1.6.2_32_cc/lib -lmpi -lm -lkstat -llgrp
>> -lsocket -lnsl -lrt -lm
>> tyr small_prog 124 mpiexec -np 2 -host tyr init_finalize
>>
>> Hello!
>> Hello!
>>
>> tyr small_prog 125 mpiexec -np 2 -host sunpc4 init_finalize
>> key_from_blob: remaining bytes in key blob 81
>>
>> Hello!
>> Hello!
>>
>> tyr small_prog 126 mpiexec -np 2 -host tyr,sunpc4 init_finalize
>> [tyr:23956] *** Process received signal ***
>> [tyr:23956] Signal: Segmentation Fault (11)
>> [tyr:23956] Signal code: Address not mapped (1)
>> [tyr:23956] Failing at address: 18
>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:0x15434c
>> /lib/libc.so.1:0xcad04
>> /lib/libc.so.1:0xbf3b4
>> /lib/libc.so.1:0xbf59c
>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_get_target_nodes+0x1cc
>> [ Signal 11 (SEGV)]
>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_rmaps_round_robin.so:0x1ec8
>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_rmaps_base_map_job+0xe4
>> /.../openmpi-1.6.2_32_cc/lib/libopen-rte.so.4.0.0:orte_plm_base_setup_job+0xc4
>> /.../openmpi-1.6.2_32_cc/lib/openmpi/mca_plm_rsh.so:orte_plm_rsh_launch+0x1b0
>> /.../openmpi-1.6.2_32_cc/bin/orterun:orterun+0x16a8
>> /.../openmpi-1.6.2_32_cc/bin/orterun:main+0x24
>> /.../openmpi-1.6.2_32_cc/bin/orterun:_start+0xd8
>> [tyr:23956] *** End of error message ***
>> Segmentation fault
>>
>> Do you have any ideas or suggestions? As I wrote in my email from
>> yesterday, I had to add "#include <math.h>" into file
>> openmpi-1.6.2rc1/ompi/contrib/vt/vt/extlib/otf/tools/otfaux/otfaux.cpp
>> to have a prototype for function "rint" in line 834. Thank you very
>> much for any help in advance.
>>
>> Kind regards
>>
>> Siegmar
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> Did you compile OMPI using the '--with-pmi' option? I saw this error when a
> job was allocated on one console but the test was run on another. Try to
> run the task on the console where you allocated it.
> By the way, is there any way to disable SLURM usage even if OMPI was compiled
> with it? If yes, what is the option?