On Jul 15, 2006, at 2:58 PM, Eric Thibodeau wrote:
But, for some reason, on the Athlon node (in their image on the
server I should say) OpenMPI still doesn't seem to be built
correctly since it crashes as follows:
kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xffffe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1+0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init+0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault
The crash happens both in the chrooted env and on the nodes. I
configured both systems to have Linux and POSIX threads, though I
see openmpi is calling the POSIX version (a message on the mailing
list had hinted at keeping the Linux threads around... I have to
anyway, since some apps like Matlab extensions still depend on
this...). The following is the output for the libc info.
That's interesting... We regularly build Open MPI on 32 bit Linux
machines (and in 32 bit mode on Opteron machines) without much
trouble. It looks like we're jumping into a NULL pointer, which
generally means that an ORTE framework failed to initialize itself
properly. It would be useful if you could rebuild with debugging
symbols (just add -g to CFLAGS when configuring) and run mpirun in
gdb. If we can determine where the error is occurring, that would
definitely help in debugging your problem.
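To illustrate what I mean by "jumping into a NULL pointer", here is a
minimal C sketch of the general failure mode. It is not Open MPI code:
struct module, demo_init and call_init are hypothetical names invented
for this example. It assumes a component whose init function pointer
was never filled in, and shows the check that turns a crash at address
(nil) into an ordinary error return.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical module table entry; these names are illustrative only
 * and do not come from the Open MPI sources. */
typedef int (*init_fn)(void);

struct module {
    const char *name;
    init_fn init;   /* left NULL if the component was never set up */
};

/* Stand-in for a component that initialized correctly. */
int demo_init(void)
{
    return 0;
}

/* Call a module's init hook, refusing to jump through a NULL pointer.
 * Calling m->init() without this check is what would produce a
 * segfault with a failing address of (nil). */
int call_init(const struct module *m)
{
    if (m == NULL || m->init == NULL) {
        fprintf(stderr, "module %s has no init function\n",
                (m != NULL && m->name != NULL) ? m->name : "(unknown)");
        return -1;
    }
    return m->init();
}
```

With a debug build, a gdb backtrace should show which pointer of this
kind was NULL at the time of the call.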
Brian
--
Brian Barrett
Open MPI developer
http://www.open-mpi.org/