On Jul 15, 2006, at 2:58 PM, Eric Thibodeau wrote:
But, for some reason, on the Athlon node (in their image on the server I should say) OpenMPI still doesn't seem to be built correctly since it crashes as follows:


kyron@node0 ~ $ mpirun -np 1 uptime
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:(nil)
[0] func:/home/kyron/openmpi_i686/lib/libopal.so.0 [0xb7f6258f]
[1] func:[0xffffe440]
[2] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init_stage1 +0x1d7) [0xb7fa0227]
[3] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_system_init +0x23) [0xb7fa3683]
[4] func:/home/kyron/openmpi_i686/lib/liborte.so.0(orte_init+0x5f) [0xb7f9ff7f]
[5] func:mpirun(orterun+0x255) [0x804a015]
[6] func:mpirun(main+0x22) [0x8049db6]
[7] func:/lib/tls/libc.so.6(__libc_start_main+0xdb) [0xb7de8f0b]
[8] func:mpirun [0x8049d11]
*** End of error message ***
Segmentation fault


The crash happens both in the chrooted environment and on the nodes. I configured both systems to have Linux threads and POSIX threads, though I see OpenMPI is calling the POSIX version (a message on the mailing list had hinted at keeping the Linux threads around... I have to anyway, since some apps like Matlab extensions still depend on them). The following is the output for the libc info.

That's interesting... We regularly build Open MPI on 32-bit Linux machines (and in 32-bit mode on Opteron machines) without much trouble. It looks like we're jumping into a NULL pointer, which generally means that an ORTE framework failed to initialize itself properly. It would be useful if you could rebuild with debugging symbols (just add -g to CFLAGS when configuring) and run mpirun in gdb. If we can determine where the error is occurring, that would definitely help in debugging your problem.
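
A rough sketch of the rebuild-and-debug steps, under a few assumptions: the install prefix is taken from the backtrace paths, mpirun is assumed to live in bin/ under that prefix, and the two function names are only illustrative helpers. Sourcing the snippet just defines the functions; nothing runs until you call them.

```shell
# Install prefix, assumed from the library paths in the backtrace.
PREFIX="${PREFIX:-/home/kyron/openmpi_i686}"

# Reconfigure and rebuild with debugging symbols.
# Run this from the top of the Open MPI source tree.
rebuild_with_debug() {
    ./configure --prefix="$PREFIX" CFLAGS=-g &&
        make all install
}

# Run the failing command under gdb and print a backtrace
# once it stops on the segfault.
# Assumes mpirun was installed to $PREFIX/bin.
debug_mpirun() {
    gdb -batch -ex run -ex bt --args "$PREFIX/bin/mpirun" -np 1 uptime
}
```

Calling rebuild_with_debug and then debug_mpirun should produce a backtrace with function names and line numbers rather than bare addresses, which is the part worth posting back to the list.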

Brian


--
  Brian Barrett
  Open MPI developer
  http://www.open-mpi.org/

