Hi All,
I'm having problems with Open MPI 1.4.1 and am receiving the following
error message when I try to run a test job:
[root@hydra ~]# mpirun -n 2 --prefix `dirname $MPILIBDIR` -v -show-progress -machinefile ./nodes.to.use -pernode ./dml_test
--------------------------------------------------------------------------
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
opal_paffinity_base_select failed
--> Returned value -13 instead of OPAL_SUCCESS
--------------------------------------------------------------------------
[hydra:10645] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 77
[hydra:10645] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 541
I built Open MPI with the following configure options:
./configure --with-gm=/usr/local/gm --prefix=/opt/apps/system/openmpi/1.4.1/intel
The build appears to complete correctly and finds the right libraries,
with no obvious problems along the way.
This was built on:
Linux hydra 2.6.18-164.el5 #1 SMP Thu Sep 3 03:33:56 EDT 2009 i686 i686 i386 GNU/Linux
After reading the docs and trawling the archives, I can't find much
that resembles the errors noted above.
Does anybody have any ideas or pointers on where to look or what to debug?
Thanks and regards
David
--
David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005
(W) 08 8303 7301
(M) 0458 631 117