Hi everybody!
I am trying to get OpenMPI and Globus to cooperate. These are the steps I
executed to get OpenMPI working:
1. export PATH=/opt/openmpi/bin/:$PATH
2. /opt/globus/setup/globus/setup-globus-job-manager-fork
checking for mpiexec... /opt/openmpi/bin//mpiexec
checking for mpirun... /opt/openmpi/bin//mpirun
find-fork-tools: creating ./config.status
config.status: creating fork.pm
3. restart VDT (includes GRAM, WSGRAM, mysql, rls...)
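In case it matters: the detected paths should also end up in the generated
fork.pm. I assume that can be checked with something like this (the path is
just where fork.pm lives on my VDT/GT4 install, it may differ elsewhere):
grep -n 'mpirun\|mpiexec' $GLOBUS_LOCATION/lib/perl/Globus/GRAM/JobManager/fork.pm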
As you can see, the necessary OpenMPI executables are recognized
correctly by setup-globus-job-manager-fork. But when I actually try to
execute a simple MPI program using globus-job-run, I get this:
globus-job-run localhost -x '(jobType=mpi)' -np 2 -s ./hypercube 0
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file
runtime/orte_init_stage1.c at line 312
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_pls_base_select failed
--> Returned value -1 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file
runtime/orte_system_init.c at line 42
[hydra:10168] [0,0,0] ORTE_ERROR_LOG: Error in file runtime/orte_init.c
at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -1 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------
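If it helps with diagnosing the orte_pls_base_select failure, I can run some
diagnostics; this is roughly what I had in mind (pls_base_verbose is only a
guess at the right MCA parameter, and I am not sure that passing it through
the RSL environment attribute is the proper way to reach the mpirun started
by the fork job manager):
ompi_info | grep -i pls
globus-job-run localhost -x '(jobType=mpi)(environment=(OMPI_MCA_pls_base_verbose 10))' -np 2 -s ./hypercube 0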
The MPI program itself is okay:
which mpirun && mpirun -np 2 hypercube 0
/opt/openmpi/bin/mpirun
Process 0 received broadcast message 'MPI_Broadcast with hypercube
topology' from Process 0
Process 1 received broadcast message 'MPI_Broadcast with hypercube
topology' from Process 0
From what I read on the mailing list, I think something is wrong with
the pls framework in combination with Globus. But I have no idea what
exactly is wrong, let alone how it could be fixed ;). So if someone has
an idea how this could be fixed, I'd be glad to hear it.
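In the meantime my plan is to compare the environment of my interactive shell
with the one a GRAM fork job sees, since a missing PATH or LD_LIBRARY_PATH in
the job environment looks like a plausible reason for a component selection
failure (just a guess):
env | grep -E 'PATH|LD_LIBRARY'
globus-job-run localhost /usr/bin/env | grep -E 'PATH|LD_LIBRARY'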
Regards,
Christoph