Hello,
I would like to run an Open MPI application on a single node. Since I
expect it to perform better, I want it to use shared memory for
communication instead of TCP. Is it possible to use shared memory not
only for MPI communication but also for control messages and other
internal MPI-related communication, so that no TCP communication is
used at all?
I came up with the following parameters, but I get an error when I use
them:
mpirun --host localhost --mca btl sm,self --mca oob ^tcp -n 2 hello
It runs a simple hello world application. I know the --host parameter
is unnecessary, since by default the job runs on localhost, but I
included it just to be on the safe side. I ask btl to use sm and self
(I assume "self" is required) and instruct oob not to use tcp (per the
last lines of
http://www.open-mpi.org/faq/?category=tcp#tcp-selection ). Is this not
correct?
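For reference, the same component selection can also be expressed with
Open MPI's OMPI_MCA_<parameter> environment variables instead of --mca
flags. This is a minimal sketch assuming a Bourne-compatible shell (sh or
bash); with these exported, running "mpirun --host localhost -n 2 hello"
without any --mca flags should pick up the same settings.

```shell
# Same MCA selection as "--mca btl sm,self --mca oob ^tcp",
# given as Open MPI environment variables (assumes sh/bash).
export OMPI_MCA_btl=sm,self   # byte transfer layer: shared memory + self
export OMPI_MCA_oob='^tcp'    # out-of-band: exclude the tcp component
```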
Here's the exact error:
# mpirun --host localhost --mca btl sm,self --mca oob ^tcp -n 2 hello
[myhost:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file
runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_rml_base_select failed
--> Returned value -13 instead of ORTE_SUCCESS
--------------------------------------------------------------------------
[peanutbutter:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file
runtime/orte_system_init.c at line 42
[peanutbutter:08491] [NO-NAME] ORTE_ERROR_LOG: Not found in file
runtime/orte_init.c at line 52
--------------------------------------------------------------------------
Open RTE was unable to initialize properly. The error occured while
attempting to orte_init(). Returned value -13 instead of ORTE_SUCCESS.
--------------------------------------------------------------------------