David Mathog wrote:
Also, in my limited testing --host and -hostfile seem to be mutually
exclusive.
No. You can use both together. Indeed, the mpirun man page even has
examples of this (though personally, I don't see a use for
this). I think the idea was you might use a hostfile to define the
nodes in your cluster and an mpirun command line that uses --host to
select specific nodes from the file.
That is reasonable, but it isn't clear that this is the intended behavior.
For example, with a hostfile containing the single entry "monkey02.cluster
slots=1":
mpirun --host monkey01 --mca plm_rsh_agent rsh hostname
monkey01.cluster
Okay.
mpirun --host monkey02 --mca plm_rsh_agent rsh hostname
monkey02.cluster
Okay.
mpirun -hostfile /usr/common/etc/openmpi.machines.test1 \
--mca plm_rsh_agent rsh hostname
monkey02.cluster
Okay.
mpirun --host monkey01 \
-hostfile /usr/common/etc/openmpi.machines.test1 \
--mca plm_rsh_agent rsh hostname
--------------------------------------------------------------------------
There are no allocated resources for the application
hostname
that match the requested mapping:
Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--------------------------------------------------------------------------
Right. Your hostfile has monkey02. On the command line, you specify
monkey01, but that's not in your hostfile. That's a problem, just like
the example on the mpirun man page.
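
To make the "--host selects from the hostfile" idea concrete, here is a rough sketch in plain shell. It is only a model of the behavior described above, not how Open MPI implements it. The function name select_hosts is made up for illustration, and matching on the short name (the part before the first dot) is a simplifying assumption of this sketch, not something the thread confirms about mpirun.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of Open MPI: models "--host" as a filter
# applied to the nodes listed in a hostfile.
# Usage: select_hosts HOSTFILE HOST[,HOST...]
select_hosts() {
    awk -v req="$2" '
        BEGIN {
            # Turn the comma-separated request into a lookup table.
            n = split(req, r, ",")
            for (i = 1; i <= n; i++) want[r[i]] = 1
        }
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        {
            name = $1                # first field is the node name
            short = name
            sub(/\..*/, "", short)   # assumption: also compare short names
            if ((name in want) || (short in want)) print name
        }
    ' "$1"
}

# Same hostfile contents as in the thread: a single monkey02 entry.
hostfile=$(mktemp)
printf '%s\n' 'monkey02.cluster slots=1' > "$hostfile"

select_hosts "$hostfile" monkey02   # prints monkey02.cluster
select_hosts "$hostfile" monkey01   # prints nothing: no node left to run on

rm -f "$hostfile"
```

The second call comes back empty, which corresponds to the "no allocated resources" case above: the host named on the command line is not among the hosts the hostfile provides.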