I've recently tried running OpenMPI v1.6.4 on multiple nodes and have noticed a 
change in behavior that I don't understand.  In OpenMPI version 1.4.x, 1.5.x 
and 1.6.1, I could run a job spanning two nodes as shown below.  The procedure 
results in 8 processes running on the first node and 8 on the second node.

mpirun -hostfile mpimachines -n 1 host.exe : -n 15 node.exe

where the file mpimachines looks like:

node1 slots=8
node2 slots=8
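
For reference, I believe the layout mpirun plans to use can be inspected before 
launch with the -display-map option (assuming I'm reading that flag correctly):

mpirun -hostfile mpimachines -display-map -n 1 host.exe : -n 15 node.exe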

In OpenMPI v1.6.2 and v1.6.4 (I haven't tried v1.6.3), when I run the same 
command, all 16 processes start on node1 and none start on node2.  I've noticed 
there are runtime flags -bynode and -byslot, but I haven't had any success 
with those.  I've also tried changing the mpimachines file to look like:

node1 slots=8 max-slots=8
node2 slots=8 max-slots=8

When I tried this, I got a runtime error saying there were not enough slots in 
the system to satisfy the 15 slots requested by the application node.exe.  I 
suspect there is a hint to my problem in that message, but I haven't been able 
to figure out what it is yet.
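
In case it helps pin down the slot accounting, my understanding is that mpirun 
can also print the allocation it parses from the hostfile via the 
-display-allocation option (again, assuming I have that flag right):

mpirun -hostfile mpimachines -display-allocation -n 1 host.exe : -n 15 node.exe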

Can anyone let me know how process placement has changed in these newer 
versions of OpenMPI?

Thanks,

Wallace
