Dear Open MPI list: I'm trying to run two (soon to be three) dual-Opteron machines as a cluster (a network of workstations; each machine has its own disk and OS). I can ssh between the machines without a password. My Open MPI code compiled fine and works great as an SMP program (using both processors on one machine). However, I am not able to run my Open MPI program in parallel across the two computers.
For SMP work I use:

    mpirun -np 2 myprogram inputfile >outputfile

For cluster work I have tried:

    mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile

which does not write anything to the output file. I have also tried:

    mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`

which just ran serially on the initial machine.

The Open MPI executables and libraries are on the head node and are NFS-shared to the slave node. Both computers can run the Open MPI application as an SMP program with no problems. When I try to run the Open MPI program on both computers, I am working in a directory that is NFS-shared to the other computer.

I am running OpenSUSE 10.2 on both machines. I compiled with gcc 4.1 / ifort 9.1. I am using a gigabit network. My hostfile specifies slots=2 max-slots=2 for each computer, and the computers are identified in the hostfile by their /etc/hosts aliases. The only config.log I found was in the directory where I built Open MPI; since everything works as SMP, I am not including that file with this initial message.

What should I try next to remedy this issue? Any help would be appreciated.

Thanks,
Mark Kosmowski
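For reference, a hostfile matching the description above (two slots per machine, each host named by its /etc/hosts alias) would look roughly like the sketch below. The aliases node1 and node2 are hypothetical placeholders standing in for the real /etc/hosts entries; a third machine would get its own line in the same form.

```text
# Open MPI hostfile sketch; node1 and node2 are hypothetical /etc/hosts aliases
node1 slots=2 max-slots=2
node2 slots=2 max-slots=2
```

With a file like this, -np 4 requests exactly the four slots listed (two per machine), so max-slots=2 is not exceeded on either host.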