Dear Open-MPI list:

I'm trying to run two (soon to be three) dual opteron machines as a
cluster (network of workstations - they each have a disk and OS).  I
can ssh between machines with no password.  My open-mpi code compiled
fine and works great as an SMP program (using both processors on one
machine).  However, I am not able to run my open-mpi program in
parallel across the two computers.

For SMP work I use:

mpirun -np 2 myprogram inputfile >outputfile

For cluster work I have tried:

mpirun --hostfile myhostfile -np 4 myprogram inputfile >outputfile

which does not write to the output file.

I have also tried:

mpirun --hostfile myhostfile -np 4 `myprogram inputfile >outputfile`

which just ran serially on the initial machine.
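
If I understand the shell correctly, backticks are command substitution, so the quoted part is executed locally by my shell before mpirun is even started. That would be consistent with the serial run I saw. Below is a minimal sketch of that behaviour. The names launcher and myprogram_stub are hypothetical stand-ins for mpirun and my program, so the sketch runs without Open MPI:

```shell
# 'launcher' is a hypothetical stand-in for mpirun: it only reports its arguments.
launcher() { echo "launcher received $# argument(s): $*"; }

# 'myprogram_stub' is a hypothetical stand-in for the real program.
myprogram_stub() { echo "result"; }

# With backticks: the stub runs first, locally, and its stdout goes to the file,
# so the substitution expands to nothing and the launcher gets no program to run.
with_backticks=$(launcher -np 4 `myprogram_stub inputfile > stub_output.txt`)

# Without backticks: the launcher receives the program name and its argument.
without_backticks=$(launcher -np 4 myprogram_stub inputfile)

echo "$with_backticks"
echo "$without_backticks"
```

In the first call the launcher only ever sees "-np 4", which suggests the backtick form cannot be what I want for a parallel run.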

The open-mpi executable and libraries are on the head node and are
NFS shared to the slave node.  Each computer can run the open-mpi
application as an SMP program with no problems.  When I try to run
the open-mpi program across both computers, I work in a directory
that is NFS shared to the other computer.

I am running OpenSUSE 10.2 on both machines.  I compiled with gcc 4.1 /
ifort 9.1.

I am using a gigabit network.

My hostfile specifies slots=2 max-slots=2 for each computer.  The
computers are identified in the hostfile using the /etc/hosts alias.
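
For illustration, a hostfile of the kind described above would look roughly like the sketch below. The aliases node1 and node2 and the file name myhostfile.example are placeholders, not my real names. The awk line is just a sanity check that the slots add up to the 4 processes requested with -np 4:

```shell
# Hypothetical host aliases; the real ones come from /etc/hosts.
cat > myhostfile.example <<'EOF'
node1 slots=2 max-slots=2
node2 slots=2 max-slots=2
EOF

# Sum the slots= values (max-slots= is deliberately not matched).
total_slots=$(awk '{
  for (i = 2; i <= NF; i++)
    if ($i ~ /^slots=/) { split($i, kv, "="); sum += kv[2] }
} END { print sum + 0 }' myhostfile.example)

echo "total slots: $total_slots"
```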

The only config.log that I found was in the directory I used to make
open-mpi; since everything works as SMP, I am not including that file
with this initial message.

What should I be trying to do next to remedy this issue?

Any help would be appreciated.

Thanks,

Mark Kosmowski
