The problem could also have been that the Open MPI installations on the two machines did not match exactly. Even with the same version, differences in how they were configured or installed can lead to problems. Also be sure that either the same MPI application binary is runnable on both systems, or that you have a separate MPI application binary for each system (e.g., to account for glibc and other differences between your two OSes).

Running in heterogeneous situations like that is quite difficult to do, and not for the meek. :-)


On Jun 13, 2008, at 2:12 AM, Manuel Freiberger wrote:

Hello,

Well, actually I'm quite sure that it was not the firewall because I had to
turn it off as otherwise no connection could be established. So my
 iptables --list
returns

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

on both machines. After reinstalling OMPI, I did not make any changes to the firewall but now it works without problems. Probably installing the library with exactly the same configuration (same --prefix and so on) did the trick.

But nonetheless, thank you very much for your hint! :-)

Best regards,
Manuel

On Thursday 12 June 2008 18:23, Rainer Keller wrote:
Hi,
are you sure it was not a firewall issue on the SUSE 10.2 machine?
If there are any connections from the Gentoo machine trying to access the
orted on the Suse, check in /var/log/firewall.

For the time being, try stopping the firewall (as root) with
/etc/init.d/SuSEfirewall2_setup stop
and test whether it works ,-]

With best regards,
Rainer

On Thursday, 12 June 2008, Manuel Freiberger wrote:
Hi!

Ok, I found the problem. I reinstalled OMPI on both PCs but this time
only locally in the user's home directory. Now, the sample code works
perfectly. I'm not sure where the error really was located. It could be that it was a problem with the Gentoo installation because OMPI is still
marked unstable in portage (~x86 keyword).

Best regards,
Manuel

On Wednesday 11 June 2008 18:52, Manuel Freiberger wrote:
Hello everybody!

First of all I wanted to point out that I'm a beginner regarding Open MPI and all I am trying to achieve is to get a simple program working on two PCs. So far I've installed Open MPI 1.2.6 on two PCs (one running OpenSUSE
10.2, the other one Gentoo).
I set up two identical users on both systems and made sure that I can
make an SSH connection between them using private/public key
authentication.

Next I ran the command
 mpirun -np 2 --hostfile myhosts uptime
which gave the result
 6:41pm  up 1 day  5:16,  4 users,  load average: 0.00, 0.07, 0.17
18:43:45 up  7:36,  6 users,  load average: 0.00, 0.02, 0.05
so I concluded that MPI should work in principle.

Next I tried the following code which I copied from Boost.MPI:
---- snip
#include <mpi.h>
#include <iostream>

int main(int argc, char* argv[])
{
 MPI_Init(&argc, &argv);
 int rank;
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 if (rank == 0)
 {
   std::cout << "Rank 0 is going to send" << std::endl;
   int value = 17;
   int result = MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   if (result == MPI_SUCCESS)
     std::cout << "Rank 0 OK!" << std::endl;
 }
 else if (rank == 1)
 {
   std::cout << "Rank 1 is waiting for answer" << std::endl;
   int value;
   MPI_Status status;
   int result = MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                          &status);
   if (result == MPI_SUCCESS && value == 17)
     std::cout << "Rank 1 OK!" << std::endl;
 }
 MPI_Finalize();
 return 0;
}
---- snap

Starting a parallel job using
 mpirun -np 2 --hostfile myhosts mpi-test
I get the answer
 Rank 0 is going to send
 Rank 1 is waiting for answer
 Rank 0 OK!
and then the program hangs. So the strange thing is that obviously the
MPI_Recv() call is blocking, which is what I do not understand.

Could anybody provide some hints, where I should start looking for the
mistake? Any help is welcome!

Best regards,
Manuel

--
Manuel Freiberger
Institute of Medical Engineering
Graz University of Technology
manuel.freiber...@tugraz.at


--
Jeff Squyres
Cisco Systems
