Additionally, since you technically have a heterogeneous situation
(different OS versions on each node), you might want to:
- compile and install OMPI separately on each node (preferably in the
  same filesystem location, though)
- compile and install your MPI app separately on each node (preferably
  in the same filesystem location)
You *could* be seeing differences between the libc versions on each node, etc.
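One way to check whether the nodes really differ is to have each process report what it is running on. The following is a minimal sketch that assumes Linux with glibc, as on both Ubuntu machines in this thread. The helper name `describe_node` is hypothetical. On non-glibc systems it reports the C library as unknown.

```c
#include <stdio.h>
#include <sys/utsname.h>
#ifdef __GLIBC__                 /* defined via <features.h>, pulled in by stdio.h */
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: write a one-line description of this node into buf:
 * host name, kernel release, machine type and, on glibc systems, the
 * C library version. Returns the number of characters written, or -1 on
 * error or truncation. */
int describe_node(char *buf, size_t len)
{
    struct utsname u;
    if (uname(&u) < 0)
        return -1;

    char libc[64] = "unknown";
#ifdef __GLIBC__
    /* gnu_get_libc_version() is glibc-specific, hence the guard. */
    snprintf(libc, sizeof libc, "glibc %s", gnu_get_libc_version());
#endif

    int n = snprintf(buf, len, "%s: kernel %s, %s, C library %s",
                     u.nodename, u.release, u.machine, libc);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

You could print this line from each rank right after MPI_Init, prefixed with the rank number. Comparing the output from the two machines would then show whether they differ in more than just the kernel version.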
On Sep 17, 2008, at 11:52 AM, Terry Dontje wrote:
Date: Wed, 17 Sep 2008 16:23:59 +0200
From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <us...@open-mpi.org>
Hello Terry,
Thank you very much for your help.
> Sofia,
>
> I took your program and actually ran it successfully on my
> systems using Open MPI r19400. A couple of questions:
>
> 1. Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>
Yes. In this case, the program works perfectly.
> 2. Can you try running the code the following way? Is the
> output different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>
The program also hangs, but the output is different. On both
computers I get the following:
Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv
OK, so it looks like rank 1 is not getting out of MPI_Init.
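The printed markers are the main clue to where each rank stops. When stdout is a pipe, as it is when mpirun forwards output, the C library normally buffers it fully. As a result, a rank that hangs may never show its last printf calls. Below is a minimal sketch of a tracing helper that flushes after every marker. It assumes the program reports its progress with printf-style output. The name `trace_step` and the explicit rank argument are illustrative and are not part of Sofia's program.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a progress marker tagged with the caller's
 * rank, host name and PID to `out`, then flush it at once so the line
 * reaches mpirun even if the process hangs right afterwards.
 * Pass rank -1 for markers printed before the rank is known.
 * Returns 0 on success, -1 if writing or flushing failed. */
int trace_step(FILE *out, int rank, const char *msg)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown-host");
    host[sizeof host - 1] = '\0';  /* truncated names may lack a terminator */

    if (fprintf(out, "[rank %d on %s, pid %ld] %s\n",
                rank, host, (long)getpid(), msg) < 0)
        return -1;
    return fflush(out) == 0 ? 0 : -1;
}
```

For example, you could call `trace_step(stdout, rank, "Inicio Recv")` just before MPI_Recv, where `rank` holds the value from MPI_Comm_rank. This makes the last marker each rank reaches trustworthy. Writing to stderr instead also works, since stderr is unbuffered by default.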
> 3. When the program hangs, can you attach a debugger to one of
> the processes and print out a stack?
>
I do not know how to do that.
With Solaris, I usually do the following:
% dbx - <pid of process>
dbx> where
<stack prints out>
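The machines in this thread run Ubuntu rather than Solaris, so the usual Linux equivalent is gdb. The command `gdb -p <pid of process>` attaches to a running process, and `bt` (or `where`) then prints the stack. Finding the right PID on a remote node can be fiddly, so a common trick is to have the program announce its PID and spin until a debugger releases it. Below is a minimal sketch of that trick. The function name and the release flag are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: announce this process's host name and PID, then
 * sleep in one-second steps until *release becomes nonzero. The flag is
 * volatile so the compiler re-reads it on every iteration, which lets a
 * debugger change it from outside. Returns the number of sleeps taken. */
unsigned wait_for_debugger(volatile int *release)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown-host");
    host[sizeof host - 1] = '\0';

    printf("PID %ld on %s ready for debugger attach\n", (long)getpid(), host);
    fflush(stdout);

    unsigned waits = 0;
    while (*release == 0) {
        sleep(1);
        ++waits;
    }
    return waits;
}
```

The workflow would be as follows:

1. Declare `volatile int release_me = 0;` at file scope.
2. Call `wait_for_debugger(&release_me);` just before MPI_Init.
3. Compile with `-g`.
4. Attach with `gdb -p <pid>`, then run `set var release_me = 1` and `continue`.
5. Once the process hangs again, interrupt it with Ctrl-C in gdb and use `bt` to see where it is stuck.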
> 4. What version of Open MPI are you using, on what type of
> machine, using which OS?
>
Open MPI 1.2.2 on both computers.
On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux
On edu@10.4.5.126 I have:
Kubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux
Sorry for the bonehead question, but is edu@10.4.5.126 the actual
machine name? Is its IP address really 10.4.5.126? If so, can you try
using just 10.4.5.126 in the --host list instead? I would guess the issue
is that the TCP BTL is somehow not matching the two nodes as being
connected to each other.
--td
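Terry's question about host names and addresses can be checked on each node by asking the system resolver what a given --host string maps to. Below is a minimal sketch using POSIX getaddrinfo(), restricted to IPv4 as in this thread. The helper name is hypothetical. It only reports what the local resolver says; it does not show which interfaces Open MPI's TCP BTL will actually use.

```c
#define _XOPEN_SOURCE 700
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `host` (a name or a dotted-quad string) to
 * its first IPv4 address and write it into `out` in dotted-quad form.
 * Returns 0 on success, or a nonzero getaddrinfo() error code on failure
 * (gai_strerror() turns it into a message). */
int resolve_ipv4(const char *host, char *out, socklen_t outlen)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, as for TCP */

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, outlen) == NULL)
        rc = EAI_SYSTEM;              /* errno says why, e.g. buffer too small */
    freeaddrinfo(res);
    return rc;
}
```

Resolving "10.4.5.126" simply echoes the address back. Resolving the second machine's host name (hp1-Linux in the uname output above) on both nodes shows whether names and addresses agree between the two machines. A string such as edu@10.4.5.126 is not a valid host name, so the resolver would be expected to reject it.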
Sofia
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Jeff Squyres
Cisco Systems