Hello Terry,

Finally, I have installed dbx. I enclose a file with the output I get when I type "dbx - PID of mpirun..." and then "where" on computer 10.4.5.123.
Do you have any idea what the problem could be?

Thanks a lot!!

Sofia



----- Original Message -----
From: "Terry Dontje" <terry.don...@sun.com>
To: <us...@open-mpi.org>
Sent: Wednesday, September 17, 2008 5:52 PM
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv



Date: Wed, 17 Sep 2008 16:23:59 +0200
From: "Sofia Aparicio Secanellas" <sapari...@grpss.ssr.upm.es>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <us...@open-mpi.org>
Message-ID: <0625EEFB84E04647A1930A963A8DF7E3@aparicio1>

Hello Terry,

Thank you very much for your help.


> Sofia,
>
> I took your program and actually ran it successfully on my systems
> using Open MPI r19400. A couple of questions:
>
> 1.  Have you tried to run the program on a single node?
> mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>


Yes. In this case, the program works perfectly.


> 2. Can you try running the code the following way, and is the output
> different?
> mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>


The program also hangs, but the output is different. On both computers I get the following:

Inicio
Inicio
totalnodes:2
mynode:0
Inicio Recv


OK, so it looks like rank 1 is not getting out of MPI_Init.

> 3. When the program hangs, can you attach a debugger to one of the
> processes and print out a stack?
>


I do not know how to do that.


On Solaris, I usually do the following:
% dbx - <pid of process>
dbx>  where
<stack prints out>
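Both machines in this thread run Ubuntu Linux, where gdb can do the same job as dbx. A minimal sketch, assuming gdb is installed and you are allowed to attach to the process (same user, or root); `print_stacks` is a hypothetical helper, not part of Open MPI. Attaching to one of the application processes (PruebaSumaParalela.out) rather than to mpirun is usually more informative, since mpirun only launches and monitors the job.

```shell
#!/bin/sh
# Hypothetical helper: print the stack of every thread in one or more
# running processes using gdb on Linux (the equivalent of dbx + "where").
# Usage: print_stacks PID [PID ...]
print_stacks() {
    if [ "$#" -eq 0 ]; then
        echo "usage: print_stacks PID [PID ...]" >&2
        return 1
    fi
    for pid in "$@"; do
        echo "=== stack of PID $pid ==="
        # -batch: run the -ex command, then detach from the process and exit
        gdb -batch -ex "thread apply all bt" -p "$pid"
    done
}
```

For example, run `ps -ef | grep PruebaSumaParalela` on the node where the program hangs to find the PIDs, then `print_stacks <pid>`.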

> 4. What version of Open MPI are you using, on what type of machine,
> using which OS?
>


Open MPI 1.2.2 on both computers.

On 10.4.5.123 I have:
Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux

On edu@10.4.5.126 I have:
Kubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux


Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine name? Is its IP address really 10.4.5.126? If so, can you try using just the IP address instead? My guess is that the TCP BTL is somehow not recognizing the two nodes as being connected to each other.
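One way to test that guess is to address both nodes by bare IP address and pin the TCP BTL to a single network interface. A hedged sketch: `btl` and `btl_tcp_if_include` are Open MPI MCA parameters, but the interface name `eth0` is an assumption (use whichever interface carries the 10.4.5.x network on both nodes, as shown by `/sbin/ifconfig`), and `run_on_both_nodes` is a hypothetical wrapper. Dropping the `edu@` prefix also assumes the remote login user is set elsewhere, for example in `~/.ssh/config`.

```shell
#!/bin/sh
# Hypothetical wrapper: run the test program on both nodes, addressing them
# by IP address only, restricting Open MPI to the TCP and self BTLs, and
# limiting TCP traffic to one network interface.
# Usage: run_on_both_nodes [INTERFACE]   (defaults to eth0, an assumption)
run_on_both_nodes() {
    iface="${1:-eth0}"
    mpirun -np 2 --host 10.4.5.123,10.4.5.126 \
        --mca btl tcp,self \
        --mca btl_tcp_if_include "$iface" \
        --prefix /usr/local ./PruebaSumaParalela.out
}
```

If the program completes this way, the hang was probably an addressing or interface-selection problem rather than a bug in the MPI_Send/MPI_Recv code.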

--td
Sofia

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




Attachment: result_dbx
Description: Binary data
