[OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Немања Илић (Nemanja Ilic)
Hello,

I tried to debug with the command "mpirun -debugger gdb -debug -np 4 my_program".
It starts the debugger, but it neither begins debugging nor loads any of the
my_program processes into the debugger. If I start debugging manually ("file
my_program", then "run"), I can start only one process of my_program.
By contrast, when I debug with "mpirun -np 4 xterm -e gdb my_mpi_application",
four debugger windows start, each attached to a separate process, just as they
should.
Since I will be using the debugger on a remote computer, I can only run gdb in
console mode. Can anyone help me with this?

Thank you in advance,
Best regards,
Nemanja Ilic


Re: [OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Ralph Castain
What version of OMPI are you using?





Re: [OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Немања Илић (Nemanja Ilic)
Hello,

I am using Open MPI 1.4.1

Best regards,
Nemanja Ilic






Re: [OMPI users] kernel 2.6.23 vs 2.6.24 - communication/wait times

2010-04-22 Thread Oliver Geisler
To keep this thread updated:

After I posted to the developers list, the community was able to guide me
to a solution to the problem:
http://www.open-mpi.org/community/lists/devel/2010/04/7698.php

To sum up:

The extended communication times seen when Open MPI processes communicate
over shared memory were caused by the Open MPI session directory residing
on the network via NFS.

The problem is resolved by establishing a ramdisk on each diskless node, or
by mounting a tmpfs there. Setting the MCA parameter orte_tmpdir_base to
point to that mountpoint keeps shared-memory communication and its backing
files local, decreasing the communication times by orders of magnitude.
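
For example (a sketch; the mountpoint and benchmark name are hypothetical,
the mount must be done on each node, and orte_tmpdir_base is the parameter
described above):

  mount -t tmpfs tmpfs /mnt/ompi-tmp
  mpirun -np 4 --mca orte_tmpdir_base /mnt/ompi-tmp ./my_benchmark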

The relation of the problem to the kernel version is not fully resolved,
but the kernel is probably not "the problem" here.
My benchmark now runs fine on a single node with 4 CPUs, kernel 2.6.33.1,
and Open MPI 1.4.1.
Running on multiple nodes I still see higher (TCP) communication times than
I would expect, but that requires deeper investigation on my part (e.g.
collisions on the network) and should probably go to a new thread.

Thank you guys for your help.

oli






Re: [OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Ralph Castain
I don't think the "debugger" option to mpirun will work with gdb - I believe it
is intended for parallel debuggers such as TotalView. Looking at the code, I
certainly can't see how gdb would work correctly with it.

There is an mpirun option "-xterm" that will launch the xterm windows for you
automatically, but that doesn't resolve your problem, as it basically does
what you are already doing manually.
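
For reference, the invocation looks roughly like this (the rank-list syntax is
my assumption; check mpirun(1) for the exact form in your version):

  mpirun -np 4 -xterm 0,1,2,3 ./my_program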

It is unclear why you can't run gdb this way on a remote computer. Is something 
wrong with xterm? Do you not have an xterm client running on your remote 
computer?
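
If X11 really is unavailable, one console-only workaround (a generic sketch,
not something from this thread) is to pause each rank early, then ssh to the
nodes and attach gdb to the printed PIDs:

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    volatile int hold = 1;   /* from the attached gdb: "set var hold = 0" */
    char host[256];

    MPI_Init(&argc, &argv);
    gethostname(host, sizeof(host));
    printf("PID %d on %s waiting for debugger attach\n", (int)getpid(), host);
    fflush(stdout);
    while (hold)
        sleep(5);            /* on that node: gdb -p <PID> */

    /* ... rest of the application ... */
    MPI_Finalize();
    return 0;
}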





Re: [OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Jed Brown
On Thu, 22 Apr 2010 13:11:49 +0200, "Немања Илић (Nemanja Ilic)" wrote:
> By contrast, when I debug with "mpirun -np 4 xterm -e gdb
> my_mpi_application", four debugger windows start, each attached to a
> separate process, just as they should.  Since I will be using the
> debugger on a remote computer, I can only run gdb in console mode.
> Can anyone help me with this?

An alternative to opening xterms (e.g. if that host isn't running an X
server, you can't get X11 forwarding to work, or you just don't want
xterms) is to use GNU "screen".  It's basically the same command line,
but it opens a screen window for each process.  When debugging multiple
processes with xterms or screens, I recommend gdb's options
"-ex 'break somewhere' -ex run --args ./app <args for your app>"
to save you from entering commands into each terminal separately.
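
For example (a sketch with a hypothetical program name, argument, and
breakpoint; -ex and --args are standard gdb options):

  mpirun -np 4 xterm -e gdb -ex 'break main' -ex run --args ./my_mpi_application input.dat

Per the note above, screen can stand in for xterm (screen takes the command
directly, without "-e").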

Jed


Re: [OMPI users] How to debug Open MPI programs with gdb

2010-04-22 Thread Trent Creekmore
You can run an X server on your local machine and use a GUI for gdb, or use
the Eclipse Parallel Tools Platform (http://www.eclipse.org/ptp/), which
includes a debugger, and turn on X forwarding in your Secure Shell client.






[OMPI users] Treatment of SIGHUP by mpirun

2010-04-22 Thread Jon Hitchcock
If a user connects to a cluster using SSH, starts an MPI program which contains 
an infinite loop, and then breaks the SSH connection, the processes running the 
MPI program continue to run on the compute nodes and they have to be killed 
manually.

To investigate this, I found that if the user types Control-C (or sends SIGINT 
to mpirun), mpirun says "killing job...", and on each compute node the orted 
process and the process running the MPI program are killed.

However if SIGHUP is sent to mpirun, it says "Hangup" and exits, and on each 
compute node the orted process is killed but the process running the MPI 
program continues to run.

This is with Rocks 5.3 and Open MPI.  There is no batch scheduler.  The MPI 
program is just:

#include "mpi.h"
main(int argc, char **argv) {
MPI_Init(&argc, &argv);
while (1) ;
MPI_Finalize();
}

I have two questions.  Is this the behaviour I should expect?  Is there an easy 
way to kill the processes? 

Thanks, Jon Hitchcock







Re: [OMPI users] Treatment of SIGHUP by mpirun

2010-04-22 Thread Ralph Castain
Which OMPI version?





Re: [OMPI users] Treatment of SIGHUP by mpirun

2010-04-22 Thread Jon Hitchcock
"mpirun -version" says "mpirun (Open MPI) 1.3.3"

>>> Ralph Castain  22/04/2010 19:14:18 >>>
Which OMPI version?

On Apr 22, 2010, at 12:04 PM, Jon Hitchcock wrote:

> If a user connects to a cluster using SSH, starts an MPI program which 
> contains an infinite loop, and then breaks the SSH connection, the processes 
> running the MPI program continue to run on the compute nodes and they have to 
> be killed manually.
> 
> To investigate this, I found that if the user types Control-C (or sends 
> SIGINT to mpirun), mpirun says "killing job...", and on each compute node the 
> orted process and the process running the MPI program are killed.
> 
> However if SIGHUP is sent to mpirun, it says "Hangup" and exits, and on each 
> compute node the orted process is killed but the process running the MPI 
> program continues to run.
> 
> This is with Rocks 5.3 and Open MPI.  There is no batch scheduler.  The MPI 
> program is just:
> 
> #include "mpi.h"
> main(int argc, char **argv) {
>MPI_Init(&argc, &argv);
>while (1) ;
>MPI_Finalize();
> }
> 
> I have two questions.  Is this the behaviour I should expect?  Is there an 
> easy way to kill the processes? 
> 
> Thanks, Jon Hitchcock
> 
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users 


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Treatment of SIGHUP by mpirun

2010-04-22 Thread Ralph Castain
Sounds like a bug - the processes should have been killed via SIGTERM, followed
by SIGKILL. I know we had some problems in that regard, but I'm not sure
whether those fixes made it into the 1.3.3 release.

You might try updating to the 1.4.2rc1 tarball and see if that helps. I
recently fixed a similar issue in the devel trunk, but it may not be related
to this one, as so much has changed in the devel area.
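
In the meantime, leftover processes can be killed per node with standard
tools (hypothetical program name):

  pkill -f my_program

Some Open MPI installations also ship an "orte-clean" utility for cleaning up
stray processes and session files; check whether your build includes it.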






[OMPI users] program with MPI enabled subroutine

2010-04-22 Thread Arunkumar C R
Hi,

I have a Fortran 90 code with an MPI-enabled subroutine. The relevant part
is given below:

program abc
   ! ...usual statements...
   open(20, file='sum.20', action='write')
   open(30, file='sum.40', action='write')
   n2 = 100; nstep = 50
   do step = 1, nstep
      n1 = step
      sum2 = (n2 - n1 + 1) * (2*n1 + (n2 - n1)) / 2   ! from arithmetic progression
      call routine
      write(20, *) step, sum1, sum2
   end do
end program abc


subroutine routine
   use dat   ! module 'dat' with common variables for both program & subroutine
   use mpi
   implicit none
   integer :: ivar, istart, iend, sumt, i

   if (step .eq. 1) call mpi_init(ierr)
   call mpi_comm_rank(mpi_comm_world, irank, ierr)
   call mpi_comm_size(mpi_comm_world, np, ierr)
   ivar   = (n2 - n1) / np + 1
   istart = min(irank * ivar + 1, n2 + 1)
   iend   = min(istart + ivar - 1, n2)
   sum1   = 0
   do i = istart, iend
      sum1 = sum1 + i
   end do

   call mpi_reduce(sum1, sumt, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
   sum1 = sumt
   if (irank .eq. 0) then
      write(30, *) step, sum1, sum2
   end if
   if (step .eq. nstep) call mpi_finalize(ierr)
end subroutine routine

The current problem is that, once the subroutine is called, the data written
to sum.30 and sum.20 do not match. If there is no mistake in the calculation
part, how can I get the same data in both files? I can see that some of the
'sum1' values in sum.20 are not correct.

Any pointers would be appreciated.

Thanks in advance.

Arunkumar


Re: [OMPI users] program with MPI enabled subroutine

2010-04-22 Thread Eugene Loh

Arunkumar C R wrote:
> The current problem is that, once the subroutine is called, the data
> written to sum.30 and sum.20 do not match. If there is no mistake in
> the calculation part, how can I get the same data in both files? I
> can see that some of the 'sum1' values in sum.20 are not correct.


First, can you confirm that sum.30 is correct?  You should be able to judge
each output file independently, not simply compare the two to each other.

One of the problems with sum.20 is that it is being (over)written by multiple
processes.  When you launch the (multi-process MPI) job with mpirun, you start
multiple copies of the executable, so multiple processes open and write to
sum.20.  You need a statement in the main program like the "if(irank.eq.0)"
conditional in the subroutine.

I don't know exactly what errors you're seeing, so I can't say whether that
addresses all of your problems, but it is certainly one of them.


[OMPI users] Not getting mpi.mod file from openmpi build and install

2010-04-22 Thread Price, Brian M (N-KCI)
OpenMPI version: 1.3.3
Platform: IBM P5

If I configure with just the --prefix option for where to install, and then run 
"make all install", I get an "mpi.mod" file in the "lib" directory of the 
prefix directory I specified.  ALL GOOD!

When I add compiler options to the configure script (e.g. "CC=xlC_r CXX=xlC_r 
FC=xlf95_r"), I no longer get the "mpi.mod" file.

I have traced this to the Makefile in the "ompi/mpi/f90" directory.  When I add 
the compiler options to the configure script, a bunch of lines get commented 
out of the Makefile, including the ones that look like they build the "mpi.mod" 
file.

Can anyone tell me why this happens?

Thanks.
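
For what it's worth, Open MPI 1.3.x reads both F77 and FC at configure time,
and the Fortran 90 bindings that produce mpi.mod are only built when the
Fortran 77 bindings are too - so (an assumption on my part, not a confirmed
diagnosis) it may be worth setting both. A sketch with a hypothetical prefix:

  ./configure --prefix=/opt/openmpi CC=xlc_r CXX=xlC_r F77=xlf_r FC=xlf95_r
  make all install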