[OMPI users] How to debug Open MPI programs with gdb
Hello,

I tried to debug with the command "mpirun -debugger gdb -debug -np 4 my_program". It starts the debugger, but it neither begins debugging nor loads any of the my_program processes into the debugger. If I start debugging manually ("file my_program", then "run"), I can start only one process of my_program. By contrast, when I debug with "mpirun -np 4 xterm -e gdb my_mpi_application", four debugger windows open, each with a separate process, just as it should be. Since I will be using the debugger on a remote computer, I can only run gdb in console mode. Can anyone help me with this?

Thank you in advance,
Best regards,
Nemanja Ilic
Re: [OMPI users] How to debug Open MPI programs with gdb
What version of OMPI are you using?

On Apr 22, 2010, at 5:11 AM, Немања Илић (Nemanja Ilic) wrote:
> I tried to debug with command: "mpirun -debugger gdb -debug -np 4 my_program" [...]

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] How to debug Open MPI programs with gdb
Hello,

I am using Open MPI 1.4.1.

Best regards,
Nemanja Ilic

On Thursday 22 April 2010 16:44:13 you wrote:
> What version of OMPI are you using?
Re: [OMPI users] kernel 2.6.23 vs 2.6.24 - communication/wait times
To keep this thread updated: after I posted to the developers list, the community guided me to a solution to the problem: http://www.open-mpi.org/community/lists/devel/2010/04/7698.php

To sum up: the long communication times for shared-memory communication between Open MPI processes were caused by the Open MPI session directory residing on the network via NFS. The problem is resolved by setting up a ramdisk or mounting a tmpfs on each diskless node. By setting the MCA parameter orte_tmpdir_base to point to that mount point, shared-memory communication and its backing files are kept local, decreasing the communication times by orders of magnitude.

The relation of the problem to the kernel version is not really resolved, but the kernel is perhaps not "the problem" in this respect. My benchmark now runs fine on a single node with 4 CPUs, kernel 2.6.33.1, and Open MPI 1.4.1. Running on multiple nodes I still see higher (TCP) communication times than I would expect, but that requires some deeper investigation (e.g. collisions on the network) and should probably go to a new thread.

Thank you guys for your help.

oli
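For reference, the MCA parameter can be set either in the environment or on the mpirun command line. A minimal sketch, assuming /tmp/ompi-local is the node-local tmpfs/ramdisk mount point (the path and program name are examples, not from the original post):

```shell
# Point Open MPI's session directory (shared-memory backing files) at
# node-local storage; /tmp/ompi-local is an assumed tmpfs mount point.
export OMPI_MCA_orte_tmpdir_base=/tmp/ompi-local
# equivalent per-run form:
#   mpirun --mca orte_tmpdir_base /tmp/ompi-local -np 4 ./my_benchmark
```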
Re: [OMPI users] How to debug Open MPI programs with gdb
I don't think the "-debugger" option to mpirun will work with gdb - I believe it is intended to work with parallel debuggers such as TotalView. Looking at the code, I certainly can't see how gdb would work correctly with it.

There is an mpirun option "-xterm" which will launch the xterm windows automatically for you, but that doesn't resolve your problem, as it basically does what you are doing manually.

It is unclear why you can't run gdb this way on a remote computer. Is something wrong with xterm? Do you not have an xterm client available on your remote computer?

On Apr 22, 2010, at 10:05 AM, Немања Илић (Nemanja Ilic) wrote:
> I am using Open MPI 1.4.1 [...]
Re: [OMPI users] How to debug Open MPI programs with gdb
On Thu, 22 Apr 2010 13:11:49 +0200, Немања Илић (Nemanja Ilic) wrote:
> On the contrary when I debug with "mpirun -np 4 xterm -e gdb
> my_mpi_application" the four debugger windows are started with
> separate thread each, just as it should be. Since I will be using
> debugger on a remote computer I can only run gdb in console mode. Can
> anyone help me with this?

An alternative to opening xterms (e.g. if that host isn't running an X server, you can't get X11 forwarding to work, or you just don't want xterms) is to use GNU "screen". It's basically the same command line, but it will open a screen terminal for each process. When debugging multiple processes with xterms or screens, I recommend gdb's

  -ex 'break somewhere' -ex run --args ./app -args -for -your application

to save you from entering the commands into each terminal separately.

Jed
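Putting the pieces together, one possible variant uses detached screen sessions (screen -d -m) that you later attach to with "screen -r". In this sketch the breakpoint location 'main', the program name, and its argument are placeholders, and the script only prints the command line rather than launching MPI:

```shell
# One console gdb per rank under GNU screen, each breaking in main and
# running automatically; ./my_mpi_application and arg1 are placeholders.
cmd="mpirun -np 4 screen -d -m gdb -ex 'break main' -ex run --args ./my_mpi_application arg1"
echo "$cmd"
```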
Re: [OMPI users] How to debug Open MPI programs with gdb
You can run an X-windows server on your local machine and use a GUI for gdb, or use the Eclipse Parallel Tools Platform (http://www.eclipse.org/ptp/), which has a debugger, and turn on X-forwarding in your Secure Shell client.

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Немања Илић (Nemanja Ilic)
Sent: Thursday, April 22, 2010 6:12 AM
To: Open MPI User List
Subject: [OMPI users] How to debug Open MPI programs with gdb

> I tried to debug with command: "mpirun -debugger gdb -debug -np 4 my_program" [...]
[OMPI users] Treatment of SIGHUP by mpirun
If a user connects to a cluster using SSH, starts an MPI program which contains an infinite loop, and then breaks the SSH connection, the processes running the MPI program continue to run on the compute nodes and have to be killed manually.

To investigate this, I found that if the user types Control-C (or sends SIGINT to mpirun), mpirun says "killing job...", and on each compute node the orted process and the process running the MPI program are killed.

However, if SIGHUP is sent to mpirun, it says "Hangup" and exits, and on each compute node the orted process is killed but the process running the MPI program continues to run.

This is with Rocks 5.3 and Open MPI. There is no batch scheduler. The MPI program is just:

#include "mpi.h"
main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    while (1) ;
    MPI_Finalize();
}

I have two questions. Is this the behaviour I should expect? Is there an easy way to kill the processes?

Thanks, Jon Hitchcock
Re: [OMPI users] Treatment of SIGHUP by mpirun
Which OMPI version?

On Apr 22, 2010, at 12:04 PM, Jon Hitchcock wrote:
> If a user connects to a cluster using SSH, starts an MPI program which
> contains an infinite loop, and then breaks the SSH connection, the processes
> running the MPI program continue to run on the compute nodes [...]
Re: [OMPI users] Treatment of SIGHUP by mpirun
"mpirun -version" says "mpirun (Open MPI) 1.3.3"

>>> Ralph Castain 22/04/2010 19:14:18 >>>
Which OMPI version?
Re: [OMPI users] Treatment of SIGHUP by mpirun
Sounds like a bug - the processes should have died via SIGTERM, followed by SIGKILL. I know we had some problems in that regard, but I'm not sure if the fixes came into the 1.3.3 release or not. You might try updating to the 1.4.2rc1 tarball and see if that helps.

I recently fixed a similar issue in the devel trunk, but that may not be related to this one, as so much has changed in the devel area.

On Apr 22, 2010, at 12:17 PM, Jon Hitchcock wrote:
> "mpirun -version" says "mpirun (Open MPI) 1.3.3" [...]
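Until the signal handling is fixed, one generic workaround (not specific to Open MPI) is to detach mpirun from the SSH session so that a dropped connection never delivers SIGHUP to it. In this sketch, 'sleep 1' stands in for the real 'mpirun -np 4 ./my_program' invocation and run.log is an arbitrary log file name:

```shell
# nohup makes the launcher immune to the SIGHUP sent when the SSH
# session goes away. 'sleep 1' is a stand-in for the real mpirun
# command; run.log is an arbitrary log file.
nohup sleep 1 > run.log 2>&1 &
pid=$!
wait "$pid"
```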
[OMPI users] program with MPI enabled subroutine
Hi,

A Fortran 90 code with an MPI-enabled subroutine is written. The relevant part is given below:

program abc
  ..                   ! usual statements
  open(20, file='sum.20', action='write')
  open(30, file='sum.40', action='write')
  n2 = 100; nstep = 50
  do step = 1, nstep
     n1 = step
     sum2 = (n2 - n1 + 1) * (2*n1 + (n2 - n1)) / 2   ! from arithmetic progression
     call routine
     write(20, *) step, sum1, sum2
  end do
end program abc

subroutine routine
  use dat              ! module 'dat' with common variables for both program & subroutine
  use mpi
  implicit none
  integer :: ivar, istart, iend, sumt, i
  if (step.eq.1) call mpi_init(ierr)
  call mpi_comm_rank(mpi_comm_world, irank, ierr)
  call mpi_comm_size(mpi_comm_world, np, ierr)
  ivar   = (n2 - n1) / np + 1
  istart = min(irank * ivar + 1, n2 + 1)
  iend   = min(istart + ivar - 1, n2)
  sum1 = 0
  do i = istart, iend
     sum1 = sum1 + i
  end do
  call mpi_reduce(sum1, sumt, 1, mpi_integer, mpi_sum, 0, mpi_comm_world, ierr)
  sum1 = sumt
  if (irank.eq.0) then
     write(30, *) step, sum1, sum2
  end if
  if (step.eq.nstep) call mpi_finalize(ierr)
end subroutine routine

The current problem is that once the subroutine is called, the data written to sum.30 and sum.20 do not match. If there is no mistake in the calculation part, how can I get the same data in both files? I can see that some of the 'sum1' values in sum.20 are not correct.

Thanks in advance.
Arunkumar
Re: [OMPI users] program with MPI enabled subroutine
Arunkumar C R wrote:
> The current problem is that once the subroutine is called the data
> written to sum.30 and sum.20 are not matching. [...] I could see some
> of the 'sum1' values in sum.20 are not correct.

First, can you confirm that sum.30 is correct? You should be able to judge each output file independently, and not simply compare the two to each other.

One of the problems with sum.20 is that it is being (over?)written by multiple processes. When you launch the (multi-process MPI) job with mpirun, you start multiple copies of the executable, so multiple processes open the files and write to sum.20. You need a statement in the main program like the "if(irank.eq.0)" conditional in the subroutine.

I don't know what errors, exactly, you're seeing, so I don't know if that addresses all of your problems. But this is certainly one of them.
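As a quick sanity check of the calculation itself, the subroutine's block decomposition can be reproduced outside MPI. This sketch (plain shell arithmetic; np=4 and n1=1 are assumed values) mimics the ivar/istart/iend formulas and checks that the per-rank partial sums add up to 5050. Note, incidentally, that istart is computed from irank alone and never involves n1, so for n1 > 1 the ranks still start summing from 1 rather than n1 - which may also contribute to mismatched sum1 values.

```shell
# Mimic the Fortran decomposition for n1=1, n2=100, np=4 (assumed values).
n1=1; n2=100; np=4
ivar=$(( (n2 - n1) / np + 1 ))        # block size per rank
total=0
for irank in 0 1 2 3; do
  istart=$(( irank * ivar + 1 ))
  if [ "$istart" -gt $(( n2 + 1 )) ]; then istart=$(( n2 + 1 )); fi
  iend=$(( istart + ivar - 1 ))
  if [ "$iend" -gt "$n2" ]; then iend=$n2; fi
  s=0
  i=$istart
  while [ "$i" -le "$iend" ]; do s=$(( s + i )); i=$(( i + 1 )); done
  total=$(( total + s ))              # stands in for mpi_reduce
done
echo "$total"                         # 5050 = sum of 1..100
```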
[OMPI users] Not getting mpi.mod file from openmpi build and install
OpenMPI version: 1.3.3 Platform: IBM P5 If I configure with just the --prefix option for where to install, and then run "make all install", I get an "mpi.mod" file in the "lib" directory of the prefix directory I specified. ALL GOOD! When I add compiler options to the configure script (e.g. "CC=xlC_r CXX=xlC_r FC=xlf95_r"), I no longer get the "mpi.mod" file. I have traced this to the Makefile in the "ompi/mpi/f90" directory. When I add the compiler options to the configure script, a bunch of lines get commented out of the Makefile, including the ones that look like they build the "mpi.mod" file. Can anyone tell me why this happens? Thanks.