[OMPI users] mpirun related
Hi,

mpirun internally uses ssh to launch a program on multiple nodes. I would like to see the various parameters that are sent to each of the nodes. How can I do this?

-chev
[OMPI users] mutex deadlock in btl tcp
Dear Open MPI users list,

From time to time, I experience a mutex deadlock in Open MPI 1.1.2. The stack trace is available at the end of the mail. The deadlock seems to be caused by lines 118 & 119 of the ompi/mca/btl/tcp/btl_tcp.c file, in function mca_btl_tcp_add_procs:

OBJ_RELEASE(tcp_endpoint);
OPAL_THREAD_UNLOCK(&tcp_proc->proc_lock);

(Of course, I did not check whether line numbers have changed since 1.1.2.) Indeed, releasing tcp_endpoint causes a call to mca_btl_tcp_proc_remove, which attempts to acquire the mutex tcp_proc->proc_lock, which is already held by the thread (OPAL_THREAD_LOCK(&tcp_proc->proc_lock) at line 103 of the same file). Swapping the two lines above (i.e., releasing the mutex before destructing tcp_endpoint) seems to be sufficient to fix the deadlock. Or should the changes made in the mca_btl_tcp_proc_insert function be reverted instead of releasing the mutex before destructing tcp_endpoint? As far as I looked, the problem still appears in trunk revision 13359.

Second point: is there any reason why MPI_Comm_spawn is restricted to executing the new process(es) only on hosts listed either in the --host option or in the hostfile? Or did I miss something?
Best regards,
Jeremy

-- stack trace as dumped by open-mpi (gdb version follows):

opal_mutex_lock(): Resource deadlock avoided
Signal:6 info.si_errno:0(Success) si_code:-6()
[0] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libopal.so.0 [0x8addeb]
[1] func:/lib/tls/libpthread.so.0 [0x176e40]
[2] func:/lib/tls/libc.so.6(abort+0x1d5) [0xa294e5]
[3] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x65f8a3]
[4] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_proc_remove+0x2a) [0x65fff0]
[5] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x65cb24]
[6] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so [0x659465]
[7] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_add_procs+0x10f) [0x65927b]
[8] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_bml_r2.so(mca_bml_r2_add_procs+0x1bb) [0x628023]
[9] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd6) [0x61650b]
[10] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(ompi_comm_get_rport+0x1f8) [0xb82303]
[11] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(ompi_comm_connect_accept+0xbb) [0xb81b43]
[12] func:/home1/jbuisson/soft/openmpi-1.1.2/lib/libmpi.so.0(PMPI_Comm_spawn+0x3de) [0xbb671a]
[13] func:/home1/jbuisson/target/bin/mpi-spawner(__gxx_personality_v0+0x3d2) [0x804bb8e]
[14] func:/home1/jbuisson/target/bin/mpi-spawner [0x804bdff]
[15] func:/home1/jbuisson/target/bin/mpi-spawner [0x804bfd4]
[16] func:/lib/tls/libc.so.6(__libc_start_main+0xda) [0xa1578a]
[17] func:/home1/jbuisson/target/bin/mpi-spawner(__gxx_personality_v0+0x75) [0x804b831]
*** End of error message ***

Same stack, dumped by gdb:
#0  0x00176357 in __pause_nocancel () from /lib/tls/libpthread.so.0
#1  0x008ade9b in opal_show_stackframe (signo=6, info=0xbfff9290, p=0xbfff9310) at stacktrace.c:306
#2  <signal handler called>
#3  0x00a27cdf in raise () from /lib/tls/libc.so.6
#4  0x00a294e5 in abort () from /lib/tls/libc.so.6
#5  0x0065f8a3 in opal_mutex_lock (m=0x8ff8250) at ../../../../opal/threads/mutex_unix.h:104
#6  0x0065fff0 in mca_btl_tcp_proc_remove (btl_proc=0x8ff8220, btl_endpoint=0x900eba0) at btl_tcp_proc.c:296
#7  0x0065cb24 in mca_btl_tcp_endpoint_destruct (endpoint=0x900eba0) at btl_tcp_endpoint.c:99
#8  0x00659465 in opal_obj_run_destructors (object=0x900eba0) at ../../../../opal/class/opal_object.h:405
#9  0x0065927b in mca_btl_tcp_add_procs (btl=0x8e57c30, nprocs=1, ompi_procs=0x8ff7ac8, peers=0x8ff7ad8, reachable=0xbfff98e4) at btl_tcp.c:118
#10 0x00628023 in mca_bml_r2_add_procs (nprocs=1, procs=0x8ff7ac8, bml_endpoints=0x8ff60b8, reachable=0xbfff98e4) at bml_r2.c:231
#11 0x0061650b in mca_pml_ob1_add_procs (procs=0xbfff9930, nprocs=1) at pml_ob1.c:133
#12 0x00b82303 in ompi_comm_get_rport (port=0x0, send_first=0, proc=0x8e51c70, tag=2000) at communicator/comm_dyn.c:305
#13 0x00b81b43 in ompi_comm_connect_accept (comm=0x8ff8ce0, root=0, port=0x0, send_first=0, newcomm=0xbfff9a38, tag=2000) at communicator/comm_dyn.c:85
#14 0x00bb671a in PMPI_Comm_spawn (command=0x8ff88f0 "/home1/jbuisson/target/bin/sample-npb-ft-pp", argv=0xbfff9b40, maxprocs=1, info=0x8ff73e0, root=0, comm=0x8ff8ce0, intercomm=0xbfff9aa4, array_of_errcodes=0x0) at pcomm_spawn.c:110
Re: [OMPI users] mpirun related
On Mon, Jan 29, 2007 at 10:49:10PM -0800, Chevchenkovic Chevchenkovic wrote:
> Hi,

Hi

> mpirun internally uses ssh to launch a program on multiple nodes.
> I would like to see the various parameters that are sent to each of
> the nodes. How can I do this?

You mean adding "pls_rsh_debug=1" to your ~/.openmpi/mca-params.conf?

HTH
--
Cluster and Metacomputing Working Group
Friedrich-Schiller-Universität Jena, Germany
private: http://adi.thur.de
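As a concrete sketch of the reply's suggestion (the parameter name comes from the reply; the mpirun invocation and program name are illustrative placeholders): the MCA parameter can be set once per user, or per run on the command line.

```shell
# Enable rsh/ssh launcher debugging for every run, per user:
mkdir -p ~/.openmpi
echo "pls_rsh_debug = 1" >> ~/.openmpi/mca-params.conf

# Or enable it for a single run on the command line (placeholder program):
# mpirun --mca pls_rsh_debug 1 -np 2 ./my_program
```

With the parameter set, the launcher prints the exact ssh command line it executes on each node.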
Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
The master process uses both MPI_ANY_SOURCE and MPI_ANY_TAG while waiting for requests from slave processes. The slaves sometimes use MPI_ANY_TAG but the source is always specified.

We have run the code through valgrind for a number of cases, including the one being used here.

The code is Fortran 90 and we are using the FORTRAN 77 interface, so I do not believe this is a problem. We are using Gigabit Ethernet.

I could look at LAM again to see if it would work. The code needs to be in a specific working directory and we need some environment variables set. This was not supported well in pre-MPI-2 versions of MPI. For MPICH1 I actually launch a script for the slaves so that we have the proper setup before running the executable. Note I had tried that with Open MPI and it had an internal error in orterun. This is not a problem since mpirun can set up everything we need. If you think it is worthwhile I will download and try it.

-----Original Message-----
From: Jeff Squyres [mailto:jsquy...@cisco.com]
Sent: Monday, January 29, 2007 7:54 PM
To: Open MPI Users
Subject: Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.

Without analyzing your source, it's hard to say. I will say that OMPI may send fragments out of order, but we do, of course, provide the same message ordering guarantees that MPI mandates. So let me ask a few leading questions:

- Are you using any wildcards in your receives, such as MPI_ANY_SOURCE or MPI_ANY_TAG?

- Have you run your code through a memory-checking debugger such as valgrind?

- I don't know what Scali MPI uses, but MPICH and Intel MPI use integers for MPI handles. Have you tried LAM/MPI as well? It, like Open MPI, uses pointers for MPI handles. I mention this because apps that unintentionally have code that takes advantage of integer handles can sometimes behave unpredictably when switching to a pointer-based MPI implementation.

- What network interconnect are you using between the two hosts?
On Jan 25, 2007, at 4:22 PM, Fisher, Mark S wrote:

> Recently I wanted to try Open MPI for use with our CFD flow solver
> WINDUS. The code uses a master/slave methodology where the master
> handles I/O and issues tasks for the slaves to perform. The original
> parallel implementation was done in 1993 using PVM and in 1999 we
> added support for MPI.
>
> When testing the code with Open MPI 1.1.2 it ran fine when running on a
> single machine. As soon as I ran on more than one machine I started
> getting random errors right away (the attached tar ball has a good and
> bad output). It looked like either the messages were out of order or
> were for the other slave process. In the run mode used there is no
> slave-to-slave communication. In the file the code died near the
> beginning of the communication between master and slave. Sometimes it
> will run further before it fails.
>
> I have included a tar file with the build and configuration info. The
> two nodes are identical Xeon 2.8 GHz machines running SLED 10. I am
> running real-time (no queue) using the ssh starter with the following
> appfile:
>
> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host
> skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host
> copland -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe
>
> The above file fails but the following works:
>
> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host
> skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host
> skipper2 -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe
>
> The first process is the master and the second two are the slaves. I am
> not sure what is going wrong; the code runs fine with many other MPI
> distributions (MPICH1/2, Intel, Scali...).
> I assume that either I built it wrong or am not running it properly,
> but I cannot see what I am doing wrong. Any help would be appreciated!

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
On Jan 30, 2007, at 9:35 AM, Fisher, Mark S wrote:

> The master process uses both MPI_ANY_SOURCE and MPI_ANY_TAG while
> waiting for requests from slave processes. The slaves sometimes use
> MPI_ANY_TAG but the source is always specified.

I think you said that you only had corruption issues on the slave, right? If so, the ANY_SOURCE/ANY_TAG on the master probably aren't the issue.

But if you're doing ANY_TAG on the slaves, you might want to double check that that code is doing exactly what you think it's doing. Are there any race conditions such that a message could be received on that ANY_TAG that you did not intend to receive there? Look especially hard at non-blocking receives with ANY_TAG.

> We have run the code through valgrind for a number of cases including
> the one being used here.

Excellent.

> The code is Fortran 90 and we are using the FORTRAN 77 interface so I
> do not believe this is a problem.

Agreed; should not be an issue.

> We are using Gigabit Ethernet.

Ok, good.

> I could look at LAM again to see if it would work. The code needs to
> be in a specific working directory and we need some environment
> variables set. This was not supported well in pre-MPI-2 versions of
> MPI. For MPICH1 I actually launch a script for the slaves so that we
> have the proper setup before running the executable. Note I had tried
> that with OpenMPI and it had an internal error in orterun. This is not
> a problem

Really? OMPI's mpirun does not depend on the executable being an MPI application -- indeed, you can "mpirun -np 2 uptime" with no problem. What problem did you run into here?

> since the mpirun can setup everything we need. If you think it is
> worthwhile I will download and try it.

From what you describe, it sounds like order of messaging may be the issue, not necessarily MPI handle types.
So let's hold off on that one for the moment (although LAM should be pretty straightforward to try -- you should be able to mpirun scripts with no problems; perhaps you can try it as a background effort when you have spare cycles / etc.), and look at your slave code for receiving.
Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
The slaves send specific requests to the master and then wait for a reply to that request. For instance, a slave might send a request to read a variable from the file. The master will read the variable and send it back with the same tag in response. Thus there is never more than one outstanding response at a time to a given slave. We do not use any broadcast functions in the code.

The fact that it runs OK on one host but not on more than one host seems to indicate something else is the problem. The code has been used for 13 years in parallel and runs with PVM and other MPI distros without any problems. The communication patterns are very simple and only require that message order be preserved.
Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
Is there any way that you can share the code?
Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
The code can be freely downloaded by US citizens (it is export controlled) at http://zephyr.lerc.nasa.gov/wind/. I can also provide you the test case, which is very small. I am a developer of the code and can help you dig through it if you decide to download it. On the above page you will need to request the code; if you do, just mention my name to help expedite the approval.
Re: [OMPI users] mutex deadlock in btl tcp
Jeremy,

You're right. Thanks for pointing it out. I made the change in the trunk.

george.
[OMPI users] no MPI_2COMPLEX and MPI_2DOUBLE_COMPLEX
Hello,

I see the extern definitions in mpi.h for ompi_mpi_2cplex and ompi_mpi_2dblcplex, but no #define for MPI_2COMPLEX and MPI_2DOUBLE_COMPLEX.

Greetings
Bert Wesarg
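For illustration, what Bert appears to be missing are #define lines in mpi.h mapping the MPI names onto those extern objects. A purely hypothetical sketch of what such lines might look like, following the extern declarations he cites (the exact form in any released Open MPI header is an assumption, not quoted text):

```
/* hypothetical sketch -- not actual Open MPI 1.1 header text */
extern struct ompi_datatype_t ompi_mpi_2cplex;
extern struct ompi_datatype_t ompi_mpi_2dblcplex;

#define MPI_2COMPLEX        (&ompi_mpi_2cplex)
#define MPI_2DOUBLE_COMPLEX (&ompi_mpi_2dblcplex)
```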
Re: [OMPI users] ompi_info segmentation fault
Jeff,

Upgrading to 1.1.3 solved both issues - thank you very much!

Avishay

On Mon, 2007-01-29 at 20:59 -0500, Jeff Squyres wrote:
> I'm quite sure that we have since fixed the command line parsing
> problem, and I *think* we fixed the mmap problem.
>
> Is there any way that you can upgrade to v1.1.3?
>
> On Jan 29, 2007, at 3:24 PM, Avishay Traeger wrote:
>
> > Hello,
> >
> > I have just installed Open MPI 1.1 on a 64-bit FC6 machine using yum.
> > The packages that were installed are:
> > openmpi-devel-1.1-7.fc6
> > openmpi-libs-1.1-7.fc6
> > openmpi-1.1-7.fc6
> >
> > I tried running ompi_info, but it results in a segmentation fault.
> > Running strace shows this at the end:
> >
> > mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> > -1, 0) = -1 ENOMEM (Cannot allocate memory)
> > --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> > +++ killed by SIGSEGV +++
> >
> > The full output of ompi_info is:
> > # ompi_info
> >                 Open MPI: 1.1
> >    Open MPI SVN revision: r10477
> >                 Open RTE: 1.1
> >    Open RTE SVN revision: r10477
> >                     OPAL: 1.1
> >        OPAL SVN revision: r10477
> >                   Prefix: /usr
> >  Configured architecture: x86_64-redhat-linux-gnu
> >            Configured by: brewbuilder
> >            Configured on: Fri Oct 13 14:34:07 EDT 2006
> >           Configure host: hs20-bc1-7.build.redhat.com
> >                 Built by: brewbuilder
> >                 Built on: Fri Oct 13 14:44:39 EDT 2006
> >               Built host: hs20-bc1-7.build.redhat.com
> >               C bindings: yes
> >             C++ bindings: yes
> >       Fortran77 bindings: yes (single underscore)
> >       Fortran90 bindings: yes
> >  Fortran90 bindings size: small
> >               C compiler: gcc
> >      C compiler absolute: /usr/bin/gcc
> >             C++ compiler: g++
> >    C++ compiler absolute: /usr/bin/g++
> >       Fortran77 compiler: gfortran
> >   Fortran77 compiler abs: /usr/bin/gfortran
> >       Fortran90 compiler: gfortran
> >   Fortran90 compiler abs: /usr/bin/gfortran
> >              C profiling: yes
> >            C++ profiling: yes
> >      Fortran77 profiling: yes
> >      Fortran90 profiling: yes
> >           C++ exceptions: no
> >           Thread support: posix (mpi: no, progress: no)
> >   Internal debug support: no
> >      MPI parameter check: runtime
> > Memory profiling support: no
> > Memory debugging support: no
> >          libltdl support: yes
> > Segmentation fault
> >
> > It seems that at this point in the program, it tries to map 4GB of
> > memory, which results in ENOMEM. I'm guessing that the return
> > value of mmap isn't checked, which results in this segmentation fault.
> >
> > Also, I tried running "mpirun", and the output was:
> > # mpirun
> > *** buffer overflow detected ***: mpirun terminated
> > === Backtrace: =
> > /lib64/libc.so.6(__chk_fail+0x2f)[0x3f59ce0dff]
> > /lib64/libc.so.6[0x3f59ce065b]
> > /lib64/libc.so.6(__snprintf_chk+0x7b)[0x3f59ce052b]
> > /usr/lib64/openmpi/libopal.so.0(opal_cmd_line_get_usage_msg+0x20a)[0x304901963a]
> > mpirun[0x403c7c]
> > mpirun(orterun+0xa4)[0x40260c]
> > mpirun(main+0x1b)[0x402563]
> > /lib64/libc.so.6(__libc_start_main+0xf4)[0x3f59c1da44]
> > mpirun[0x4024b9]
> >
> > It also included a "Memory map", which I left out.
> >
> > Any suggestions?
> >
> > Thanks in advance,
> > Avishay
Re: [OMPI users] ompi_info segmentation fault
Please note that due to a mixup in the 1.1.3 release, we just released v1.1.4. :-( See http://www.open-mpi.org/community/lists/announce/2007/01/0010.php for the official announcement.

The short version is that the wrong tarball was posted to the OMPI web site for the 1.1.3 release (doh!). So we released today as v1.1.4 what should have been released a few days ago as v1.1.3.

On Jan 30, 2007, at 2:03 PM, Avishay Traeger wrote:
> Jeff,
> Upgrading to 1.1.3 solved both issues - thank you very much!
> Avishay

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
[OMPI users] MPI_Type_create_subarray fails!
Hi,

Recently I installed Open MPI 1.1.4 using the source RPM on Fedora Core 6, and then tried to run some benchmarks from NASA. The first ones I tried were some I/O benchmarks. They compile, but when I run them, they generate the following error:

[abc:25584] *** An error occurred in MPI_Type_create_subarray
[abc:25583] *** on communicator MPI_COMM_WORLD
[abc:25583] *** MPI_ERR_TYPE: invalid datatype
[abc:25583] *** MPI_ERRORS_ARE_FATAL (goodbye)

I then found all the references to MPI_Type_create_subarray and created a little program just to test that part of the code:

#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Datatype subarray, type;
    int array_size[]    = {9};
    int array_subsize[] = {3};
    int array_start[]   = {1};

    MPI_Init(&argc, &argv);

    /* Create a new type */
    MPI_Type_contiguous(5, MPI_INT, &type);
    MPI_Type_commit(&type);

    /* Create a subarray datatype */
    MPI_Type_create_subarray(1, array_size, array_subsize, array_start,
                             MPI_ORDER_C, type, &subarray);
    MPI_Type_commit(&subarray);

    MPI_Finalize();
    return 0;
}

After running this little program with mpirun, it raises the same error.

Thanks in advance,
Ivan