The slaves send specific requests to the master and then wait for a reply to each request. For instance, a slave might send a request to read a variable from the file; the master reads the variable and sends it back in a response carrying the same tag. Thus there is never more than one outstanding response to a given slave at a time. We do not use any broadcast functions in the code.
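To illustrate, the exchange looks roughly like the sketch below (simplified and made up for illustration -- the tag value, the message contents, and rank 0 as the master are assumptions of this sketch, not the actual WINDUS routines):

      program reqreply
c     Sketch of the request/reply pattern described above: the slave
c     sends a tagged request, the master receives it with
c     MPI_ANY_SOURCE/MPI_ANY_TAG and answers with the same tag.
c     Rank 0 as master and the tag value are assumptions of this
c     sketch, not taken from WINDUS.  Run with "mpirun -np 2".
      include 'mpif.h'
      integer ierr, nrank, status(MPI_STATUS_SIZE)
      integer isrc, itag, ivar
      double precision value
      integer ITAG_READVAR
      parameter (ITAG_READVAR = 101)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)

      if (nrank .eq. 0) then
c        Master: wait for a request from any slave with any tag, then
c        reply to the sender using the tag it supplied.
         call MPI_RECV(ivar, 1, MPI_INTEGER, MPI_ANY_SOURCE,
     &                 MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
         isrc = status(MPI_SOURCE)
         itag = status(MPI_TAG)
c        ... real code would read variable ivar from the file here
         value = 42.0d0
         call MPI_SEND(value, 1, MPI_DOUBLE_PRECISION, isrc, itag,
     &                 MPI_COMM_WORLD, ierr)
      else
c        Slave: send one request, then block on the reply with the
c        same tag, so only one exchange is ever outstanding.
         ivar = 7
         call MPI_SEND(ivar, 1, MPI_INTEGER, 0, ITAG_READVAR,
     &                 MPI_COMM_WORLD, ierr)
         call MPI_RECV(value, 1, MPI_DOUBLE_PRECISION, 0,
     &                 ITAG_READVAR, MPI_COMM_WORLD, status, ierr)
      end if

      call MPI_FINALIZE(ierr)
      end

Because the slave blocks on a reply that must carry the tag it just sent, a slave never has more than one exchange in flight.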
The fact that it runs OK on one host but not on more than one host seems to indicate that something else is the problem. The code has been used in parallel for 13 years and runs with PVM and other MPI distributions without any problems. The communication patterns are very simple and only require that message order be preserved.

-----Original Message-----
From: Jeff Squyres [mailto:jsquy...@cisco.com]
Sent: Tuesday, January 30, 2007 8:44 AM
To: Open MPI Users
Subject: Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.

On Jan 30, 2007, at 9:35 AM, Fisher, Mark S wrote:

> The master process uses both MPI_ANY_SOURCE and MPI_ANY_TAG while waiting for requests from slave processes. The slaves sometimes use MPI_ANY_TAG but the source is always specified.

I think you said that you only had corruption issues on the slave, right? If so, the ANY_SOURCE/ANY_TAG on the master probably aren't the issue. But if you're doing ANY_TAG on the slaves, you might want to double-check that that code is doing exactly what you think it's doing. Are there any race conditions such that a message could be received on that ANY_TAG that you did not intend to receive there? Look especially hard at non-blocking receives with ANY_TAG.

> We have run the code through valgrind for a number of cases, including the one being used here.

Excellent.

> The code is Fortran 90 and we are using the FORTRAN 77 interface, so I do not believe this is a problem.

Agreed; should not be an issue.

> We are using Gigabit Ethernet.

Ok, good.

> I could look at LAM again to see if it would work. The code needs to be in a specific working directory and we need some environment variables set. This was not supported well in pre-MPI-2 versions of MPI. For MPICH1 I actually launch a script for the slaves so that we have the proper setup before running the executable. Note that I had tried that with Open MPI and it had an internal error in orterun. This is not a problem

Really? OMPI's mpirun does not depend on the executable being an MPI application -- indeed, you can "mpirun -np 2 uptime" with no problem. What problem did you run into here?

> since mpirun can set up everything we need. If you think it is worthwhile I will download and try it.

From what you describe, it sounds like order of messaging may be the issue, not necessarily MPI handle types. So let's hold off on that one for the moment (although LAM should be pretty straightforward to try -- you should be able to mpirun scripts with no problems; perhaps you can try it as a background effort when you have spare cycles / etc.), and look at your slave code for receiving.
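For example, a pattern like the following -- a contrived sketch I made up, not your actual code -- can hand a reply to the wrong receive even though MPI delivers the messages in order:

      program anytagrace
c     Contrived sketch of the kind of race to look for on the slave:
c     a non-blocking receive posted with MPI_ANY_TAG can match a
c     reply that was meant for a later, tag-specific receive.  The
c     tags, values, and layout here are made up; this is not WINDUS
c     code.  Run with two processes, e.g. "mpirun -np 2".
      include 'mpif.h'
      integer ierr, nrank, ireq, status(MPI_STATUS_SIZE)
      double precision vala, valb
      integer ITAG_A, ITAG_B
      parameter (ITAG_A = 201, ITAG_B = 202)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)

      if (nrank .eq. 0) then
c        "Master": happens to send reply B before reply A.
         valb = 2.0d0
         call MPI_SEND(valb, 1, MPI_DOUBLE_PRECISION, 1, ITAG_B,
     &                 MPI_COMM_WORLD, ierr)
         vala = 1.0d0
         call MPI_SEND(vala, 1, MPI_DOUBLE_PRECISION, 1, ITAG_A,
     &                 MPI_COMM_WORLD, ierr)
      else
c        "Slave": the wildcard-tag IRECV is intended for reply A, but
c        because it was posted first it matches reply B instead; the
c        blocking receive for ITAG_B then has no matching message and
c        the slave hangs (or, in a larger code, it silently picks up
c        some other message that happens to carry that tag).
         call MPI_IRECV(vala, 1, MPI_DOUBLE_PRECISION, 0,
     &                  MPI_ANY_TAG, MPI_COMM_WORLD, ireq, ierr)
         call MPI_RECV(valb, 1, MPI_DOUBLE_PRECISION, 0, ITAG_B,
     &                 MPI_COMM_WORLD, status, ierr)
         call MPI_WAIT(ireq, status, ierr)
      end if

      call MPI_FINALIZE(ierr)
      end

The point is that a wildcard-tag receive matches whatever arrives next from that source, regardless of which reply you expected it to catch.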
> -----Original Message-----
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent: Monday, January 29, 2007 7:54 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Scrambled communications using ssh starter on multiple nodes.
>
> Without analyzing your source, it's hard to say. I will say that OMPI may send fragments out of order, but we do, of course, provide the same message-ordering guarantees that MPI mandates. So let me ask a few leading questions:
>
> - Are you using any wildcards in your receives, such as MPI_ANY_SOURCE or MPI_ANY_TAG?
>
> - Have you run your code through a memory-checking debugger such as valgrind?
>
> - I don't know what Scali MPI uses, but MPICH and Intel MPI use integers for MPI handles. Have you tried LAM/MPI as well? It, like Open MPI, uses pointers for MPI handles. I mention this because apps that unintentionally have code that takes advantage of integer handles can sometimes behave unpredictably when switching to a pointer-based MPI implementation.
>
> - What network interconnect are you using between the two hosts?
>
>
> On Jan 25, 2007, at 4:22 PM, Fisher, Mark S wrote:
>
>> Recently I wanted to try Open MPI for use with our CFD flow solver WINDUS. The code uses a master/slave methodology where the master handles I/O and issues tasks for the slaves to perform. The original parallel implementation was done in 1993 using PVM, and in 1999 we added support for MPI.
>>
>> When testing the code with Open MPI 1.1.2 it ran fine when running on a single machine. As soon as I ran on more than one machine I started getting random errors right away (the attached tar ball has a good and a bad output). It looked like either the messages were out of order or were meant for the other slave process. In the run mode used there is no slave-to-slave communication. In the file the code died near the beginning of the communication between master and slave. Sometimes it will run further before it fails.
>>
>> I have included a tar file with the build and configuration info. The two nodes are identical Xeon 2.8 GHz machines running SLED 10. I am running real-time (no queue) using the ssh starter with the following appfile:
>>
>> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
>> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host copland -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe
>>
>> The above file fails, but the following works:
>>
>> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /opt/scratch/m209290/ol.scr.16348 -np 1 ./__bcfdbeta.exe
>> -x PVMTASK -x BCFD_PS_MODE --mca pls_rsh_agent /usr/bin/ssh --host skipper2 -wdir /tmp/mpi.m209290 -np 2 ./__bcfdbeta.exe
>>
>> The first process is the master and the second two are the slaves. I am not sure what is going wrong; the code runs fine with many other MPI distributions (MPICH1/2, Intel, Scali...). I assume that either I built it wrong or am not running it properly, but I cannot see what I am doing wrong. Any help would be appreciated!
>>
>> <<mpipb.tgz>>
>
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems