On 8/8/2010 8:13 PM, Randolph Pullen wrote:
> Thanks, although "An intercommunicator cannot be used for collective
> communication", i.e. bcast calls.

Yes, it can. MPI-1 did not allow collective operations on
intercommunicators, but the MPI-2 specification introduced that notion.
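For the rooted collectives the calling convention on an intercommunicator
is a little unusual: the root passes MPI_ROOT, the other processes in the
root's group pass MPI_PROC_NULL, and the processes in the remote group name
the root by its rank in the other group. A rough, untested sketch (the
helper name and the MPI_INT buffer are just for illustration; "intercomm"
is assumed to come from MPI_Comm_spawn or MPI_Intercomm_create):

#include <mpi.h>

/* Illustrative helper (name made up): how each kind of participant calls
 * MPI_Bcast on an intercommunicator.  "intercomm" is assumed to already
 * exist, e.g. from MPI_Comm_spawn() or MPI_Intercomm_create(). */
static void intercomm_bcast(MPI_Comm intercomm, int in_root_group,
                            int is_root, int root_rank_in_root_group,
                            int *buf, int count)
{
    if (in_root_group) {
        if (is_root) {
            /* The one real sender passes MPI_ROOT. */
            MPI_Bcast(buf, count, MPI_INT, MPI_ROOT, intercomm);
        } else {
            /* Its peers pass MPI_PROC_NULL and move no data. */
            MPI_Bcast(buf, count, MPI_INT, MPI_PROC_NULL, intercomm);
        }
    } else {
        /* Every process in the remote group names the root by its rank
         * in the other group and receives the broadcast. */
        MPI_Bcast(buf, count, MPI_INT, root_rank_in_root_group, intercomm);
    }
}

Note that the data only flows from the root to the remote group; the other
processes in the root's own group neither send nor receive anything.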
Thanks,
Edgar

> I can see how the MPI_Group_xx calls can be used to produce a useful
> group and then a communicator; thanks again, but this is really a side
> issue to my main question about MPI_Bcast.
>
> I seem to have duplicate concurrent processes interfering with each
> other. This would appear to be a breach of the MPI safety dictum, i.e.
> MPI_COMM_WORLD is supposed to include only the processes started by a
> single mpirun command and to isolate these processes safely from other
> similar groups of processes.
>
> So it would appear to be a bug. If so, this has significant
> implications for environments such as mine, where it may often occur
> that the same program is run by different users simultaneously.
>
> It is really this issue that is concerning me. I can rewrite the code,
> but if it can crash when 2 copies run at the same time, I have a much
> bigger problem.
>
> My suspicion is that within the MPI_Bcast handshaking, a synchronising
> broadcast call may be colliding across the environments. My only
> evidence is that an otherwise working program waits on broadcast
> reception forever when two or more copies are run at [exactly] the
> same time.
>
> Has anyone else seen similar behavior in concurrently running programs
> that perform lots of broadcasts?
>
> Randolph
>
>
> --- On Sun, 8/8/10, David Zhang <solarbik...@gmail.com> wrote:
>
> From: David Zhang <solarbik...@gmail.com>
> Subject: Re: [OMPI users] MPI_Bcast issue
> To: "Open MPI Users" <us...@open-mpi.org>
> Received: Sunday, 8 August, 2010, 12:34 PM
>
> In particular, intercommunicators
>
> On 8/7/10, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:
>> You should consider reading about communicators in MPI.
>>
>> Aurelien
>> --
>> Aurelien Bouteiller, Ph.D.
>> Innovative Computing Laboratory, The University of Tennessee.
>>
>> Sent from my iPad
>>
>> On Aug 7, 2010, at 1:05, Randolph Pullen
>> <randolph_pul...@yahoo.com.au> wrote:
>>
>>> I seem to be having a problem with MPI_Bcast. My massive,
>>> I/O-intensive data movement program must broadcast from n to n
>>> nodes. My problem starts because I require 2 processes per node, a
>>> sender and a receiver, and I have implemented these using MPI
>>> processes rather than tackle the complexities of threads on MPI.
>>>
>>> Consequently, broadcast and calls like alltoall are not completely
>>> helpful. The dataset is huge and each node must end up with a
>>> complete copy built by the large number of contributing broadcasts
>>> from the sending nodes. Network efficiency and run time are
>>> paramount.
>>>
>>> As I don't want to needlessly broadcast all this data to the sending
>>> nodes, and I have a perfectly good MPI program that distributes
>>> globally from a single node (1 to N), I took the unusual decision to
>>> start N copies of this program by spawning the MPI system from the
>>> PVM system, in an effort to get my N to N concurrent transfers.
>>>
>>> It seems that the broadcasts running in the concurrent MPI
>>> environments collide and cause all but the first process to hang
>>> waiting for their broadcasts. This theory seems to be confirmed by
>>> introducing a sleep of n-1 seconds before the first MPI_Bcast call
>>> on each node, which results in the code working perfectly.
>>> (total run time 55 seconds, 3 nodes, standard TCP stack)
>>>
>>> My guess is that, unlike PVM, Open MPI implements broadcasts with
>>> point-to-point sends rather than multicasts. Can someone confirm
>>> this? Is this a bug?
>>>
>>> Is there any multicast or N-to-N broadcast where sender processes
>>> can avoid participating when they don't need to?
>>>
>>> Thanks in advance
>>> Randolph
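Coming back to the last question above (whether the sender processes can
stay out of the broadcast): within a single mpirun you could split
MPI_COMM_WORLD into a sender group and a receiver group, connect the two
with an intercommunicator, and broadcast from one sender to all receivers;
the remaining senders pass MPI_PROC_NULL and take no part in the data
movement. A rough, untested sketch (the even/odd rank split, the tag, and
the tiny buffer are illustrative assumptions, not your actual layout):

#include <mpi.h>
#include <stdio.h>

/* Run with an even number of processes: even world ranks act as senders,
 * odd world ranks as receivers (assumption for illustration only). */
int main(int argc, char **argv)
{
    int world_rank, is_sender;
    MPI_Comm local_comm, intercomm;
    int data[4] = {0, 1, 2, 3};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    is_sender = (world_rank % 2 == 0);

    /* Intracommunicator containing only my own role. */
    MPI_Comm_split(MPI_COMM_WORLD, is_sender, world_rank, &local_comm);

    /* Tie the two groups together.  Each group's leader is its local
     * rank 0; the remote leader is named by its rank in MPI_COMM_WORLD
     * (world rank 0 leads the senders, world rank 1 the receivers). */
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD,
                         is_sender ? 1 : 0, 99, &intercomm);

    if (is_sender) {
        int local_rank;
        MPI_Comm_rank(local_comm, &local_rank);
        /* Only sender 0 moves data; the other senders pass MPI_PROC_NULL
         * and are not involved in the transfer at all. */
        MPI_Bcast(data, 4, MPI_INT,
                  local_rank == 0 ? MPI_ROOT : MPI_PROC_NULL, intercomm);
    } else {
        /* Every receiver gets the data from sender 0 (the root's rank in
         * the sender group). */
        MPI_Bcast(data, 4, MPI_INT, 0, intercomm);
        printf("receiver %d got %d %d %d %d\n",
               world_rank, data[0], data[1], data[2], data[3]);
    }

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}

For the N-to-N exchange you would still loop over the sender ranks, one
intercommunicator broadcast per sending node. As for multicast: as far as
I know, Open MPI's default collective components build MPI_Bcast out of
point-to-point messages over TCP rather than IP multicast.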