Hi Randolph

Unless your code is doing a connect/accept between the copies, there is no way they can cross-communicate. As you note, mpirun instances are completely isolated from each other: no process in one instance can possibly receive information from a process in another instance, because it has no knowledge of it, unless the instances wire up into a greater communicator by performing connect/accept calls between them.
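For reference, a minimal sketch of what such a wire-up looks like, assuming the port is exchanged through a plain "port.txt" file (that file, and the server/client command-line argument, are just illustrative choices; MPI_Publish_name / MPI_Lookup_name or any other out-of-band channel would work as well):

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char port[MPI_MAX_PORT_NAME] = {0};
    MPI_Comm inter, merged;

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        if (rank == 0) {
            /* open a port and publish it out-of-band (a plain file here) */
            MPI_Open_port(MPI_INFO_NULL, port);
            FILE *f = fopen("port.txt", "w");
            fprintf(f, "%s\n", port);
            fclose(f);
        }
        /* collective over this job's MPI_COMM_WORLD; port significant only at root 0 */
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    } else {
        if (rank == 0) {
            /* assumes the server side has already written port.txt */
            FILE *f = fopen("port.txt", "r");
            fgets(port, MPI_MAX_PORT_NAME, f);
            port[strcspn(port, "\n")] = '\0';
            fclose(f);
        }
        MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter);
    }

    /* the two formerly isolated jobs now share one intracommunicator:
       the "greater communicator" described above */
    MPI_Intercomm_merge(inter, 0, &merged);

    int merged_size;
    MPI_Comm_size(merged, &merged_size);
    if (rank == 0) printf("merged communicator spans %d processes\n", merged_size);

    MPI_Comm_free(&merged);
    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}

If anything along these lines is happening anywhere in your code, directly or inside a library you call, the two jobs are no longer independent and their collectives can interact.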
I suspect you are inadvertently doing just that - perhaps by doing connect/accept in a tree-like manner, not realizing that the end result is one giant communicator that links all N servers together. Otherwise, there is no way an MPI_Bcast in one mpirun can collide with, or otherwise communicate with, an MPI_Bcast between processes started by another mpirun.

On Aug 8, 2010, at 7:13 PM, Randolph Pullen wrote:

> Thanks, although “An intercommunicator cannot be used for collective communication.”, i.e. bcast calls. I can see how the MPI_Group_xx calls can be used to produce a useful group and then a communicator; thanks again, but this is really a side issue to my main question about MPI_Bcast.
>
> I seem to have duplicate concurrent processes interfering with each other. This would appear to be a breach of the MPI safety dictum: MPI_COMM_WORLD is supposed to include only the processes started by a single mpirun command and to isolate these processes safely from other similar groups of processes.
>
> So, it would appear to be a bug. If so, this has significant implications for environments such as mine, where the same program may often be run by different users simultaneously.
>
> It is really this issue that is concerning me. I can rewrite the code, but if it can crash when 2 copies run at the same time, I have a much bigger problem.
>
> My suspicion is that within the MPI_Bcast handshaking, a synchronising broadcast call may be colliding across the environments. My only evidence is that an otherwise working program waits forever on broadcast reception when two or more copies are run at [exactly] the same time.
>
> Has anyone else seen similar behavior in concurrently running programs that perform lots of broadcasts?
>
> Randolph
>
>
> --- On Sun, 8/8/10, David Zhang <solarbik...@gmail.com> wrote:
>
> From: David Zhang <solarbik...@gmail.com>
> Subject: Re: [OMPI users] MPI_Bcast issue
> To: "Open MPI Users" <us...@open-mpi.org>
> Received: Sunday, 8 August, 2010, 12:34 PM
>
> In particular, intercommunicators.
>
> On 8/7/10, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:
> > You should consider reading about communicators in MPI.
> >
> > Aurelien
> > --
> > Aurelien Bouteiller, Ph.D.
> > Innovative Computing Laboratory, The University of Tennessee.
> >
> > Sent from my iPad
> >
> > On Aug 7, 2010, at 1:05, Randolph Pullen <randolph_pul...@yahoo.com.au> wrote:
> >
> >> I seem to be having a problem with MPI_Bcast.
> >> My massive I/O-intensive data movement program must broadcast from n to n nodes. My problem starts because I require 2 processes per node, a sender and a receiver, and I have implemented these using MPI processes rather than tackle the complexities of threads on MPI.
> >>
> >> Consequently, broadcast and calls like alltoall are not completely helpful. The dataset is huge and each node must end up with a complete copy built from the large number of contributing broadcasts from the sending nodes. Network efficiency and run time are paramount.
> >>
> >> As I don’t want to needlessly broadcast all this data to the sending nodes, and I have a perfectly good MPI program that distributes globally from a single node (1 to N), I took the unusual decision to start N copies of this program by spawning the MPI system from the PVM system in an effort to get my N to N concurrent transfers.
> >> It seems that the broadcasts running on concurrent MPI environments collide and cause all but the first process to hang waiting for their broadcasts. This theory seems to be confirmed by introducing a sleep of n-1 seconds before the first MPI_Bcast call on each node, which results in the code working perfectly. (Total run time 55 seconds, 3 nodes, standard TCP stack.)
> >>
> >> My guess is that, unlike PVM, Open MPI implements broadcasts with broadcasts rather than multicasts. Can someone confirm this? Is this a bug?
> >>
> >> Is there any multicast or N to N broadcast where sender processes can avoid participating when they don’t need to?
> >>
> >> Thanks in advance
> >> Randolph
>
> --
> Sent from my mobile device
>
> David Zhang
> University of California, San Diego
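As an aside, the N-to-N pattern described in the quoted message can stay inside a single mpirun, which removes any question of interference between separate jobs. Below is a minimal sketch (not the poster's actual code) using the MPI_Group_incl / MPI_Comm_create route mentioned above: one communicator per sender, containing that sender plus every receiver, so each sender broadcasts only its own piece and the other senders never take part. The even-ranks-send / odd-ranks-receive layout and the 4-int payload are assumptions purely for illustration.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* assumption: an even number of ranks, even ranks send, odd ranks receive */
    int nsenders = size / 2;

    MPI_Group world_grp;
    MPI_Comm_group(MPI_COMM_WORLD, &world_grp);

    for (int s = 0; s < nsenders; s++) {
        int sender_rank = 2 * s;

        /* members: this sender first (so it becomes rank 0), then every receiver */
        int n = nsenders + 1;
        int members[n];
        members[0] = sender_rank;
        for (int i = 0; i < nsenders; i++) members[i + 1] = 2 * i + 1;

        MPI_Group grp;
        MPI_Comm  comm;
        MPI_Group_incl(world_grp, n, members, &grp);
        MPI_Comm_create(MPI_COMM_WORLD, grp, &comm);   /* collective over WORLD */

        if (comm != MPI_COMM_NULL) {
            int chunk[4] = {0};                        /* stand-in for real data */
            if (rank == sender_rank) chunk[0] = chunk[1] = chunk[2] = chunk[3] = s;

            /* only this sender and the receivers take part; other senders skip */
            MPI_Bcast(chunk, 4, MPI_INT, 0, comm);

            MPI_Comm_free(&comm);
        }
        MPI_Group_free(&grp);
    }

    MPI_Group_free(&world_grp);
    MPI_Finalize();
    return 0;
}

Because every MPI_Bcast then runs over its own sub-communicator within one MPI_COMM_WORLD, the program's behaviour no longer depends on how many other copies happen to be running at the same time.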