On 8/8/2010 8:13 PM, Randolph Pullen wrote:
> Thanks, although “An intercommunicator cannot be used for collective
> communication.”, i.e., bcast calls.

Yes, it can. MPI-1 did not allow collective operations on
intercommunicators, but the MPI-2 specification introduced that notion.
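
For illustration, here is a minimal sketch of a broadcast over an
intercommunicator; the intercommunicator itself (e.g. obtained from
MPI_Intercomm_create) and the choice of rank 0 of the local group as
root are assumptions of the example, not prescriptions:

#include <mpi.h>

/* Broadcast "count" ints from rank 0 of the root group to every
 * process of the remote group, using MPI-2 intercommunicator
 * collective semantics. */
void bcast_over_intercomm(MPI_Comm intercomm, int am_in_root_group,
                          int my_local_rank, int *buf, int count)
{
    if (am_in_root_group) {
        /* In the root's group the actual root passes MPI_ROOT and
         * all other local processes pass MPI_PROC_NULL. */
        int root = (my_local_rank == 0) ? MPI_ROOT : MPI_PROC_NULL;
        MPI_Bcast(buf, count, MPI_INT, root, intercomm);
    } else {
        /* In the remote group every process passes the rank of the
         * root within the root's own group. */
        MPI_Bcast(buf, count, MPI_INT, 0, intercomm);
    }
}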

Thanks
Edgar

> I can see how the MPI_Group_xx calls can be used to produce a useful
> group and then a communicator; thanks again, but this is really a
> side issue to my main question about MPI_Bcast.
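
A minimal sketch of that MPI_Group_* path, for reference; the choice
of "the first half of the ranks" is an arbitrary illustration, not
part of the original program:

#include <mpi.h>
#include <stdlib.h>

/* Build a sub-communicator containing the first half of the ranks of
 * MPI_COMM_WORLD.  Processes outside the group get MPI_COMM_NULL. */
MPI_Comm make_subcomm(void)
{
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Group world_group, sub_group;
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);

    /* Arbitrary selection: ranks 0 .. world_size/2 - 1. */
    int n = world_size / 2;
    int *ranks = malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) ranks[i] = i;

    MPI_Group_incl(world_group, n, ranks, &sub_group);

    /* Collective over MPI_COMM_WORLD; members of sub_group get the
     * new communicator, everyone else gets MPI_COMM_NULL. */
    MPI_Comm subcomm;
    MPI_Comm_create(MPI_COMM_WORLD, sub_group, &subcomm);

    free(ranks);
    MPI_Group_free(&sub_group);
    MPI_Group_free(&world_group);
    return subcomm;
}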
> 
> I seem to have duplicate concurrent processes interfering with each
> other.  This would appear to be a breach of the MPI safety dictum,
> i.e., MPI_COMM_WORLD is supposed to include only the processes
> started by a single mpirun command and to isolate those processes
> from other, similar groups of processes.
> 
> So, it would appear to be a bug.  If so, this has significant
> implications for environments such as mine, where the same program
> may often be run by different users simultaneously.
> 
> It is really this issue that is concerning me.  I can rewrite the
> code, but if it can crash when two copies run at the same time, I
> have a much bigger problem.
> 
> My suspicion is that, within the MPI_Bcast handshaking, a
> synchronising broadcast call may be colliding across the
> environments.  My only evidence is that an otherwise working program
> waits on broadcast reception forever when two or more copies are run
> at [exactly] the same time.
> 
> Has anyone else seen similar behavior in concurrently running
> programs that perform lots of broadcasts?
> 
> Randolph
> 
> 
> --- On Sun, 8/8/10, David Zhang <solarbik...@gmail.com> wrote:
> 
> From: David Zhang <solarbik...@gmail.com>
> Subject: Re: [OMPI users] MPI_Bcast issue
> To: "Open MPI Users" <us...@open-mpi.org>
> Received: Sunday, 8 August, 2010, 12:34 PM
> 
> In particular, intercommunicators
> 
> On 8/7/10, Aurélien Bouteiller <boute...@eecs.utk.edu> wrote:
>> You should consider reading about communicators in MPI.
>> 
>> Aurelien
>> --
>> Aurelien Bouteiller, Ph.D.
>> Innovative Computing Laboratory, The University of Tennessee.
>> 
>> Sent from my iPad
>> 
>> On Aug 7, 2010, at 1:05, Randolph Pullen
>> <randolph_pul...@yahoo.com.au> wrote:
>> 
>>> I seem to be having a problem with MPI_Bcast.  My massive,
>>> I/O-intensive data movement program must broadcast from n to n
>>> nodes.  My problem starts because I require 2 processes per node, a
>>> sender and a receiver, and I have implemented these as MPI
>>> processes rather than tackling the complexities of threads on MPI.
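
As an illustration of one way such a two-role layout can be kept apart
with communicators, here is a minimal sketch using MPI_Comm_split; the
even/odd assignment of senders and receivers is an assumption of the
sketch, not the original program's layout:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Assumed layout: even ranks are senders, odd ranks receivers. */
    int color = world_rank % 2;          /* 0 = sender, 1 = receiver */
    MPI_Comm role_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &role_comm);

    /* Collectives on role_comm now involve only one role's
     * processes, so sender and receiver traffic cannot mix. */
    int value = world_rank;
    MPI_Bcast(&value, 1, MPI_INT, 0, role_comm);

    MPI_Comm_free(&role_comm);
    MPI_Finalize();
    return 0;
}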
>>> 
>>> Consequently, broadcast and calls like alltoall are not
>>> completely helpful.  The dataset is huge and each node must end
>>> up with a complete copy built by the large number of contributing
>>> broadcasts from the sending nodes.  Network efficiency and run
>>> time are paramount.
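
In collective terms, the pattern described above, where every rank
contributes a block and every rank ends up with the full dataset, is
essentially an allgather; a minimal sketch with MPI_Allgatherv
follows, with block sizes and the int datatype as placeholders:

#include <mpi.h>
#include <stdlib.h>

/* Each rank owns "my_count" ints; afterwards every rank holds the
 * concatenation of all contributions.  Counts and displacements are
 * exchanged first because contributions may differ in size. */
void gather_full_copy(int *my_block, int my_count,
                      int **full_data, int *full_count, MPI_Comm comm)
{
    int nprocs;
    MPI_Comm_size(comm, &nprocs);

    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));

    MPI_Allgather(&my_count, 1, MPI_INT, counts, 1, MPI_INT, comm);

    int total = 0;
    for (int i = 0; i < nprocs; i++) {
        displs[i] = total;
        total += counts[i];
    }

    int *all = malloc(total * sizeof(int));
    MPI_Allgatherv(my_block, my_count, MPI_INT,
                   all, counts, displs, MPI_INT, comm);

    *full_data  = all;
    *full_count = total;
    free(counts);
    free(displs);
}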
>>> 
>>> As I don’t want to needlessly broadcast all this data to the
>>> sending nodes, and I have a perfectly good MPI program that
>>> distributes globally from a single node (1 to N), I took the
>>> unusual decision to start N copies of this program by spawning the
>>> MPI system from the PVM system, in an effort to get my N-to-N
>>> concurrent transfers.
>>> 
>>> It seems that the broadcasts running in concurrent MPI
>>> environments collide and cause all but the first process to hang
>>> waiting for their broadcasts.  This theory seems to be confirmed by
>>> introducing a sleep of n-1 seconds before the first MPI_Bcast call
>>> on each node, which results in the code working perfectly (total
>>> run time 55 seconds, 3 nodes, standard TCP stack).
>>> 
>>> My guess is that, unlike PVM, Open MPI implements broadcasts with
>>> broadcasts rather than multicasts.  Can someone confirm this?  Is
>>> this a bug?
>>> 
>>> Is there any multicast or N-to-N broadcast where sender processes
>>> can avoid participating when they don’t need to?
>>> 
>>> Thanks in advance,
>>> Randolph
>>> 
>>> 
>>> 
>> 
> 
> 
> 
