Thanks,
The messages are small and frequent (they flash metadata across the cluster).  
The current approach works fine for small to medium clusters, but I want it to 
be able to go big: maybe up to several hundred or even a thousand nodes.

It's these larger deployments that concern me.  The current scheme may see the 
clearinghouse become overloaded in a very large cluster.
From what you have said, a possible strategy may be to combine the listener 
and worker into a single process, using the non-blocking bcast just for that 
group, while each worker scans its own port for an incoming request, which 
it would in turn bcast to its peers.
As you have indicated, though, this would depend on the load the non-blocking 
bcast would cause.  At least the load would be fairly even over the cluster.

--- On Mon, 9/5/11, Jeff Squyres <jsquy...@cisco.com> wrote:

From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] is there an equiv of iprove for bcast?
To: randolph_pul...@yahoo.com.au
Cc: "Open MPI Users" <us...@open-mpi.org>
Received: Monday, 9 May, 2011, 11:27 PM

On May 3, 2011, at 8:20 PM, Randolph Pullen wrote:

> Sorry, I meant to say:
> - on each node there is 1 listener and 1 worker.
> - all workers act together when any of the listeners send them a request.
> - currently I must use an extra clearinghouse process to receive from any of 
> the listeners and bcast to workers, this is unfortunate because of the 
> potential scaling issues
> 
> I think you have answered this in that I must wait for MPI-3's non-blocking 
> collectives.

Yes and no.  If each worker starts N non-blocking broadcasts just to be able to 
test for completion of any of them, you might end up consuming a bunch of 
resources for them (I'm *anticipating* that pending non-blocking collective 
requests may be more heavyweight than pending non-blocking point-to-point 
requests).

But then again, if N is small, it might not matter.

> Can anyone suggest another way?  I don't like the serial clearinghouse 
> approach.

If you only have a few workers and/or the broadcast message is small and/or the 
broadcasts aren't frequent, then MPI's built-in broadcast algorithms might not 
offer much more optimization than doing your own with point-to-point 
mechanisms.  I don't usually recommend this, but it may be possible for your 
case.
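
As a rough illustration of the do-it-yourself point-to-point idea, here is a 
minimal sketch that assumes a binomial tree rooted at whichever rank received 
the request.  The names bcast_parent and bcast_children are hypothetical 
helpers, not part of MPI.  It only computes who forwards to whom; in a real 
program each worker would presumably poll with MPI_Iprobe on MPI_ANY_SOURCE, 
receive the message, and then forward it to its children with MPI_Send or 
MPI_Isend, with the root rank carried inside the message itself.

```c
#include <assert.h>

/* Rank relative to the broadcast root, so that any rank can act as root. */
static int rel_rank(int rank, int root, int size)
{
    return (rank - root + size) % size;
}

/* Parent of `rank` in a binomial tree rooted at `root`; -1 for the root. */
int bcast_parent(int rank, int root, int size)
{
    assert(size > 0 && rank >= 0 && rank < size && root >= 0 && root < size);
    int rel = rel_rank(rank, root, size);
    if (rel == 0)
        return -1;
    /* Find the highest set bit of rel; clearing it gives the parent. */
    int mask = 1;
    while ((mask << 1) <= rel)
        mask <<= 1;
    return ((rel - mask) + root) % size;
}

/* Write the children of `rank` into `out` (at most max_out entries).
 * Returns the number of children written. */
int bcast_children(int rank, int root, int size, int *out, int max_out)
{
    assert(size > 0 && rank >= 0 && rank < size && root >= 0 && root < size);
    int rel = rel_rank(rank, root, size);
    int n = 0;
    /* Start at the first power of two above rel's highest set bit. */
    int mask = 1;
    while (mask <= rel)
        mask <<= 1;
    for (; rel + mask < size; mask <<= 1) {
        if (n < max_out)
            out[n++] = (rel + mask + root) % size;
    }
    return n;
}
```

With 8 ranks and the request arriving at rank 3, rank 3 would forward to ranks 
4, 5 and 7, and rank 4 would in turn forward to ranks 6 and 0.  No single 
process sends more than about log2(N) messages per request, which is the main 
difference from a serial clearinghouse that sends to every worker itself.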

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
