Thanks. The messages are small and frequent (they flash metadata across the cluster). The current approach works fine for small to medium clusters, but I want it to be able to go big: maybe up to several hundred or even thousands of nodes.
It's these larger deployments that concern me. The current scheme may see the clearinghouse become overloaded in a very large cluster.

From what you have said, a possible strategy may be to combine the listener and worker into a single process, using the non-blocking bcast just for that group, while each worker scans its own port for an incoming request, which it would in turn bcast to its peers. As you have indicated, though, this would depend on the load the non-blocking bcast would cause. At least the load would be fairly even over the cluster.

--- On Mon, 9/5/11, Jeff Squyres <jsquy...@cisco.com> wrote:

From: Jeff Squyres <jsquy...@cisco.com>
Subject: Re: [OMPI users] is there an equiv of iprove for bcast?
To: randolph_pul...@yahoo.com.au
Cc: "Open MPI Users" <us...@open-mpi.org>
Received: Monday, 9 May, 2011, 11:27 PM

On May 3, 2011, at 8:20 PM, Randolph Pullen wrote:

> Sorry, I meant to say:
> - on each node there is 1 listener and 1 worker.
> - all workers act together when any of the listeners send them a request.
> - currently I must use an extra clearinghouse process to receive from any of
> the listeners and bcast to workers; this is unfortunate because of the
> potential scaling issues.
>
> I think you have answered this in that I must wait for MPI-3's non-blocking
> collectives.

Yes and no. If each worker starts N non-blocking broadcasts just to be able to test for completion of any of them, you might end up consuming a bunch of resources for them (I'm *anticipating* that pending non-blocking collective requests may be more heavyweight than pending non-blocking point-to-point requests). But then again, if N is small, it might not matter.

> Can anyone suggest another way? I don't like the serial clearinghouse
> approach.
If you only have a few workers and/or the broadcast message is small and/or the broadcasts aren't frequent, then MPI's built-in broadcast algorithms might not offer much more optimization than doing your own with point-to-point mechanisms. I don't usually recommend this, but it may be possible for your case.

-- 
Jeff Squyres
jsquy...@cisco.com
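Jeff's last suggestion, doing your own broadcast with point-to-point messages, could also remove the clearinghouse: whichever node receives a request becomes the root of a binomial tree and puts its own rank in the message header, so every receiver can work out where to forward it. The C below is a minimal sketch of just that tree arithmetic, assuming ranks 0..size-1 of a single communicator. `bcast_parent` and `bcast_children` are hypothetical helper names, not Open MPI functions, and the actual MPI_Send/MPI_Recv calls (plus the MPI_Iprobe polling each node would need to notice forwarded messages) are left out.

```c
#include <assert.h>

/* Hypothetical helper (not an Open MPI API): the rank that forwards the
 * message to `rank` in a binomial broadcast tree rooted at `root`.
 * Returns -1 when `rank` is the root (the node that got the request). */
int bcast_parent(int rank, int root, int size)
{
    assert(size > 0 && rank >= 0 && rank < size && root >= 0 && root < size);
    int vr = (rank - root + size) % size;      /* rank relative to the root */
    for (int mask = 1; mask < size; mask <<= 1) {
        if (vr & mask)                          /* lowest set bit: received here */
            return (rank - mask + size) % size;
    }
    return -1;                                  /* vr == 0: the originator */
}

/* Hypothetical helper: writes the ranks that `rank` forwards to into
 * `children` (largest subtree first) and returns how many there are.
 * `children` needs room for ceil(log2(size)) entries. */
int bcast_children(int rank, int root, int size, int *children)
{
    assert(size > 0 && rank >= 0 && rank < size && root >= 0 && root < size);
    int vr = (rank - root + size) % size;
    int mask = 1;
    /* Stop at the bit this rank received on; for the root, mask ends at
     * the first power of two >= size. */
    while (mask < size && !(vr & mask))
        mask <<= 1;
    int n = 0;
    for (mask >>= 1; mask > 0; mask >>= 1) {
        if (vr + mask < size)                   /* that subtree exists */
            children[n++] = (rank + mask) % size;
    }
    return n;
}
```

For example, with 8 ranks and rank 3 as the originator, rank 3 forwards to ranks 7, 5 and 4, and rank 7 then forwards to 1 and 0. If each node sends to its largest subtree first, a broadcast over P ranks completes in ceil(log2 P) rounds and no node sends more than ceil(log2 P) copies, so the forwarding load is spread across the cluster rather than concentrated on one process.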