Rick,

You must use the same value for root on all the tasks of the communicator, so the 4th parameter of MPI_Bcast should be hard-coded to 0 instead of rank.
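Just to illustrate (a minimal sketch, not your exact program, with the buffer handling simplified), this is the shape MPI_Bcast expects: every rank passes the same count, datatype, and root.

#include <mpi.h>
#include <algorithm>
#include <iostream>
#include <vector>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int bufsize = 9000;
    std::vector<char> buf(bufsize);
    if (rank == 0) {
        // only the root fills the buffer; all other ranks receive its contents
        std::fill(buf.begin(), buf.end(), 'x');
    }

    // every rank makes the same call: same count, same datatype, root hard-coded to 0
    MPI_Bcast(buf.data(), bufsize, MPI_BYTE, 0, MPI_COMM_WORLD);

    std::cout << "Task " << rank << " has buf[0] = " << buf[0] << std::endl;

    MPI_Finalize();
    return 0;
}

With a single, matched root it no longer matters whether the message goes eager or rendezvous: rank 0 actually broadcasts and every other rank actually receives, whatever the size.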
FWIW, with this test program:

- if you MPI_Bcast a "small" message, then all your tasks send a message (that is never received) in eager mode, so MPI_Bcast completes;
- if you MPI_Bcast a "long" message, then all your tasks send a message in rendezvous mode, and since no one receives it, MPI_Bcast hangs.

Where "small" ends and "long" begins depends on the interconnect and on some tuning parameters, which can explain why 9000 bytes does not hang out of the box with another Open MPI version.

Bottom line: this test program is not doing what you expected.

Cheers,

Gilles

On Friday, September 30, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com> wrote:

> Gilles;
>
> Thanks for your response. The network setup I have here is 20 computers
> connected over a 1 gig Ethernet LAN. The computers are Nehalems with 8
> cores each. These are 64-bit machines. Not a high performance setup, but
> this is simply a research bed. I am using a host file most of the time,
> with each node configured for 10 slots. However, I see the same behavior
> if I run just 2 process instances on a single node. 8000 bytes is ok;
> 9000 bytes hangs. Here is my test code below. Maybe I'm not setting this
> up properly. I just recently installed OpenMPI 2.1 and did not set any
> configuration flags. The OS we are using is a variation of RedHat 6.5
> with a 2.6.32 kernel.
>
> Thanks
>
> Rick
>
> #include "mpi.h"
> #include <stdio.h>
> #include <iostream>
>
> unsigned int bufsize = 9000;
>
> main(int argc, char *argv[]) {
>     int numtasks, rank, dest, source, rc, count, tag = 1;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     char *inmsg;
>     std::cout << "Calling allocate" << std::endl;
>     int x = MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &inmsg);
>     std::cout << "Return code from input buffer allocation is " << x << std::endl;
>
>     char *outmsg;
>     x = MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &outmsg);
>     std::cout << "Return code from output buffer allocation is " << x << std::endl;
>
>     MPI_Status Stat; // required variable for receive routines
>     printf("Initializing on %d tasks\n", numtasks);
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     if (rank == 0) {
>         dest = 1;
>         source = 1;
>         std::cout << "Root sending" << std::endl;
>         MPI_Bcast(outmsg, bufsize, MPI_BYTE, rank, MPI_COMM_WORLD);
>         std::cout << "Root send complete" << std::endl;
>     }
>     else if (rank != 0) {
>         dest = 0;
>         source = 0;
>         std::cout << "Task " << rank << " sending." << std::endl;
>         MPI_Bcast(inmsg, bufsize, MPI_BYTE, rank, MPI_COMM_WORLD);
>         std::cout << "Task " << rank << " complete." << std::endl;
>     }
>
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_Finalize();
> }
>
> From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
> Sent: Thursday, September 29, 2016 7:58 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] openmpi 2.1 large messages
>
> Rick,
>
> can you please provide some more information:
> - Open MPI version
> - interconnect used
> - number of tasks / number of nodes
> - does the hang occur in the first MPI_Bcast of 8000 bytes?
>
> note there is a known issue if you MPI_Bcast with different but matching
> signatures (e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other
> tasks MPI_Bcast 1 vector of 8000 MPI_BYTE)
>
> you might want to try
>     mpirun --mca coll ^tuned
> and see if it helps
>
> Cheers,
> Gilles
>
> On 9/30/2016 6:52 AM, Marlborough, Rick wrote:
>
> Folks;
>
> I am attempting to set up a task that sends large messages via the
> MPI_Bcast API. I am finding that small messages work ok, anything less
> than 8000 bytes. Anything more than this and the whole scenario hangs,
> with most of the worker processes pegged at 100% CPU usage. I tried some
> of the configuration settings from the FAQ page, but these did not make
> a difference. Is there anything else I can try?
>
> Thanks
>
> Rick