Gilles;
It works now. Thanks for pointing that out!
Rick
From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Friday, September 30, 2016 8:55 AM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages

Rick,

You must use the same value for root on all the tasks of the communicator, so the 4th parameter of MPI_Bcast should be hard-coded 0 instead of rank.

Fwiw, with this test program:
- if you MPI_Bcast a "small" message, then all your tasks send a message (that is never received) in eager mode, so MPI_Bcast completes
- if you MPI_Bcast a "long" message, then all your tasks send a message in rendezvous mode, and since no one receives it, MPI_Bcast hangs

"Small" vs. "long" depends on the interconnect and some tuning parameters, which can explain why 9000 bytes does not hang out of the box with another Open MPI version.

Bottom line, this test program is not doing what you expected.

Cheers,

Gilles
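For reference, a minimal self-contained sketch of the call pattern Gilles describes, assuming the same 9000-byte buffer size as the test program quoted below: every rank passes the same root (0), rank 0's buffer is the source, and every other rank receives into its own buffer. This is an illustration only, not Rick's corrected code.

#include <mpi.h>
#include <cstring>
#include <iostream>

// Sketch: MPI_Bcast with the root hard-coded to 0 on every rank,
// so the broadcast matches on all tasks of the communicator.
int main(int argc, char *argv[])
{
    const int bufsize = 9000;              // size that hangs in the original test
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf;
    MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &buf);
    if (rank == 0)
        std::memset(buf, 42, bufsize);     // give the root something to send

    // root is 0 on every rank; rank 0 sends, all other ranks receive
    MPI_Bcast(buf, bufsize, MPI_BYTE, 0, MPI_COMM_WORLD);

    std::cout << "Task " << rank << " received " << (int)buf[0] << std::endl;

    MPI_Free_mem(buf);
    MPI_Finalize();
    return 0;
}

Run with, e.g., mpirun -np 4 ./bcast_fixed (binary name is just a placeholder); every rank should print the received byte instead of hanging.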
On Friday, September 30, 2016, Marlborough, Rick <rmarlboro...@aaccorp.com> wrote:

Gilles;

Thanks for your response. The network setup I have here is 20 computers connected over a 1 gigabit Ethernet LAN. The computers are Nehalems with 8 cores each; these are 64-bit machines. Not a high-performance setup, but this is simply a research bed. I am using a host file most of the time, with each node configured for 10 slots. However, I see the same behavior if I run just 2 process instances on a single node: 8000 bytes is OK, 9000 bytes hangs. Here is my test code below. Maybe I'm not setting this up properly. I just recently installed Open MPI 2.1 and did not set any configuration flags. The OS we are using is a variation of RedHat 6.5 with a 2.6.32 kernel.

Thanks
Rick

#include "mpi.h"
#include <stdio.h>
#include <iostream>

unsigned int bufsize = 9000;

int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, count, tag = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char * inmsg;
    std::cout << "Calling allocate" << std::endl;
    int x = MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &inmsg);
    std::cout << "Return code from input buffer allocation is " << x << std::endl;

    char * outmsg;
    x = MPI_Alloc_mem(bufsize, MPI_INFO_NULL, &outmsg);
    std::cout << "Return code from output buffer allocation is " << x << std::endl;

    MPI_Status Stat;   // required variable for receive routines

    printf("Initializing on %d tasks\n", numtasks);
    MPI_Barrier(MPI_COMM_WORLD);

    if (rank == 0)
    {
        dest = 1;
        source = 1;
        std::cout << "Root sending" << std::endl;
        // root argument is rank (0 on this task)
        MPI_Bcast(outmsg, bufsize, MPI_BYTE, rank, MPI_COMM_WORLD);
        std::cout << "Root send complete" << std::endl;
    }
    else if (rank != 0)
    {
        dest = 0;
        source = 0;
        std::cout << "Task " << rank << " sending." << std::endl;
        // root argument is rank, which differs on every task -- the mismatch Gilles points out
        MPI_Bcast(inmsg, bufsize, MPI_BYTE, rank, MPI_COMM_WORLD);
        std::cout << "Task " << rank << " complete." << std::endl;
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
}

From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles Gouaillardet
Sent: Thursday, September 29, 2016 7:58 PM
To: Open MPI Users
Subject: Re: [OMPI users] openmpi 2.1 large messages

Rick,

can you please provide some more information:
- Open MPI version
- interconnect used
- number of tasks / number of nodes
- does the hang occur in the first MPI_Bcast of 8000 bytes?

Note there is a known issue if you MPI_Bcast with different but matching signatures (e.g. some tasks MPI_Bcast 8000 MPI_BYTE, while some other tasks MPI_Bcast 1 vector of 8000 MPI_BYTE). You might want to try mpirun --mca coll ^tuned and see if it helps.

Cheers,

Gilles
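To illustrate the "different but matching signatures" case Gilles mentions, here is a hypothetical sketch (my construction, not necessarily the exact trigger of the known issue): some ranks broadcast 8000 MPI_BYTE directly, while others broadcast one element of a contiguous datatype covering the same 8000 bytes, so the type signatures match even though the argument lists differ.

#include <mpi.h>

// Hypothetical illustration: the two MPI_Bcast calls below describe the
// same 8000 bytes, once as a plain count of MPI_BYTE and once as a single
// element of a derived contiguous datatype.
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[8000];

    if (rank % 2 == 0) {
        // plain count of bytes (the root, rank 0, takes this branch)
        MPI_Bcast(buf, 8000, MPI_BYTE, 0, MPI_COMM_WORLD);
    } else {
        // one contiguous datatype covering the same 8000 bytes
        MPI_Datatype block;
        MPI_Type_contiguous(8000, MPI_BYTE, &block);
        MPI_Type_commit(&block);
        MPI_Bcast(buf, 1, block, 0, MPI_COMM_WORLD);
        MPI_Type_free(&block);
    }

    MPI_Finalize();
    return 0;
}

If a program like this hangs, Gilles's suggestion is to disable the tuned collective component, e.g. mpirun --mca coll ^tuned -np 4 ./a.out.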
On 9/30/2016 6:52 AM, Marlborough, Rick wrote:

Folks;

I am attempting to set up a task that sends large messages via the MPI_Bcast API. I am finding that small messages work OK, anything less than 8000 bytes. Anything more than this and the whole scenario hangs, with most of the worker processes pegged at 100% CPU usage. I tried some of the configuration settings from the FAQ page, but these did not make a difference. Is there anything else I can try?

Thanks
Rick

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users