Hello Jonathan,

You are using InfiniPath's PSM library and the corresponding MTL/psm, and therefore the corresponding upper-layer PML/cm. In this configuration, even a blocking MPI_Recv _is_ calling into PSM's irecv() function, which explains the error triggered in the PSM library.
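As a rough illustration of how unmatched messages can pile up on the receiving side even though every call is blocking, here is a toy model in plain C. It makes no MPI or PSM calls. The function name peak_backlog and its parameters are made up for this sketch. It assumes that a standard-mode send of a small message returns as soon as the message is handed off, that the receiver matches only one message for every recv_cost messages sent, and that a synchronizing send does not return until all earlier messages to that receiver have been matched. It is not a description of PSM's internals.

```c
#include <assert.h>

/* Toy model only: counts messages that have been sent but not yet matched.
 * nsends     - number of messages the sender issues
 * recv_cost  - the receiver matches one message per recv_cost sends
 *              (recv_cost > 1 means the receiver is the slower side)
 * sync_every - if > 0, every sync_every-th send is treated as synchronizing:
 *              the sender waits until the backlog is fully drained
 * Returns the peak number of unmatched messages seen during the run. */
long peak_backlog(long nsends, long recv_cost, long sync_every)
{
    assert(nsends >= 0 && recv_cost >= 1 && sync_every >= 0);

    long backlog = 0;
    long peak = 0;

    for (long i = 1; i <= nsends; ++i) {
        ++backlog;                      /* eager send returns immediately */
        if (backlog > peak)
            peak = backlog;

        if (i % recv_cost == 0 && backlog > 0)
            --backlog;                  /* receiver matches one message */

        if (sync_every > 0 && i % sync_every == 0)
            backlog = 0;                /* sender blocked until drained */
    }
    return peak;
}
```

For instance, peak_backlog(1000, 2, 0) gives 501, a backlog that keeps growing with the number of sends, whereas peak_backlog(1000, 2, 100) gives 51, because the periodic synchronizing send caps how far the sender can run ahead.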
Not knowing the degree of parallelism of your application otherwise, apart from trying to increase the maximum number of receive requests using the environment variable (PSM_MQ_RECVREQS_MAX, as named in the error message), you might want to change some of the master's sends to synchronous MPI_Ssend() calls. A synchronous send completes only once the matching receive has started, so the master cannot run arbitrarily far ahead of the receivers.

On the other hand, the example code you posted could be written differently, e.g. by collecting multiple random numbers into one communication, or by using collective communication, here with sub-communicators containing the master and sources and the master and targets. All of these would reduce pressure on the master.

Hope this helps.

Best regards,
Rainer

On Sunday 07 March 2010 04:17:33 pm Jonathan Wesley Stone wrote:
> Hi,
>
> My supercomputer has OpenMPI 1.4. I am running into a frustrating
> problem with my MPI program. I am using only the following calls,
> which I expect to be blocking:
> MPI_Wtime
> MPI_Error_string
> MPI_Abort
> MPI_Send
> MPI_Get_count
> MPI_Recv
> MPI_Probe
> MPI_Init
> MPI_Comm_rank
> MPI_Comm_size
> MPI_Finalize
>
> Somehow I am getting this error when I do a large number of sequential
> communications: "c002:2.0.Exhausted 1048576 MQ irecv request
> descriptors, which usually indicates a user program error or
> insufficient request descriptors (PSM_MQ_RECVREQS_MAX=1048576)"
>
> This seems counter-intuitive to me because I don't think I should be
> using irecvs since I am wanting specifically to rely on the documented
> blocking behavior of MPI_Recv (not MPI_Irecv, which I am not using).
>
> My main program is quite large, however I have managed to replicate
> the irritating behavior in this much smaller program, which executes a
> number of MPI_Send or MPI_Recv calls in a loop. The program's default
> behaviour is to run 2,000,000 iterations. When I turn it up to
> 20,000,000, after a short time it generates the PSM_MQ_RECVREQS_MAX
> exception.
>
> I would appreciate if anyone could advise why it might be happening in
> this "test" case -- basically what is going on that causes my
> presumably blocking MPI_Recv calls to "accumulate" such a large number
> of "irecv request descriptors", when I expect they should be blocking
> and get immediately resolved and the count should go down when the
> matching MPI_Send is posted.
>
> I appreciate your assistance. Thank you!
>
> Jonathan Stone
> Research Assistant, U. Oklahoma
>

--
------------------------------------------------------------------------
Rainer Keller, PhD                  Tel: +1 (865) 241-6293
Oak Ridge National Lab              Fax: +1 (865) 241-4811
PO Box 2008 MS 6164                 Email: kel...@ornl.gov
Oak Ridge, TN 37831-2008            AIM/Skype: rusraink