Hi David:

On Wed, Jul 21, 2010 at 02:10:53PM -0400, David Ronis wrote:
> I've got an MPI program on an 8-core box that runs in a master-slave
> mode. The slaves calculate something, pass data to the master, and
> then call MPI_Bcast, waiting for the master to update and return some
> data via an MPI_Bcast originating on the master.
>
> One of the things the master does while the slaves are waiting is to
> make heavy use of the fftw3 FFT routines, which can support
> multi-threading. However, for threading to make sense, the slaves on
> the same physical machine have to give up their CPU usage, and this
> doesn't seem to be the case (top shows them running at close to 100%).
> Is there another MPI routine that polls for data and then gives up its
> time-slice?
>
> Any other suggestions?
I ran into a similar problem some time ago. My situation seems similar to
yours:

  1. the data in the MPI application has a to-and-fro nature;
  2. I cannot afford an MPI process that consumes 100% CPU while doing
     nothing.

My solution was to link two extra routines with my (FORTRAN) application.
These routines intercept mpi_recv and mpi_send, test the status of the
request, and sleep if it is not ready. The sleep time follows an
exponential curve: it has a start value, a growth factor, and a maximum
value. I made no source-code changes to my application; when I include
these two routines at link time, the load from the application drops
from 2.0 to 1.0.

I use these with OpenMPI 1.2.8. I have not tried "-mca yield_when_idle 1",
which may not be in 1.2.8; I'm not sure.

Hope that helps,
Douglas.

--
Douglas Guptill                   voice: 902-461-9749
Research Assistant, LSC 4640      email: douglas.gupt...@dal.ca
Oceanography Department           fax:   902-494-3877
Dalhousie University
Halifax, NS, B3H 4J1, Canada
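In outline, both routines share the same wait loop. Here is a minimal
sketch of it (the helper name wait_with_backoff is only illustrative and
does not appear in the attached files; the constants mirror theirs):

#define _POSIX_C_SOURCE 199309L
#include <time.h>

#include "mpi.h"

/* Poll an outstanding request, sleeping between polls with an
 * exponentially growing delay: start at 1 us, double each pass,
 * cap at 100 us. */
void wait_with_backoff(MPI_Request *req, MPI_Status *status)
{
    int flag = 0;
    struct timespec ts;

    ts.tv_sec  = 0;
    ts.tv_nsec = 1000;                      /* start value */
    do {
        nanosleep(&ts, NULL);               /* give up the time-slice */
        ts.tv_nsec *= 2;                    /* factor                 */
        if (ts.tv_nsec > 100000)            /* maximum                */
            ts.tv_nsec = 100000;
        PMPI_Request_get_status(*req, &flag, status);
    } while (!flag);
}

The two attached files inline exactly this loop around PMPI_Irecv and
PMPI_Isend respectively.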
/*
 * Intercept MPI_Recv, and
 * call PMPI_Irecv, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *   2008-12-17: copied from MPI_Send.c
 *   2008-12-18: tweaking.
 *
 * See MPI_Send.c for additional comments,
 * especially w.r.t. PMPI_Request_get_status.
 **/

#define _POSIX_C_SOURCE 199309L   /* for nanosleep(); must precede the headers */
#include <time.h>

#include "mpi.h"

int MPI_Recv(void *buff, int count, MPI_Datatype datatype, int from,
             int tag, MPI_Comm comm, MPI_Status *status)
{
    int             flag, nsec_start = 1000, nsec_max = 100000;
    struct timespec ts;
    MPI_Request     req;

    /* Start with a 1 us sleep; double it each pass, capped at 100 us. */
    ts.tv_sec  = 0;
    ts.tv_nsec = nsec_start;

    /* Post the receive without blocking, then poll it while sleeping. */
    PMPI_Irecv(buff, count, datatype, from, tag, comm, &req);
    do {
        nanosleep(&ts, NULL);
        ts.tv_nsec *= 2;
        ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
        PMPI_Request_get_status(req, &flag, status);
    } while (!flag);

    /* Note: assumes the caller passes a real status (not MPI_STATUS_IGNORE);
     * the completed request itself is not released here. */
    return status->MPI_ERROR;
}
/*
 * Intercept MPI_Send, and
 * call PMPI_Isend, loop over PMPI_Request_get_status and sleep, until done
 *
 * Revision History:
 *   2008-12-12: skeleton by Jeff Squyres <jsquy...@cisco.com>
 *   2008-12-16->18: adding parameters, variable wait,
 *                   change MPI_Test to MPI_Request_get_status
 *                   Douglas Guptill <douglas.gupt...@dal.ca>
 **/

/* When we use this:
 *     PMPI_Test(&req, &flag, &status);
 * we get:
 *     dguptill@DOME:$ mpirun -np 2 mpi_send_recv_test_mine
 *     This is process 0 of 2 .
 *     This is process 1 of 2 .
 *     error: proc 0 ,mpi_send returned -1208109376
 *     error: proc 1 ,mpi_send returned -1208310080
 *     1 changed to 3
 *
 * Using MPI_Request_get_status cures the problem.
 *
 * A read of mpi21-report.pdf confirms that MPI_Request_get_status
 * is the appropriate choice, since there seems to be something
 * between the call to MPI_SEND (MPI_RECV) in my FORTRAN program
 * and MPI_Send.c (MPI_Recv.c).
 **/

#define _POSIX_C_SOURCE 199309L   /* for nanosleep(); must precede the headers */
#include <time.h>

#include "mpi.h"

int MPI_Send(void *buff, int count, MPI_Datatype datatype, int dest,
             int tag, MPI_Comm comm)
{
    int             flag, nsec_start = 1000, nsec_max = 100000;
    struct timespec ts;
    MPI_Request     req;
    MPI_Status      status;

    /* Start with a 1 us sleep; double it each pass, capped at 100 us. */
    ts.tv_sec  = 0;
    ts.tv_nsec = nsec_start;

    /* Post the send without blocking, then poll it while sleeping. */
    PMPI_Isend(buff, count, datatype, dest, tag, comm, &req);
    do {
        nanosleep(&ts, NULL);
        ts.tv_nsec *= 2;
        ts.tv_nsec = (ts.tv_nsec > nsec_max) ? nsec_max : ts.tv_nsec;
        PMPI_Request_get_status(req, &flag, &status);
    } while (!flag);

    return status.MPI_ERROR;
}
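For completeness, a driver along the lines of the mpi_send_recv_test
mentioned in the comments above might look like the following. This is
only an illustrative sketch, not the original test program: rank 1 sends
one integer to rank 0 through the intercepted routines, so whichever side
has to wait sleeps in nanosleep instead of spinning.

/* Illustrative test driver (not the original mpi_send_recv_test_mine).
 * Build and run with something like:
 *   mpicc -o send_recv_test send_recv_test.c MPI_Send.c MPI_Recv.c
 *   mpirun -np 2 send_recv_test
 */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int rank, size, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("This is process %d of %d.\n", rank, size);

    if (rank == 1) {
        value = 3;
        /* Goes through the intercepted MPI_Send above. */
        MPI_Send(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
    } else if (rank == 0) {
        /* Goes through the intercepted MPI_Recv above. */
        MPI_Recv(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD, &status);
        printf("received %d from process 1\n", value);
    }

    MPI_Finalize();
    return 0;
}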