After reading Anthony's question again, I am no longer sure that we are having the same problem, but we might be. In any case, the attached example programs trigger the issue of running out of pipes. I don't see how orted could run out of pipes, even if it is reused: only a very small number of processes is running at any given time. Once a slave terminates, how could it still hold open pipes? Shouldn't the total number of open files, or pipes, stay very limited in this situation? And yet, after roughly 20 iterations of the loop in master.c, orted complains about running out of pipes.
nick

On Tue, Dec 1, 2009 at 16:08, Nicolas Bock <nicolasb...@gmail.com> wrote:
> Hello list,
>
> a while back in January of this year, a user (Anthony Thevenin) had the
> problem of running out of open pipes when he tried to use MPI_Comm_spawn a
> few times. As I found the thread he started in the mailing list archives and
> have just joined the mailing list myself, I unfortunately can't reply to the
> thread. The thread was titled: "Doing a lot of spawns does not work with
> ompi 1.3 BUT works with ompi 1.2.7".
>
> The discussion stopped without really presenting a solution. Is the issue
> brought up by Anthony fixed? We are running into the same problem.
>
> Thanks, nick
master.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    int main (int argc, char **argv)
    {
      int rank;
      int size;
      int *error_codes;
      int spawn_counter = 0;
      char *slave_argv[] = { "arg1", "arg2", 0 };
      MPI_Comm spawn;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      if (rank == 0)
      {
        printf("[master] running on %i processors\n", size);
        /* Loops forever: spawn `size` copies of ./slave each iteration. */
        while (1)
        {
          printf("[master] (%i) forking processes\n", spawn_counter++);
          error_codes = (int*) malloc(sizeof(int)*size);
          /* The intercommunicator in `spawn` is overwritten on each
             iteration without being disconnected or freed. */
          MPI_Comm_spawn("./slave", slave_argv, size, MPI_INFO_NULL, 0,
                         MPI_COMM_SELF, &spawn, error_codes);
          printf("[master] waiting at barrier\n");
          MPI_Barrier(spawn);
          free(error_codes);
        }
      }

      /* Reached only by ranks other than 0. */
      MPI_Finalize();
      return 0;
    }
The slave program (spawned as ./slave):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <mpi.h>

    #define SLEEP_TIME 2

    int main (int argc, char **argv)
    {
      int rank;
      int size;
      MPI_Comm spawn;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      printf("[slave %i] sleeping for %i seconds\n", rank, SLEEP_TIME);
      sleep(SLEEP_TIME);

      printf("[slave %i] waiting at barrier\n", rank);
      MPI_Comm_get_parent(&spawn);
      MPI_Barrier(spawn);

      MPI_Finalize();
    }