Hi, MPI_Comm_spawn() is failing with the error message "All nodes which are allocated for this job are already filled". I compiled Open MPI 4.0.1 with the Portland Group C++ compiler, v. 19.5.0, both with and without Torque/Maui support. I thought that building without Torque/Maui support would give me finer control over where MPI_Comm_spawn() places the processes, but the failure message was the same in either case. Perhaps Torque is interfering with process creation somehow?
For the pared-down test code, I am following the instructions here to make mpiexec create exactly one manager process on a remote node, and then forcing that manager to spawn one worker process on the same remote node: https://stackoverflow.com/questions/47743425/controlling-node-mapping-of-mpi-comm-spawn

Here is the full error message. Note the "Max slots: 0" in the second job map (?):

Data for JOB [39020,1] offset 0
Total slots allocated 22

========================   JOB MAP   ========================

Data for node: n001     Num slots: 2    Max slots: 2    Num procs: 1
        Process OMPI jobid: [39020,1] App: 0 Process rank: 0 Bound: N/A

=============================================================
Data for JOB [39020,1] offset 0
Total slots allocated 22

========================   JOB MAP   ========================

Data for node: n001     Num slots: 2    Max slots: 0    Num procs: 1
        Process OMPI jobid: [39020,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/././././././././.][./././././././././.]

=============================================================
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
[n001:08114] *** An error occurred in MPI_Comm_spawn
[n001:08114] *** reported by process [2557214721,0]
[n001:08114] *** on communicator MPI_COMM_SELF
[n001:08114] *** MPI_ERR_SPAWN: could not spawn processes
[n001:08114] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[n001:08114] ***    and potentially your MPI job)

Here is my mpiexec command:

mpiexec --display-map --v --x DISPLAY -hostfile MyNodeFile --np 1 -map-by ppr:1:node SpawnTestManager

Here is my hostfile "MyNodeFile":

n001.cluster.com slots=2 max_slots=2

Here is my SpawnTestManager code:

<code>
#include <iostream>
#include <string>
#include <cstdio>

#ifdef SUCCESS
#undef SUCCESS
#endif
#include "/opt/openmpi_pgc_tm/include/mpi.h"

using std::string;
using std::cout;
using std::endl;

int main(int argc, char *argv[])
{
    int rank, world_size;
    char *argv2[2];
    MPI_Comm mpi_comm;
    MPI_Info info;
    char host[MPI_MAX_PROCESSOR_NAME + 1];
    int host_name_len;

    string worker_cmd = "SpawnTestWorker";
    string host_name = "n001.cluster.com";

    argv2[0] = "dummy_arg";
    argv2[1] = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Get_processor_name(host, &host_name_len);
    cout << "Host name from MPI_Get_processor_name is " << host << endl;

    char info_str[64];
    sprintf(info_str, "ppr:%d:node", 1);

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", host_name.c_str());
    MPI_Info_set(info, "map-by", info_str);

    MPI_Comm_spawn(worker_cmd.c_str(), argv2, 1, info, rank, MPI_COMM_SELF,
                   &mpi_comm, MPI_ERRCODES_IGNORE);
    MPI_Comm_set_errhandler(mpi_comm, MPI::ERRORS_THROW_EXCEPTIONS);

    std::cout << "Manager success!" << std::endl;

    MPI_Finalize();
    return 0;
}
</code>
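If the output would be useful, I can also add a check of what the runtime reports for MPI_UNIVERSE_SIZE just before the spawn call. This is only a sketch of what I would insert after MPI_Init() in the manager above; it is not in the code I ran:

<code>
    // Ask the runtime how many processes it thinks can exist in total.
    // MPI_UNIVERSE_SIZE may not be set under every launch method, hence the flag check.
    int *universe_size = NULL;
    int flag = 0;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &universe_size, &flag);
    if (flag)
        cout << "MPI_UNIVERSE_SIZE = " << *universe_size << endl;
    else
        cout << "MPI_UNIVERSE_SIZE is not set" << endl;
</code>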
Here is my SpawnTestWorker code:

<code>
#include "/opt/openmpi_pgc_tm/include/mpi.h"
#include <iostream>

int main(int argc, char *argv[])
{
    int world_size, rank;
    MPI_Comm manager_intercom;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Comm_get_parent(&manager_intercom);
    MPI_Comm_set_errhandler(manager_intercom, MPI::ERRORS_THROW_EXCEPTIONS);

    std::cout << "Worker success!" << std::endl;

    MPI_Finalize();
    return 0;
}
</code>

My config.log can be found here: https://gist.github.com/kmccall882/e26bc2ea58c9328162e8959b614a6fce.js

I've attached the other info requested on the help page, except the output of "ompi_info -v ompi full --parsable". My version of ompi_info doesn't accept the "ompi full" arguments, and the "-all" arg doesn't produce much output.

Thanks for your help,
Kurt
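P.S. If it would make the spawn failure easier to diagnose, I can change the manager so the call returns an error code instead of aborting. This is just a sketch of what I would wrap around the existing MPI_Comm_spawn() call (the arguments are unchanged from the code above):

<code>
    // Return error codes instead of aborting, so the failure can be reported
    // from inside the manager before it exits.
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    int rc = MPI_Comm_spawn(worker_cmd.c_str(), argv2, 1, info, rank,
                            MPI_COMM_SELF, &mpi_comm, MPI_ERRCODES_IGNORE);
    if (rc != MPI_SUCCESS) {
        char err_string[MPI_MAX_ERROR_STRING];
        int err_len = 0;
        MPI_Error_string(rc, err_string, &err_len);
        cout << "MPI_Comm_spawn failed: " << err_string << endl;
    }
</code>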