Dear all, 

Does anyone have any clue as to what the problem could be here? It seems to be a 
persistent problem present in all currently supported OpenMPI releases, which 
suggests a fundamental flaw in how OpenMPI handles dynamic process creation. 

Best wishes, 
Thomas Pak 


From: "Thomas Pak" <thomas....@maths.ox.ac.uk> 
To: users@lists.open-mpi.org 
Sent: Friday, 7 December, 2018 17:51:29 
Subject: [OMPI users] MPI_Comm_spawn leads to pipe leak and other errors 

Dear all, 

My MPI application spawns a large number of MPI processes via MPI_Comm_spawn 
over its total lifetime. Unfortunately, I have found that this causes problems 
in all currently supported OpenMPI versions (2.1, 3.0, 3.1 and 4.0). I have 
written a short, self-contained C program (included below) that spawns child 
processes with MPI_Comm_spawn in an infinite loop, where each child exits after 
writing a message to stdout. This short program exhibits the following issues: 

In versions 2.1.2 (Ubuntu package) and 2.1.5 (compiled from source), the 
program leaks pipes: open pipes keep accumulating over time until the 
application crashes because the maximum number of open pipes has been reached. 
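
For anyone who wants to watch the leak happen, here is a minimal diagnostic 
sketch (my own addition, assuming a Linux /proc filesystem; count_open_fds is 
just a name I made up) that counts the open file descriptors of the calling 
process: 

""" 
#include <dirent.h> 

// Count the open file descriptors of the calling process by listing 
// /proc/self/fd (Linux-specific diagnostic helper, not part of the 
// test program itself). 
static int count_open_fds(void) { 
    DIR *dir = opendir("/proc/self/fd"); 
    if (dir == NULL) 
        return -1; 

    int count = 0; 
    struct dirent *entry; 
    while ((entry = readdir(dir)) != NULL) 
        if (entry->d_name[0] != '.') // skip "." and ".." 
            ++count; 

    closedir(dir); 
    return count - 1; // exclude the descriptor opendir itself holds 
} 
""" 

Calling this at the top of the spawn loop and printing the result should show 
the count climbing steadily on the affected 2.1.x versions if the pipes are 
indeed leaking in the parent process. 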

In versions 3.0.3 and 3.1.3 (both compiled from source), there appears to be no 
pipe leak, but the program crashes with the following error message: 
PMIX_ERROR: UNREACHABLE in file ptl_tcp_component.c at line 1257 

In version 4.0.0 (compiled from source), I have not been able to test this 
issue very thoroughly because mpiexec ignores the --oversubscribe command-line 
flag (as detailed in this GitHub issue: 
https://github.com/open-mpi/ompi/issues/6130 ). This prevents the 
oversubscription of processor cores, so spawning additional processes 
immediately fails with a "not enough slots" error. A fix for this was proposed 
recently ( https://github.com/open-mpi/ompi/pull/6139 ), but since the v4.0.x 
branch is under active development right now, I decided not to go into it. 
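
(If someone does want to experiment on 4.0.0 in the meantime, one workaround I 
can think of, though I have not verified it there, is to declare extra slots in 
a hostfile so that no oversubscription is needed, e.g.: 

""" 
# hostfile: advertise more slots than the spawn test needs 
# (hypothetical workaround, untested on 4.0.0) 
localhost slots=64 
""" 

and pass it to mpiexec with --hostfile.) 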

I have found one e-mail thread on this mailing list about a similar problem ( 
https://www.mail-archive.com/users@lists.open-mpi.org/msg10543.html ). In that 
thread, Ralph Castain states that this is a known issue and suggests that it 
would be fixed in the then-upcoming v1.3.x release. However, the 1.3 series is 
long out of support and the issue is present again, so that fix evidently did 
not resolve it for good. 

I have created a GitHub gist that contains the output from "ompi_info --all" of 
all the OpenMPI installations mentioned here, as well as the config.log files 
for the OpenMPI installations that I compiled from source: 
https://gist.github.com/ThomasPak/1003160e396bb88dff27e53c53121e0c . 

I have also attached the code for the short program that demonstrates these 
issues. For good measure, I have included it directly here as well: 

""" 
#include <stdio.h> 
#include <mpi.h> 

int main(int argc, char *argv[]) { 

    // Initialize MPI 
    MPI_Init(NULL, NULL); 

    // Get parent 
    MPI_Comm parent; 
    MPI_Comm_get_parent(&parent); 

    // If the process was not spawned 
    if (parent == MPI_COMM_NULL) { 

        puts("I was not spawned!"); 

        // Spawn child process in loop 
        char *cmd = argv[0]; 
        char **cmd_argv = MPI_ARGV_NULL; 
        int maxprocs = 1; 
        MPI_Info info = MPI_INFO_NULL; 
        int root = 0; 
        MPI_Comm comm = MPI_COMM_SELF; 
        MPI_Comm intercomm; 
        int *array_of_errcodes = MPI_ERRCODES_IGNORE; 

        for (;;) { 
            MPI_Comm_spawn(cmd, cmd_argv, maxprocs, info, root, comm, 
                           &intercomm, array_of_errcodes); 

            MPI_Comm_disconnect(&intercomm); 
        } 

    // If process was spawned 
    } else { 

        puts("I was spawned!"); 

        MPI_Comm_disconnect(&parent); 
    } 

    // Finalize 
    MPI_Finalize(); 

    return 0; 
} 
""" 

Thanks in advance and best wishes, 
Thomas Pak 

_______________________________________________ 
users mailing list 
users@lists.open-mpi.org 
https://lists.open-mpi.org/mailman/listinfo/users 