Good morning,

I want to run several independent MPI jobs, each with several processes (e.g. two MPI jobs with two processes each), using different inputs. I am running on a single node with 8 cores and one slot per core.

The jobs are launched from a Python GUI. The GUI calls a bash script named "PROGRAM_parallel.bin", which reads MPI options such as the number of processes and the oversubscribe flag and then runs

    mpirun $NP $OVERS separateStdout.sh BINARY.bin INPUTS

where separateStdout.sh is a bash script I use to separate the stdout of the different ranks, $NP and $OVERS hold the (user-configurable) number of processes and the oversubscribe flag, BINARY.bin is the compiled Fortran binary, and INPUTS is a file containing all the inputs the binary needs. The GUI creates a scenario/workspace structure, so launching the job from two different scenarios or workspaces uses different inputs. I cannot use the --cpu-set option because the GUI has no knowledge of the cores used by the first mpirun instance, nor of the available slots in general.
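In case it helps, here is a minimal sketch of the two scripts (simplified for this email; NUM_PROCS and OVERSUBSCRIBE are illustrative names for the values the GUI actually passes in, and the log file name in the wrapper is just an example):

    #!/bin/bash
    # PROGRAM_parallel.bin (simplified sketch)
    # Build the mpirun options from the user-configurable settings.
    NP="-np ${NUM_PROCS:-2}"                      # number of processes
    OVERS=""
    [ "$OVERSUBSCRIBE" = "yes" ] && OVERS="--oversubscribe"
    mpirun $NP $OVERS ./separateStdout.sh ./BINARY.bin INPUTS

    #!/bin/bash
    # separateStdout.sh: run by every MPI rank; redirects that rank's
    # stdout/stderr to its own file, then execs the real binary.
    # OMPI_COMM_WORLD_RANK is set by Open MPI for each launched process.
    RANK="${OMPI_COMM_WORLD_RANK:-0}"
    exec "$@" > "stdout_rank_${RANK}.log" 2>&1

Each job is launched this way from its own scenario/workspace directory, so the input and log files of the two jobs do not collide.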
The problem I am having is that only one of the jobs terminates correctly; the others are killed (the GUI reports exit code 137, i.e. SIGKILL). It seems the jobs share the same resources, so when the first job finishes and calls MPI_FINALIZE, the MPI execution environment is torn down and the other jobs are killed before they can finish.

How can I solve this problem? The output of ompi_info --all and the mapping and allocation of the runs can be found at this link: https://drive.google.com/drive/folders/1xy1wGI4dHzGbAJrU5oEzckVeVYWMUi1O?usp=sharing

Thank you in advance for your help.

Kind regards,
Max