Good morning,

I want to run several independent MPI jobs, each with several processes (e.g. two MPI jobs with two processes each), using different inputs. I am running on a single node with 8 cores and one slot per core.

The jobs are launched from a Python GUI. The GUI calls a bash script named "PROGRAM_parallel.bin", which reads MPI options such as the number of processes and the oversubscribe flag and then runs

    mpirun $NP $OVERS separateStdout.sh BINARY.bin INPUTS

where separateStdout.sh is a bash script I use to separate the stdout of the different ranks, $NP and $OVERS hold the (user-configurable) number of processes and the oversubscribe flag, BINARY.bin is the compiled Fortran binary, and INPUTS is a file containing all the inputs the binary needs. The GUI creates a scenario/workspace structure, so launching the job from two different scenarios or workspaces uses different inputs. I cannot use the --cpu-set option because the GUI has no knowledge of the cores used by the first mpirun instance, nor of the available slots in general.
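In case it helps, here is a minimal sketch of the two scripts (simplified for this email; NUM_PROCS and OVERSUBSCRIBE are illustrative names for the values the GUI actually passes in, and the log file name in the wrapper is just an example):

    #!/bin/bash
    # PROGRAM_parallel.bin (simplified sketch)
    # Build the mpirun options from the user-configurable settings.
    NP="-np ${NUM_PROCS:-2}"                      # number of processes
    OVERS=""
    [ "$OVERSUBSCRIBE" = "yes" ] && OVERS="--oversubscribe"
    mpirun $NP $OVERS ./separateStdout.sh ./BINARY.bin INPUTS

    #!/bin/bash
    # separateStdout.sh: run by every MPI rank; redirects that rank's
    # stdout/stderr to its own file, then execs the real binary.
    # OMPI_COMM_WORLD_RANK is set by Open MPI for each launched process.
    RANK="${OMPI_COMM_WORLD_RANK:-0}"
    exec "$@" > "stdout_rank_${RANK}.log" 2>&1

Each job is launched this way from its own scenario/workspace directory, so the input and log files of the two jobs do not collide.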
The problem I am having is that only one of the jobs terminates correctly; the others are killed (the GUI reports exit code 137, i.e. SIGKILL). It seems the jobs share the same resources, so when the first job finishes and calls MPI_FINALIZE, the MPI execution environment is torn down and the other jobs are killed before they can finish.

How can I solve this problem? The output of ompi_info --all and the mapping and allocation of the runs can be found at this link: https://drive.google.com/drive/folders/1xy1wGI4dHzGbAJrU5oEzckVeVYWMUi1O?usp=sharing

Thank you in advance for your help.

Kind regards,
Max