Dear Users,

our Fortran-based code performs shell operations either through the system() function or by calling the corresponding system subroutine. This fails under Slurm in certain cases. For instance, input files are manipulated as istatus=system('cp file1 file2').
When the Slurm scheduler is used, system() returns -1 (or 255 if unsigned) and the requested shell operation is not performed whenever the system() call follows a malloc() call that allocates more than half of the memory available to the Slurm job. Unfortunately, there is no error message in the error output or in the Slurm output file.

Our code, via a C interface, allocates all the available memory at once with that single malloc() call and works with the allocated array during the entire runtime. All system() calls that precede malloc() are performed correctly, and all system() calls fail from the point right after malloc(). If less than half of the Slurm job's memory limit is allocated with malloc(), all system() calls work perfectly.

I have tried setting the memory limit with either --mem-per-cpu or --mem. I also tried --mem=0 together with --exclusive. I have tried different clusters with Slurm versions 14.03.9 and 17.11.12 and several reliable Fortran compilers, and found the same error consistently. Shell operations with system() also work perfectly on the same node with full memory when no scheduler is used. There is no problem with the SGE, OAR, or Condor schedulers either, irrespective of the allocated memory size.

Our guess is that there might be a Slurm-specific setting that prevents forking a shell/child process if more than half of the memory limit is consumed by the parent job. Slurm might assume that the child process needs the same amount of memory as the parent and cancel it because of the Slurm job's memory limit. Unfortunately, I did not find any error message or related error reports and got stuck here.

Could you please help with suggestions on how we could use the memory up to the Slurm job's memory limit?

Thank you very much in advance,
Peter
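
P.S. The failing pattern described above (one large malloc() followed by system()) can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the C interface of the code in question: the function name run_shell_after_alloc and its arguments are hypothetical. The sketch also prints errno, because on POSIX systems system() returns -1 and sets errno when the child process cannot be created, which may give the error message that is otherwise missing from the output.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only: allocate `bytes` bytes,
 * touch them so the pages are really in use, then run `cmd` through
 * system() and report errno if system() returns -1. */
int run_shell_after_alloc(size_t bytes, const char *cmd)
{
    assert(cmd != NULL);

    char *buf = malloc(bytes);
    if (buf == NULL) {
        fprintf(stderr, "malloc(%zu) failed: %s\n", bytes, strerror(errno));
        return -2;  /* arbitrary marker: the allocation itself failed */
    }
    memset(buf, 1, bytes);  /* write to every byte so the memory is backed */

    errno = 0;
    int status = system(cmd);
    if (status == -1) {
        /* -1: the child could not be created or its status not retrieved */
        fprintf(stderr, "system(\"%s\") failed: %s\n", cmd, strerror(errno));
    }

    free(buf);
    return status;  /* otherwise the wait status of the shell */
}
```

Calling run_shell_after_alloc() from a small main() inside a Slurm job, once with a size below half of the job's memory limit and once with a size above it, and with 'cp file1 file2' as the command, should show whether the behaviour can be reproduced outside the Fortran code and which errno value accompanies the -1.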