Hi,

I have installed a program on all nodes, since it is an RPM. Therefore, when the program is running, it does not use the shared file system; it uses only its own files under /usr/local/program.
I also set a scratch path in the bashrc that points to a local path on the running node. For example, I set TMPFOLDER=/tmp/mahmood/program in the bashrc (home is shared), then I ssh to the node and create that path. Therefore, when the program reads or writes data during execution, it does not go through the network.

The thing is, when I ssh directly to the node and run the program with the time command, I see

    real 7m34.738s

However, when I submit the job via Slurm on the head node, I see

    [mahmood@rocks7 g]$ sacct -X -j 66 --format=elapsed
       Elapsed
    ----------
      00:11:28

So I think the Slurm overhead is large (about 50%). Is that correct? How can I reduce that overhead?

Regards,
Mahmood
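
P.S. In case it helps, here is how I arrive at the "about 50%" figure. This is only a rough sketch of the arithmetic: the helper name to_seconds is made up for illustration, and it treats the whole difference between the two timings as Slurm overhead, which is my assumption and may not be the real cause.

```python
def to_seconds(hms: str) -> float:
    """Convert an HH:MM:SS string (as printed by sacct) to seconds."""
    seconds = 0.0
    for part in hms.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# "real 7m34.738s" from running the program directly on the node.
direct_seconds = 7 * 60 + 34.738

# Elapsed value reported by sacct for job 66.
slurm_seconds = to_seconds("00:11:28")

# Assumption: the entire difference is attributed to running under Slurm.
extra_seconds = slurm_seconds - direct_seconds
overhead_pct = 100 * extra_seconds / direct_seconds

print(f"direct: {direct_seconds:.1f} s, slurm: {slurm_seconds:.1f} s")
print(f"extra: {extra_seconds:.1f} s ({overhead_pct:.1f}% of the direct run)")
```

So the job under Slurm takes roughly half as long again as the direct run.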