On 6/28/19 9:18 AM, Valerio Bellizzomi wrote:
On Fri, 2019-06-28 at 08:51 +0200, Valerio Bellizzomi wrote:
On Thu, 2019-06-27 at 18:35 +0200, Valerio Bellizzomi wrote:
The nodes are now communicating; however, when I run the command
srun -w compute02 /bin/ls
it hangs and produces no output on the submit machine.
On compute02 there is a communication error and a timeout.
Network ports 6817 and 6818 are open.
Looking at the firewall logs, slurmctld wants to connect back to a range
of ports which are closed.
As a test I stopped the firewall service on the submit machine, and now
the command above works fine.
You may want to check your firewall settings according to Slurm's
requirements. I've summarized this in my Wiki page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-firewall-for-slurm-daemons
/Ole
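
The firewall diagnosis above (a range of closed ports on the submit machine) can be double-checked from a compute node with a small TCP probe. The following is only a sketch and not part of Slurm: the function name probe_ports is hypothetical, and the host name and port range passed to it are up to the site. In practice the range would be whatever is configured for srun, for example through the SrunPortRange option in slurm.conf.

```python
import socket


def probe_ports(host, ports, timeout=1.0):
    """Classify each TCP port on host as open, refused, filtered, or unreachable.

    Hypothetical helper for illustration; it only attempts a TCP connect
    and does not speak the Slurm protocol.
    """
    results = {}
    for port in ports:
        try:
            # A successful connect means something is listening and the
            # firewall let the handshake through.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except ConnectionRefusedError:
            # The host answered with a reset: the port is reachable,
            # but nothing is listening on it right now.
            results[port] = "refused"
        except socket.timeout:
            # No answer at all, which is typical of a firewall that
            # silently drops the packets.
            results[port] = "filtered"
        except OSError:
            # Name resolution failures, "no route to host", and firewalls
            # that reject with an ICMP error end up here.
            results[port] = "unreachable"
    return results
```

Run on compute02, a call such as probe_ports("submit-host", range(60001, 60011)) (host name and range are placeholders) returns one status per port. "refused" means the packet got through but nothing is listening, which is expected when no srun is running, since srun only listens while a job step is active. "filtered" or "unreachable" suggests the firewall on the submit machine is still blocking that port.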