Hi everybody,
I am trying to use dynamic mode together with configless mode and slurmd, first with Slurm 24.11.3 and now, after upgrading, with 24.11.4. I found a problem.

slurmctld runs in a Docker container, and my node is outside the container network.

slurmctld records the node's IP address by calling a function, getpeeraddr, on the slurmctld socket. However, my node's connection to that socket arrives through the Docker NAT/bridge. slurmctld therefore records the bridged address rather than my real IP, namely the Docker gateway (172.20.0.1).


scontrol show node (dynamic mode)
---------------------------------

    NodeName=ltlsbubble1 Arch=x86_64 CoresPerSocket=4..NodeAddr=172.20.0.1 NodeHostName=ltlsbubble1 Version=24.11.4


So the node goes down once the "not pinging it" timeout expires.

I tried updating the configuration:

    scontrol update NodeName=ltlsbubble1 NodeAddr=xx.xx.xx.xx

but at the first

    scontrol reconfigure

it reverts to NodeAddr=172.20.0.1.


In normal mode
--------------

    scontrol show node
    NodeName=ltlsbubble1 Arch=x86_64 CoresPerSocket=4..NodeAddr=ltlsbubble1 NodeHostName=ltlsbubble1 Version=24.11.4

In normal mode, NodeAddr is the same as NodeName, so Slurm uses DNS resolution to reach the node.


To verify my hypothesis, I went into Slurm's C code, identified the registration function, and changed it to use the same mechanism as for normal nodes.

In src/slurmctld/node_mgr.c I replaced:

    set_node_comm_name(node_ptr, comm_name, reg_msg->hostname);

with:

    set_node_comm_name(node_ptr, NULL, reg_msg->hostname);


I rebuilt slurmctld with this patch and tried it in dynamic mode. It works as expected:



    scontrol show node
    NodeName=ltlsbubble1 Arch=x86_64 CoresPerSocket=4..NodeAddr=ltlsbubble1 NodeHostName=ltlsbubble1 Version=24.1

NodeAddr no longer contains an IP, only the node name, so Slurm uses DNS resolution. The node works fine and no longer goes down on the ping timeout.

So my question: could we have an option to force DNS resolution instead of IP discovery in dynamic mode?
(I tried the cloud_dns option, but that does not seem to be its purpose.)


Best regards,
Stephane
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
