Re: [slurm-users] Is there split-brain danger when using backup slurmdbd?

2022-06-27 Thread Brian Andrus
This can happen if things are misconfigured. Keep in mind that slurmdbd is merely a daemon that talks to a database. That database should be HA and separate from the slurmdbd systems. At our site, we have a central DB server that both slurmdbd systems point to. In this scenario, it is the db that e
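A minimal sketch of the layout described in this reply, with two slurmdbd hosts pointing at one separate database server. The hostnames `dbd1`, `dbd2`, and `db.example.org` are hypothetical placeholders, not values from the thread, and a real setup needs further settings (credentials, ports, and so on).

```
# slurmdbd.conf, identical on both slurmdbd hosts (hostnames are assumptions)
DbdHost=dbd1
DbdBackupHost=dbd2
StorageType=accounting_storage/mysql
StorageHost=db.example.org      # the single, separately managed HA database

# slurm.conf on the controller (hostnames are assumptions)
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbd1
AccountingStorageBackupHost=dbd2
```

With this arrangement both daemons write to the same database, so keeping the data consistent is the job of the database layer rather than of slurmdbd.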

[slurm-users] Is there split-brain danger when using backup slurmdbd?

2022-06-27 Thread taleintervenor
Hi, all: We noticed that slurmdbd provides the conf option DbdBackupHost for users to set a secondary slurmdbd node. Since slurmdbd is closely tied to the database, we wonder whether running multiple slurmdbd instances introduces a split-brain danger, which is a common topic in database high-availability discussions. Wi

Re: [slurm-users] detailed worker state with sinfo

2022-06-27 Thread mercan
Hi; You can look at the Slurm code for information. https://github.com/SchedMD/slurm/blob/master/src/common/slurm_protocol_defs.c#L3838 The "ALLOCATED + DRAIN" and "MIX + DRAIN" states are the same. The others are different. Also, there are some other flags which can change the status keywords. Regards; Ahmet M.

Re: [slurm-users] detailed worker state with sinfo

2022-06-27 Thread Guillaume COCHARD
Hello, You can use `scontrol show nodes` (for all nodes) or `scontrol show node` followed by a node name or node list (for one or more nodes). For example:
```
$ scontrol show node node0001
NodeName=node0001 Arch=x86_64 […]
State=DOWN+DRAIN+NOT_RESPONDING […]
```
With that it would be quite easy to write a small scrip
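A minimal sketch of such a script, assuming the `key=value` layout shown in the example output above. The function names `parse_node_states` and `read_scontrol` are hypothetical helpers, not part of Slurm, and the parser only looks at the `NodeName` and `State` fields.

```python
import subprocess


def parse_node_states(scontrol_output: str) -> dict[str, list[str]]:
    """Map each node name to the list of state flags in its State= field."""
    states: dict[str, list[str]] = {}
    current = None
    # Fields are whitespace-separated key=value tokens; tokens without "="
    # (for example elided parts) are skipped.
    for token in scontrol_output.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue
        if key == "NodeName":
            current = value
        elif key == "State" and current is not None:
            # Flags are joined with "+", e.g. DOWN+DRAIN+NOT_RESPONDING.
            states[current] = value.split("+")
    return states


def read_scontrol() -> str:
    """Run `scontrol show nodes` and return its text output (needs Slurm)."""
    result = subprocess.run(
        ["scontrol", "show", "nodes"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


sample = "NodeName=node0001 Arch=x86_64 State=DOWN+DRAIN+NOT_RESPONDING"
print(parse_node_states(sample))
# {'node0001': ['DOWN', 'DRAIN', 'NOT_RESPONDING']}
```

On a cluster, `parse_node_states(read_scontrol())` would give the same mapping for every node. Free-text fields such as a drain reason may contain spaces, so a more careful parser may be needed if those fields matter.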