One simple way I have dealt with different configs is to symlink
/etc/slurm/slurm.conf to the appropriate file (e.g. slurm-dev.conf or
slurm-prod.conf).
In fact, I use the symlink for my dev and nothing (configless) for prod.
Then I can change a running node to/from dev/prod by merely
creating/deleting the symlink and restarting slurmd.
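As a rough sketch of that switch (not my exact setup): the helper names use_dev and use_prod are made up for illustration, and the script works in a scratch directory rather than /etc/slurm so it is safe to try. On a real node you would point CONF_DIR at /etc/slurm and restart slurmd after each switch.

```shell
#!/bin/sh
# Sketch only: file names follow the example in this message; the helper
# functions are hypothetical. On a real node CONF_DIR would be /etc/slurm and
# each switch would be followed by a slurmd restart.
set -eu

CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # scratch directory by default

# Stand-in for the real dev config file.
: > "$CONF_DIR/slurm-dev.conf"

# Dev: make slurm.conf a symlink to the dev config.
use_dev() {
    ln -sfn slurm-dev.conf "$CONF_DIR/slurm.conf"
}

# Prod: remove the symlink so the node has no local slurm.conf (configless).
use_prod() {
    # Only ever remove a symlink, never a regular file.
    if [ -L "$CONF_DIR/slurm.conf" ]; then
        rm "$CONF_DIR/slurm.conf"
    fi
}

use_dev
readlink "$CONF_DIR/slurm.conf"   # shows which config the node would read
use_prod
```

The check for a symlink before removing it is there so the prod switch cannot delete a hand-written slurm.conf by mistake.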
Just an option that may work for you.
I also use separate repos for prod/dev when I am working on
packages/testing. I rather prefer that separation so I don't have
someone accidentally update to a package that is not production-ready.
Brian Andrus
On 1/4/2023 9:22 AM, Groner, Rob wrote:
We currently have a test cluster and a production cluster, all on the
same network. We try things on the test cluster, and then we gather
those changes and make a change to the production cluster. We're
doing that through two different repos, but we'd like to have a single
repo to make the transition from testing configs to publishing them
more seamless. The problem is, of course, that the test and
production clusters have different cluster names, as well as different
nodes within them.
Using the include directive, I can pull all of the NodeName lines out
of slurm.conf and put them into %c-nodes.conf files, one for
production, one for test. That still leaves me with two problems:
* The ClusterName itself is still a problem. I WANT the same
slurm.conf file on test and production, but the ClusterName
line has to differ between them. Can I use an environment variable
for the cluster name, so that production and test each pick up
their own value?
* The gres.conf file. I tried using the same "include" trick that
works on slurm.conf, but it failed because it did not know what
the "ClusterName" was. I think that means that either it doesn't
work for anything other than slurm.conf, or that the clustername
will have to be defined in gres.conf as well?
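For reference, the include layout described above could be sketched roughly as follows. This is an illustration in a scratch directory only: the cluster names (test, prod), the node names, and the relative Include path are assumptions, not our actual configuration. It covers slurm.conf only and leaves the ClusterName and gres.conf questions open.

```shell
#!/bin/sh
# Sketch only: cluster names, node names and paths are invented for
# illustration. REPO_DIR stands in for the single shared config repo.
set -eu

REPO_DIR="${REPO_DIR:-$(mktemp -d)}"

# Shared slurm.conf. ClusterName still has to be set here, which is the
# first open problem; %c in the Include path is replaced by that name.
# Assumption: a relative Include path is looked up next to slurm.conf.
cat > "$REPO_DIR/slurm.conf" <<'EOF'
ClusterName=test
Include %c-nodes.conf
EOF

# One NodeName file per cluster, both kept in the same repo.
cat > "$REPO_DIR/test-nodes.conf" <<'EOF'
NodeName=tnode[01-02] State=UNKNOWN
EOF

cat > "$REPO_DIR/prod-nodes.conf" <<'EOF'
NodeName=pnode[001-100] State=UNKNOWN
EOF

ls "$REPO_DIR"
```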
Any other suggestions of how to keep our slurm files in a single
source control repo, but still have the flexibility to have them run
elegantly on either test or production systems?
Thanks.