Hello,
I have the database on a separate server (it runs the database and the
database only). The login nodes run nothing SLURM related, they simply
have the binaries installed & a SLURM config.
I've never looked into having multiple databases & using
AccountingStorageExternalHost (in fact I'd forgotten you could do that),
so I can't comment on that (maybe someone else can); I think that works,
yes, but as I said I've never tested it (I didn't see much point in
running multiple databases if one would do the job).
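If I read the documentation right, the multiple-database variant would
mean adding something like the following to each cluster's slurm.conf
(the hostname and port here are made up, and again, I've not tested this):

AccountingStorageExternalHost=other-cluster-dbd:6819

(i.e. each slurmctld also gets pointed at the other cluster's slurmdbd,
so both accounting databases are aware of both clusters).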
I actually have dedicated login nodes for each of my clusters, to make
it easier for users (especially those without much experience of the
HPC environment); so I have one login node connecting to cluster 1 and
one connecting to cluster 2.
The relevant bits of slurm.conf on the login nodes (if I'm not mistaken)
are probably these. The differences between the two config files (apart
from topology, node definitions & scheduler tuning) are:
ClusterName=cluster1
ControlMachine=cluster1-slurm
ControlAddr=/IP_OF_SLURM_CONTROLLER/
ClusterName=cluster2
ControlMachine=cluster2-slurm
ControlAddr=/IP_OF_SLURM_CONTROLLER/
(where IP_OF_SLURM_CONTROLLER is the IP address of host cluster1-slurm,
same for cluster2)
And then they have common entries for the accounting storage:
AccountingStorageHost=slurm-db-prod
AccountingStorageBackupHost=slurm-db-prod
AccountingStoragePort=7030
AccountingStorageType=accounting_storage/slurmdbd
(slurm-db-prod is simply the hostname of the SLURM database server)
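For completeness: the slurmdbd.conf on the database server itself is
fairly standard; a rough sketch (user, password & database name here
are made up, adjust to taste) would be something like:

DbdHost=slurm-db-prod
DbdPort=7030
SlurmUser=slurm
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=SOME_PASSWORD
StorageLoc=slurm_acct_db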
Does that help?
Tina
On 28/10/2021 14:59, navin srivastava wrote:
Thank you Tina.
So if I understood correctly, the database is global to both clusters
and running on the login node?
Or is the database running on one of the master nodes and shared with
the other master node?
But as far as I have read, the Slurm database can also be separate on
each master, just using the parameter AccountingStorageExternalHost so
that both databases are aware of each other.
Also, which slurmctld does the slurm.conf file on the login node point to?
Is it possible to share a sample slurm.conf file from a login node?
Regards
Navin.
On Thu, Oct 28, 2021 at 7:06 PM Tina Friedrich
<tina.friedr...@it.ox.ac.uk> wrote:
Hi Navin,
well, I have two clusters & login nodes that allow access to both. Will
that do? I don't think a third cluster would make any difference to the
setup.
They need to share a database. As long as they share a database, the
clusters have 'knowledge' of each other.
So if you set up one database server (running slurmdbd), and then a
SLURM controller for each cluster (running slurmctld) using that one
central database, the '-M' option should work.
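To illustrate the '-M' option - from a login node set up like that, you
should then be able to do things like:

sacctmgr show clusters
squeue -M cluster1,cluster2
sbatch -M cluster2 job.sh

(the first lists both clusters registered in the database, the second
shows the queues of both clusters, and the third submits a job to the
other cluster).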
Tina
On 28/10/2021 10:54, navin srivastava wrote:
> Hi ,
>
> I am looking for a stepwise guide to set up a multi-cluster
> implementation. We wanted to set up 3 clusters and one login node to
> run jobs using the -M cluster option.
> Does anybody have such a setup and can share some insight into how it
> works, and whether it is really a stable solution?
>
>
> Regards
> Navin.
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems
Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk
http://www.it.ox.ac.uk