Re: [slurm-users] Debian dist-upgrade?

2023-04-26 Thread Steffen Grunewald
Hi all,

After several delays, we have now completed the move.

On Tue, 2023-02-07 at 14:54:55 +0100, Steffen Grunewald wrote:
> Hi Loris,
> 
> On Tue, 2023-01-24 at 16:48:26 +0100, Loris Bennett wrote:
> > Hi Steffen,
> > 
> > Could you create/find a deb-package for a Slurm 19.x version to use as
> > an intermediate?  Never having built such a package, I don't know how
> > much trouble that would be.
> 
> Actually, I found a 20.02.6 version that spent quite some time in the
> Debian repositories (that would be two major releases after 18.08.5
> and therefore should be OK).
> I was able to build Debian packages for Buster from that source package,
> and *might* be able to use the dbd package to obtain a database dump
> that could be further processed by Bullseye's proper slurmdbd (20.11.4).

We're now done with the procedure, mainly following "the reference"
(i.e. Ole's "Slurm database" wiki page).

Doing so, with the intermediate 20.02 release built for Buster and installed
over the 18.08 one provided by the Debian repositories, we found that it's
*not* sufficient to just upgrade the DB (by starting the intermediate
slurmdbd once). To get a consistent state for slurmctld, you also have to
run the intermediate slurmctld before saving and copying the state save
tree (StateSaveLocation).
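
For anyone planning a similar move, the upgrade path used in this thread
(18.08.5 -> 20.02.6 -> 20.11.4) can be sanity-checked with a few lines of
Python. This is only a sketch: the release list and the "at most two major
releases per hop" rule are taken from the discussion above and hard-coded
here as assumptions, and the function names are made up for illustration.

```python
# Slurm major releases in chronological order (hard-coded assumption of this sketch).
RELEASES = ["17.11", "18.08", "19.05", "20.02", "20.11", "21.08", "22.05", "23.02"]

# Assumed rule from the thread: a single upgrade may skip at most one major release.
MAX_HOP = 2


def major(version: str) -> str:
    """Reduce a full version such as '18.08.5' to its major release '18.08'."""
    return ".".join(version.split(".")[:2])


def hop_ok(old: str, new: str) -> bool:
    """True if upgrading old -> new spans at most MAX_HOP major releases."""
    distance = RELEASES.index(major(new)) - RELEASES.index(major(old))
    return 0 < distance <= MAX_HOP


def path_ok(path: list[str]) -> bool:
    """True if every consecutive hop in the upgrade path is acceptable."""
    return all(hop_ok(a, b) for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    print(path_ok(["18.08.5", "20.02.6", "20.11.4"]))  # path with intermediate release
    print(path_ok(["18.08.5", "20.11.4"]))             # direct jump, three releases apart
```

The check says nothing about the order of the individual steps (slurmdbd
first, then slurmctld, then copying the state); it only covers the choice
of the intermediate release.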

After that, getting 20.11 running on the new ctld/dbd machine was no
major problem; we may still find one when we query the history at the
beginning of next month.
The only thing that failed was using an alias name ("slurmmaster") for the
new machine, since Slurm checks the FQDN returned by `hostname -f` and we
didn't want to change that.
(Personally I think this reaction is a bit "overmotivated", and perhaps it
can be addressed or worked around in another way - until we can set up a
dedicated machine for Slurm purposes. But with the "real" name it works.)
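
A quick way to see in advance whether a name planned for the controller
entry matches what the machine reports about itself is sketched below. It
assumes that a name is acceptable when it equals either the FQDN or its
short form; that rule is a simplification for illustration, not a
description of Slurm's actual check. socket.getfqdn() is used as a rough
stand-in for `hostname -f`, and name_matches() is a hypothetical helper.

```python
import socket


def name_matches(configured: str, fqdn: str) -> bool:
    """True if the configured name equals the FQDN or its short (first) label."""
    configured = configured.lower()
    fqdn = fqdn.lower()
    return configured == fqdn or configured == fqdn.split(".")[0]


if __name__ == "__main__":
    # Roughly what `hostname -f` would report on this machine.
    fqdn = socket.getfqdn()
    # "slurmmaster" is the alias from this thread; the second entry is the real name.
    for name in ("slurmmaster", fqdn):
        print(f"{name}: {'ok' if name_matches(name, fqdn) else 'does not match ' + fqdn}")
```

An alias that only exists as a DNS CNAME would be reported as not matching
here, which is consistent with what we saw.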


Thanks to all who provided suggestions!

Best,
 Steffen

-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~



[slurm-users] gres.conf AutoDetect=rsmi

2023-04-26 Thread Hagdorn, Magnus Karl Moritz
Hi there,
I eventually managed to get the gres AutoDetect=rsmi to sort of work.
It correctly found the AMD GPU, an Instinct MI210. I am using ROCm 5.4.
Unfortunately, the reported GPU type is "Instinct MI210", which I can't
configure in the GresType parameter in slurm.conf because of the
space in the type name. I went back to manually configuring the resource in
gres.conf. Has anybody managed to get AutoDetect=rsmi to work with this GPU?
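
For the manual configuration, the idea is simply to pick a type name
without whitespace. A minimal sketch of how such lines could be generated
follows; the helper names are made up, the sanitised type "instinct_mi210"
is just one possible choice, and the device path /dev/dri/renderD128 is an
assumption that has to be checked on the node.

```python
def gres_type(reported: str) -> str:
    """Turn a reported GPU name into a type name without whitespace."""
    return "_".join(reported.lower().split())


def gres_conf_line(reported: str, device: str) -> str:
    """Build a manual gres.conf entry for one GPU device file."""
    return f"Name=gpu Type={gres_type(reported)} File={device}"


def node_gres(reported: str, count: int) -> str:
    """Build the matching Gres= fragment for the node line in slurm.conf."""
    return f"Gres=gpu:{gres_type(reported)}:{count}"


if __name__ == "__main__":
    reported = "Instinct MI210"     # type string reported by autodetection
    device = "/dev/dri/renderD128"  # assumed device file, verify locally
    print(gres_conf_line(reported, device))
    print(node_gres(reported, 1))
```
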
Regards
magnus

-- 
Magnus Hagdorn
Charité – Universitätsmedizin Berlin
Geschäftsbereich IT | Scientific Computing
 
Campus Charité Virchow Klinikum
Forum 4 | Ebene 02 | Raum 2.020
Augustenburger Platz 1
13353 Berlin
 
magnus.hagd...@charite.de
https://www.charite.de
HPC Helpdesk: sc-hpc-helpd...@charite.de

