And we have ignition - thank you very much!
:-)
On Mon, May 11, 2020 at 8:44 PM Alex Chekholko wrote:
> Any time a node goes into DRAIN state you need to manually intervene and
> put it back into service.
> scontrol update nodename=ip-172-31-80-232 state=resume
>
> On Mon, May 11, 2020 at 11:40 AM Joakim Hove wrote:
Any time a node goes into DRAIN state you need to manually intervene and
put it back into service.
scontrol update nodename=ip-172-31-80-232 state=resume
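For reference, the usual sequence is to check *why* the node drained before
resuming it - a minimal sketch (node name as in this thread):

  sinfo -R -t drain                      # list drained nodes with the recorded Reason
  scontrol show node ip-172-31-80-232    # full detail; Reason= is on the last line
  scontrol update nodename=ip-172-31-80-232 state=resume   # only after fixing the cause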
On Mon, May 11, 2020 at 11:40 AM Joakim Hove wrote:
>
>> You’re on the right track with the DRAIN state. The more specific answer
>> is in the “Reason=” description on the last line.
> You’re on the right track with the DRAIN state. The more specific answer
> is in the “Reason=” description on the last line.
>
> It looks like your node has less memory than what you’ve defined for the
> node in slurm.conf
>
Thank you; that sounded meaningful to me. My slurm.conf file had
RealMemory ...
You’re on the right track with the DRAIN state. The more specific answer is in
the “Reason=” description on the last line.
It looks like your node has less memory than what you’ve defined for the node
in slurm.conf
Mike
From: slurm-users on behalf of Joakim Hove
Reply-To: Slurm User
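To make that concrete: a minimal sketch of checking the node and writing a
matching definition (the CPUs/RealMemory values here are illustrative, not
from the original thread):

  # On the compute node, ask slurmd what it actually detects:
  slurmd -C          # prints a NodeName=... RealMemory=... line you can copy
  free -m            # cross-check physical memory in MB

  # In slurm.conf, RealMemory must not exceed what the node reports,
  # or slurmctld drains the node with a "Low RealMemory" reason:
  NodeName=ip-172-31-80-232 CPUs=1 RealMemory=900 State=UNKNOWN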
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      1  drain ip-172-31-80-232
● slurmd.service - Slurm node daemon
Loaded: loaded (/lib/systemd/system/slurmd.service; enabled; vendor preset: enabled)
Active: ac
ubuntu@ip-172-31-80-232:/var/run/slurm-llnl$ scontrol show node
NodeName=ip-172-31-80-232 Arch=x86_64 CoresPerSocket=1
CPUAlloc=0 CPUErr=0 CPUTot=1 CPULoad=0.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=ip-172-31-80-232 NodeHostName=ip-172-31-80-232 Version=
You will want to look at the output of 'sinfo' and 'scontrol show node' to
see what slurmctld thinks about your compute nodes; then on the compute
nodes you will want to check the status of the slurmd service ('systemctl
status -l slurmd') and possibly read through the slurmd logs as well.
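As a sketch, that checklist as concrete commands (node name and log path are
illustrative; the log location depends on SlurmdLogFile in slurm.conf):

  sinfo                                  # controller's view of partitions and nodes
  scontrol show node ip-172-31-80-232    # per-node state, incl. the Reason= field
  systemctl status -l slurmd             # on the compute node: is slurmd up?
  journalctl -u slurmd --since today     # slurmd logs via systemd
  less /var/log/slurm/slurmd.log         # or wherever SlurmdLogFile points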
On Mon,
Hello;
I am in the process of familiarizing myself with slurm - I will be writing a
piece of software which submits jobs to a slurm cluster. Right now I have
just made my own "cluster" consisting of a single Amazon AWS node, and I am
using it to familiarize myself with the sxxx commands - that has worked nicely.
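For anyone following along, a minimal sketch of the submit side (the
partition name is taken from the sinfo output elsewhere in the thread;
file names are illustrative):

  #!/bin/bash
  # hello.sbatch - smallest useful batch script
  #SBATCH --partition=debug
  #SBATCH --job-name=hello
  #SBATCH --output=hello-%j.out
  srun hostname

  sbatch hello.sbatch    # prints "Submitted batch job <id>"
  squeue -u "$USER"      # watch it run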
Overzealous node cleanup epilog script?
> On 11 May 2020, at 17:56, Alastair Neil wrote:
>
>
> Hi there,
>
> We are using slurm 18.08 and had a weird occurrence over the weekend. A user
> canceled one of his jobs using scancel, and two additional jobs of the user
> running on the same node ...
Hi there,
We are using slurm 18.08 and had a weird occurrence over the weekend. A
user canceled one of his jobs using scancel, and two additional jobs of the
user running on the same node were killed concurrently. The jobs had no
dependency, but they were all allocated 1 GPU. I am curious to know ...
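To illustrate the "overzealous epilog" theory from the subject line: an
epilog that cleans up by *user* rather than by *job* would kill the user's
other jobs on the same node. A hypothetical sketch of the dangerous pattern
(not from the actual site's script):

  #!/bin/bash
  # Epilog sketch - DANGEROUS: kills every process owned by the job's user,
  # including processes belonging to the user's other jobs on this node.
  # SLURM_JOB_USER is set in the epilog environment.
  pkill -9 -u "$SLURM_JOB_USER"
  # Safer: rely on proctrack/cgroup so Slurm only reaps the job's own tasks.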
Previous versions of mysql are supposed to have nasty security issues.
I'm not sure why I had mysql instead of mariadb anyway.
On Mon, May 11, 2020 at 9:29 AM Relu Patrascu wrote:
>
> We've experienced the same problem on several versions of slurmdbd
> (18, 19) so we downgraded mysql and put a hold on the package.
Hi Martijn,
I'm sorry that it took me several weeks to get back to this issue
- never fix anything that isn't broken (... too much), and I've
been busy with user accounting all over the place...
On Thu, 2020-03-05 at 12:03:38 +0100, Martijn Kruiten wrote:
> Hi Steffen,
>
> We are using Slurm on
We've experienced the same problem on several versions of slurmdbd
(18, 19) so we downgraded mysql and put a hold on the package.
Hey Dustin, funny we meet here :)
Relu
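For the record, "put a hold on the package" on Debian/Ubuntu looks like this
(package and version strings are placeholders):

  sudo apt-get install mysql-server=<known-good-version>   # downgrade first
  sudo apt-mark hold mysql-server     # apt will no longer upgrade it
  # RPM-based systems: dnf/yum versionlock provides the same pinning.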
On Tue, May 5, 2020 at 3:43 PM Dustin Lang wrote:
>
> I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation fault ...