Thank you for the answers.
Is RealMemory decided by the total memory value or by the total usable
memory value?
I mean, if a node has 256 GB of RAM, free -g reports only 251 GB.
deda1x1591:~ # free -g
total used free shared buffers cached
Mem:
If you run slurmd -C on the compute node, it should tell you what Slurm
thinks the RealMemory number is.
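As a small sketch, assuming the first line of slurmd -C output is a space-separated list of key=value pairs (NodeName=..., CPUs=..., RealMemory=...), the RealMemory figure can be pulled out for comparison across nodes like this. The node name and numbers in the sample are hypothetical, not real output from deda1x1591.

```python
def parse_slurmd_c(output: str) -> dict:
    """Parse key=value pairs from the first line of `slurmd -C` output."""
    first_line = output.strip().splitlines()[0]
    fields = {}
    for token in first_line.split():
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields


# Hypothetical sample output; real values depend on the node.
sample = (
    "NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 "
    "ThreadsPerCore=1 RealMemory=257843\n"
    "UpTime=12-03:45:10\n"
)

fields = parse_slurmd_c(sample)
real_memory_mb = int(fields["RealMemory"])  # RealMemory is reported in megabytes
print(real_memory_mb)  # 257843
```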
Jeff
From: slurm-users on behalf of navin srivastava
Sent: Friday, July 10, 2020 6:24 AM
To: Slurm User Community List
Subject: Re: [slurm-users] change
You could set up a dummy node that has the features that are not active,
but not allow jobs to schedule to that node by setting it to DOWN. That
would be a hacky way of accomplishing this.
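A minimal sketch of what that dummy node's slurm.conf entry might look like, generated from a list of features. It assumes the NodeName line accepts Features= and State=DOWN; the node name and feature names are hypothetical. Presumably the dummy node would also have to be listed in the partition users submit to, so that jobs requesting those features are accepted and left pending.

```python
def dummy_node_line(name: str, features: list, cpus: int = 1) -> str:
    """Build a slurm.conf NodeName line for a placeholder node kept DOWN.

    The node advertises every listed feature but is never schedulable,
    so jobs constrained to those features wait in the queue.
    """
    return f"NodeName={name} CPUs={cpus} Features={','.join(features)} State=DOWN"


# Hypothetical node and feature names.
print(dummy_node_line("dummy01", ["swA_v1", "swA_v2"]))
# NodeName=dummy01 CPUs=1 Features=swA_v1,swA_v2 State=DOWN
```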
-Paul Edmon-
On 7/9/2020 7:15 PM, Raj Sahae wrote:
Hi all,
My apologies if this is sent twice. The fi
It's recommended to round RealMemory down to the next lower gigabyte
value to prevent nodes from entering a drain state after rebooting with
a BIOS or kernel update.
Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node
configuration"
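The rounding rule can be sketched as integer arithmetic, assuming RealMemory is given in megabytes and one gigabyte counts as 1024 MB here. With a hypothetical slurmd -C value of 257843 MB, the result is 251 * 1024 = 257024 MB, which matches the 251 GB that free -g reports on the 256 GB node in this thread.

```python
MB_PER_GB = 1024  # assumption: Slurm's RealMemory unit is MB, 1 GB = 1024 MB


def round_down_to_gb(real_memory_mb: int) -> int:
    """Round a RealMemory value (MB) down to a whole number of gigabytes."""
    return (real_memory_mb // MB_PER_GB) * MB_PER_GB


print(round_down_to_gb(257843))  # 257024, i.e. 251 * 1024
```

Rounding down leaves a little headroom, so a reboot that makes slightly less memory visible does not push the detected value below the configured one.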
Stephan
On 10.07.20 13:46, Sarlo, Jeffrey S wr
Thanks. Either I can use the value that slurmd -C gives, since I see the same
set of nodes giving different values, or I can choose the available memory,
i.e. 251*1024 (257024 MB).
Regards
Navin
On Fri, Jul 10, 2020, 20:34 Stephan Roth wrote:
> It's recommended to round RealMemory down to the next lower gigabyte
Hi Brian and Paul,
You both sent me suggestions about using an offline dummy node with all
features set. Thanks for your ideas but this won’t work for me as it’s not
practical. We want to allow users to queue for all supported software versions
and that easily numbers in the thousands or tens o
Another option would be to use the license feature and just set licenses
to 0 when they aren't available.
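A rough sketch of the license approach, assuming one slurm.conf Licenses= entry per software version, where a count of 0 marks a version as not yet available; the version names are hypothetical. Jobs would then request a version with sbatch -L (--licenses), e.g. -L swA_v2:1. Whether Slurm keeps such a job pending or rejects it while the count is 0 is not confirmed in this thread.

```python
def licenses_line(counts: dict) -> str:
    """Build a slurm.conf Licenses= line from {name: count}.

    A count of 0 is meant to represent a software version that is
    currently unavailable (per the suggestion above).
    """
    return "Licenses=" + ",".join(f"{name}:{count}" for name, count in counts.items())


# Hypothetical versions: v1 unavailable, v2 available for up to 8 jobs.
print(licenses_line({"swA_v1": 0, "swA_v2": 8}))
# Licenses=swA_v1:0,swA_v2:8
```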
-Paul Edmon-
On 7/10/2020 12:42 PM, Raj Sahae wrote:
Hi Brian and Paul,
You both sent me suggestions about using an offline dummy node with
all features set. Thanks for your ideas but t
Interesting, I had not read the Licenses feature docs, but I will look
through them. Thanks.
Raj Sahae | m. +1 (408) 230-8531
From: slurm-users on behalf of Paul Edmon
Reply-To: Slurm User Community List
Date: Friday, July 10, 2020 at 10:09 AM
To: "slurm-users@lists.schedmd.com"
Subject: Re:
Hi Raj,
It sounds like you might be coming from a CI/CD pipeline setup, but just in
case you're not, would you consider something like Jenkins or GitLab CI
instead of Slurm?
The users could create multi-stage pipelines, with the 'build' stage
installing the required software version, and then mul
Thank you very much Sean! Your proposed solution solved the problem.
I reckon it's not very efficient, but works for us.
M.
Hi Paddy,
Yes, this is a CI/CD pipeline. We currently use Jenkins pipelines, but Jenkins
has some significant drawbacks that Slurm solves out of the box, which makes
Slurm an attractive alternative.
You noted some of them already, like good real-time queue management,
pre-emption, node weighting, high reso
Hey Raj,
At a high level, this all sounds to me like a job for some kind of lightweight
middleware on top of Slurm, e.g. makefiles or something similar, where
each pipeline would be managed outside of Slurm and might submit a
job to install some software, then submit a job to run something o
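One way such middleware could chain the two steps is with sbatch job dependencies: submit the install job, capture its job ID with --parsable, and submit the run job with --dependency=afterok:<jobid> so that it starts only if the install succeeded. This is a minimal sketch; the script names are hypothetical and it assumes sbatch is on the PATH.

```python
import subprocess


def install_cmd(script: str) -> list:
    # --parsable makes sbatch print just "jobid" (or "jobid;cluster").
    return ["sbatch", "--parsable", script]


def run_cmd(script: str, install_job_id: str) -> list:
    # afterok: start only after the install job completes with exit code 0.
    return ["sbatch", "--parsable", f"--dependency=afterok:{install_job_id}", script]


def submit(cmd: list) -> str:
    """Run an sbatch command and return the submitted job ID."""
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return out.strip().split(";")[0]


def submit_pipeline(install_script: str, run_script: str) -> tuple:
    """Submit install then run, linked by an afterok dependency."""
    install_id = submit(install_cmd(install_script))
    run_id = submit(run_cmd(run_script, install_id))
    return install_id, run_id
```

On a submit host, submit_pipeline("install_sw.sh", "run_tests.sh") would queue both jobs. If the install job fails, the dependent run job can never start and remains pending until it is cancelled.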
Hi All,
I’ve got an intermittent situation with GPU nodes that sinfo says are available
and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled the nodes to
restart services but it hasn’t helped. Any suggestions for resolving this or
digging into it more deeply?
Thanks,
Janna
On Friday, 10 July 2020 3:34:44 PM PDT Janna Ore Nugent wrote:
> I’ve got an intermittent situation with gpu nodes that sinfo says are
> available and idle, but squeue reports as “ReqNodeNotAvail”. We’ve cycled
> the nodes to restart services but it hasn’t helped. Any suggestions for
> resolving
Hi Janna;
It sounds like an ARP cache table problem to me. If your Slurm head node
can reach ~1000 or more network devices (all connected network
cards, switches, etc., even if they are reachable through different ports
of the server), you need to increase some network settings on the head node
and serve