"Klein, Dennis" writes:
> * Can I (and if yes, how can I) update the GRES count dynamically
> (The idea would be to monitor the revision changes on all cvmfs
> mountpoints with a simple daemon process on each worker node which
> then notifies slurm on a revision change)?
Perhaps the daemon pro
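A sketch of the scontrol call such a daemon might issue (the GRES name, the count, and whether slurmctld accepts count changes at runtime are all assumptions on my part):

scontrol update NodeName=$(hostname -s) Gres=cvmfs_rev:12345   # hypothetical GRES name and count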
Hi;
In your partition definition there is "Shared=NO", which means "do not
share nodes between jobs". This parameter conflicts with the
"OverSubscribe=FORCE:12" parameter. According to the Slurm
documentation, the Shared parameter has been replaced by the
OverSubscribe parameter. But, I suppose
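(Purely for illustration, a partition line along those lines, with made-up partition and node names:)

PartitionName=batch Nodes=node[01-16] OverSubscribe=FORCE:12 State=UP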
I remember that we had to add this to our /etc/ssh/sshd_config to get X11 to
work with Slurm 19.05:
X11UseLocalhost no
We added this to our login nodes (where users ssh to), and then restarted the
ssh server. You would then need to log out and log back in with X11 forwarding
again.
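Roughly, on the login node (assuming a systemd-based distro):

echo "X11UseLocalhost no" >> /etc/ssh/sshd_config
systemctl restart sshd

and then a quick way to test from the workstation would be:

ssh -X user@login-node
srun --x11 xterm      # quick check that forwarding reaches the compute node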
Sean
Hi Dennis,
I don't know how cvmfs works.
On Wed, Feb 26, 2020 at 06:40:23PM, Klein, Dennis wrote:
> our slurm worker nodes mount several read-only software repositories
> via the cvmfs filesystem [1]. Each repository is versioned and each
> cvmfs mountpoint automatically switches to serving
Hi folks,
does anyone know how to detect, in the Lua job submission script, whether the
user used --mem or --mem-per-cpu?
And also, is it possible to "unset" this setting?
The reason is that we want to remove all memory settings made by the user for
exclusive jobs.
Best
Marcus
--
Marcus Wagner, Dipl.
Hi James,
Slurm 19.05.5 faces the same problem with CentOS 8, i.e. a hardened environment,
and the same fix helps. I will test Slurm 20.02 as soon as possible.
https://bugs.schedmd.com/show_bug.cgi?id=8414
Best Regards,
Stephan
Marcus Wagner writes:
> does anyone know how to detect in the lua submission script, if the
> user used --mem or --mem-per-cpu?
>
> And also, if it is possible to "unset" this setting?
Something like this should work:
if job_desc.pn_min_memory ~= slurm.NO_VAL64 then
   -- user gave --mem or --mem-per-cpu
end
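A slightly fuller sketch of what I mean, untested, and assuming the usual convention that --mem-per-cpu is encoded by setting the high MEM_PER_CPU bit in pn_min_memory:

-- job_submit.lua fragment (sketch, untested)
local MEM_PER_CPU_FLAG = 2^63   -- assumption: high bit marks --mem-per-cpu

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.pn_min_memory ~= slurm.NO_VAL64 then
      if job_desc.pn_min_memory >= MEM_PER_CPU_FLAG then
         slurm.log_info("user gave --mem-per-cpu")
      else
         slurm.log_info("user gave --mem")
      end
      -- "unset" the request, e.g. for exclusive jobs, by restoring the default
      if job_desc.shared == 0 then   -- assumption: 0 means --exclusive
         job_desc.pn_min_memory = slurm.NO_VAL64
      end
   end
   return slurm.SUCCESS
end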
Hi Sean,
Thank you for your reply.
I will test it asap.
Best regards,
Pär Lundö
From: slurm-users On Behalf Of Sean Crosby
Sent: 27 February 2020 10:26
To: Slurm User Community List
Subject: Re: [slurm-users] Slurm 19.05 X11-forwarding
I remember that we had to add this to our /etc/ssh/ss
Is it possible to dynamically change JobFileAppend/open-mode behavior? I’m
using EpilogSlurmctld to automatically requeue jobs that exit with a certain
code, and would like to have those append rather than overwrite, but it seems
blunt to set `JobFileAppend=1` and force people who want the defau
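One per-job alternative that comes to mind (not tested together with requeue, so treat it as a sketch) is to have only the affected job scripts request append mode themselves:

#SBATCH --open-mode=append   # per-job equivalent of JobFileAppend=1

That would leave the cluster-wide default untouched.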
We figured out the issue.
All of our jobs are requesting 1 GPU. Each node only has 1 GPU. Thus, the
jobs that are pending are pending with reason "Resources", meaning "no
resources are available for these jobs", i.e. "I want a GPU, but there
are no GPUs that I can use until a job on a node finish
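For anyone hitting the same thing, a quick way to see it (the node name below is a placeholder):

squeue --state=PENDING -o "%.10i %.9P %.20j %.6D %R"    # reason column shows (Resources)
scontrol show node gpunode01 | grep -iE 'gres|alloc'    # confirms the single GPU is already allocated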
If that 32 GB is main system RAM, and not GPU RAM, then yes. Since our GPU
nodes are over-provisioned in terms of both RAM and CPU, we end up using the
excess resources for non-GPU jobs.
If that 32 GB is GPU RAM, then I have no experience with that, but I suspect
MPS would be required.
> On Fe
Hi all,
Looks like using --config-server limits slurmd to a single config server,
if I'm not mistaken?
Specifying --config-server multiple times causes slurmd to consider only
the last one.
(A quick glance at the source seems to agree)
Any plan on accepting a second server via command line options?
Thanks & r
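Until that exists, one workaround suggested by the configless docs (as far as I understand them; hostnames are placeholders) is to publish the controllers as DNS SRV records rather than on the command line, e.g.:

_slurmctld._tcp 3600 IN SRV 0  0 6817 slurmctld-primary.example.com.
_slurmctld._tcp 3600 IN SRV 10 0 6817 slurmctld-backup.example.com.

with the lower priority value being tried first.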
This is a code level question.
I'm writing a select plugin and I want the plugin to take some action when
a job is going to be or has been queued instead of run immediately. Does
one of the select plugin APIs get called in either case?
I was trying to check for this in select_p_job_test() but it
>
> If that 32 GB is main system RAM, and not GPU RAM, then yes. Since our GPU
> nodes are over-provisioned in terms of both RAM and CPU, we end up using
> the excess resources for non-GPU jobs.
>
No, it's GPU RAM.
> If that 32 GB is GPU RAM, then I have no experience with that, but I
> suspect MP
On 2/27/20 11:23 AM, Robert Kudyba wrote:
OK so does SLURM support MPS and if so what version? Would we need to
enable cons_tres and use, e.g., --mem-per-gpu?
Slurm 19.05 (and later) supports MPS - here's the docs from the most
recent release of 19.05:
https://slurm.schedmd.com/archive/slur
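For reference, the basic shape of an MPS setup from those docs, as I read them (node names and counts are placeholders):

# gres.conf on the GPU node
Name=gpu File=/dev/nvidia0
Name=mps Count=100

# slurm.conf
GresTypes=gpu,mps
NodeName=gpunode01 Gres=gpu:1,mps:100 ...

# job asking for half of the GPU via MPS
sbatch --gres=mps:50 job.sh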
I'm setting up an EC2 SLURM cluster and when an instance doesn't resume fast
enough I get an error like:
node c7-c5-24xl-464 not resumed by ResumeTimeout(600) - marking down and
power_save
I keep running into issues where my cloud nodes do not show up in sinfo and I
can't display their informa
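For anyone in the same situation, these are the slurm.conf knobs that seem relevant (a sketch only; paths and values are placeholders):

ResumeProgram=/usr/local/sbin/ec2-resume.sh     # placeholder path
ResumeTimeout=900                               # give instances longer to boot
SuspendProgram=/usr/local/sbin/ec2-suspend.sh   # placeholder path
SuspendTime=600
PrivateData=cloud    # make powered-down cloud nodes visible in sinfo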
Hello,
I have a hybrid cluster with 2 GPUs and two 20-core CPUs on each node.
I created two partitions:
- "cpu" for CPU-only jobs, which are allowed to allocate up to 38 cores per node
- "gpu" for GPU-only jobs, which are allowed to allocate up to 2 GPUs and 2 CPU cores.
Respective sections in slur
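(For context, a sketch of what such partition definitions usually look like; node names and ranges are placeholders:)

NodeName=node[01-04] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Gres=gpu:2
PartitionName=cpu Nodes=node[01-04] MaxCPUsPerNode=38 Default=YES
PartitionName=gpu Nodes=node[01-04] MaxCPUsPerNode=2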