A heavyweight solution (though less so if you already have Grafana and
Prometheus running):
https://github.com/rivosinc/prometheus-slurm-exporter
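If Prometheus is already in place, wiring the exporter in is just one more scrape job. A minimal sketch of the Prometheus side (the job name, hostname, and port below are placeholders, not the exporter's documented defaults; check its README for the actual listen address):

```yaml
scrape_configs:
  - job_name: slurm                     # hypothetical job name
    static_configs:
      # exporter host:port -- placeholder, see the exporter's README
      - targets: ["slurmhost:8080"]
```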
On Tue, Aug 20, 2024 at 12:40 AM Simon Andrews via slurm-users <
slurm-users@lists.schedmd.com> wrote:
> Possibly a bit more elaborate than you
Could this apply in your case:
https://slurm.schedmd.com/faq.html#opencl_pmix ?
On Wed, Nov 1, 2023 at 5:24 AM Paulo Jose Braga Estrela <
paulo.estr...@petrobras.com.br> wrote:
> Yeah, you are right. I don’t know why, but it seems that my email client
> messed with the message formatting, putting all s
The general idea is to have priority batch partitions where higher-priority
jobs can preempt (suspend) lower-priority ones.
Also there's an interactive partition where users can run GUI tools that
can't be preempted.
This works fine up to the point that I would like to OverSubs
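The layout described above can be sketched in slurm.conf roughly as follows. This is only an illustration of the setup the poster describes, not their actual config; partition names, node lists, and PriorityTier values are invented:

```
# Preempt by partition priority, suspending the victim job
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG

# Low-priority batch partition: jobs here can be suspended
PartitionName=batch       Nodes=node[01-04] PriorityTier=1  PreemptMode=SUSPEND,GANG
# High-priority batch partition: its jobs preempt "batch" jobs
PartitionName=priority    Nodes=node[01-04] PriorityTier=10
# Interactive partition for GUI tools: opts out of preemption entirely
PartitionName=interactive Nodes=node[01-04] PriorityTier=10 PreemptMode=OFF
```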
Nicolas,
It looks like the partition named "test" still has PreemptMode=off?
On Wed, Mar 15, 2023 at 7:35 AM Wagner, Marcus
wrote:
> Hi Nicolas,
>
>
> sorry to say, but we have no experience with preemption.
>
>
> Best
>
> Marcus
>
>
> Am 14.03.2023 um 22:07 schrieb Nicolas Sonoda:
> al syntax checker (https://bugs.schedmd.com/show_bug.cgi?id=3435).
>
> -Paul Edmon-
> On 1/27/23 2:36 PM, Kevin Broch wrote:
>
> I'm wondering what others use to lint their slurm.conf files to give more
> confidence that the changes are valid.
>
> I came across https:
I'm wondering what others use to lint their slurm.conf files to give more
confidence that the changes are valid.
I came across https://github.com/appeltel/slurmlint, which was somewhat
functional, but since it hasn't been updated since 2019, when I ran it against
a valid slurm.conf file based on a l
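Lacking an official linter, a crude sanity check can be scripted by hand. The sketch below is not a Slurm tool and knows nothing about valid parameter names; it only verifies that each non-comment line is made of `Key=Value` tokens, which catches gross typos but not semantic errors (and will flag legitimate constructs such as `Include` lines):

```python
import re

# One Key=Value token; values like node[01-04] contain no whitespace.
KV = re.compile(r"^[A-Za-z][A-Za-z0-9_]*=\S+$")

def lint_slurm_conf(text):
    """Return (line_number, line) pairs that fail the Key=Value shape check."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are fine
        # A line may hold several tokens, e.g. "NodeName=n1 CPUs=4".
        if not all(KV.match(tok) for tok in line.split()):
            problems.append((lineno, line))
    return problems

sample = """\
ClusterName=demo
SlurmctldHost=head01
# a comment
PartitionName=test Nodes=node[01-04] Default=YES
ThisLineIsBroken
"""
print(lint_slurm_conf(sample))  # -> [(5, 'ThisLineIsBroken')]
```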
memory.
>
> I also tried REQUEUE,GANG and CANCEL,GANG.
>
> None of these options seems able to preempt GPU jobs.
>
> On Fri, 13 Jan 2023 at 12:30, Kevin Broch wrote:
>
>> My guess is that this isn't possible with GANG,SUSPEND. GPU memory
>> isn't
> ReqResv=NO OverSubscribe=NO
> OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
> State=UP TotalCPUs=64 TotalNodes=1 SelectTypeParameters=NONE
> JobDefaults=DefCpuPerGPU=2
> DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
>
> On Fri, 13 Jan 2023 at 11:16, Kevin Broch wrote:
The problem might be that OverSubscribe is not enabled? Without it, I don't
believe the time-slicing can be gang scheduled.
Can you run "scontrol show partition" to verify that it is?
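For gang scheduling to actually time-slice, the partition must allow oversubscription. A sketch of the relevant slurm.conf lines (the partition and node names are placeholders; the `:2` suffix is just one example of how many jobs may share each resource):

```
# GANG mode time-slices suspended/resumed jobs; it only kicks in when
# the partition allows more than one job per resource.
PreemptMode=SUSPEND,GANG
PartitionName=gpu Nodes=gpunode01 OverSubscribe=FORCE:2 Default=YES State=UP
```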
On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel wrote:
> Hi,
>
> I am trying to enable gang scheduling on a server with