Hi,
What are the potential negative side effects of using a large MessageTimeout?
And is there a value beyond which this setting is too long?
Thanks,
Herc
Hi,
We have a job that ran for 8 seconds, then failed with the Reason
showing as AssocMaxJobsLimit. In our case we have MaxJobs for each user
set to 5000. My understanding was that if the user submitted more than
5000 jobs, Slurm would run only 5000 at a time and the rest would wait.
If that's corre
ing to something else entirely, could you
elaborate on the least-loaded configuration in your setup?
On 24/02/2022 23:35:30, Herc Silverstein wrote:
Hi,
Hi,
We would like to do over-subscription on a cluster that's running in the
cloud. The cluster dynamically spins up and down cpu nodes as needed.
What we see is that the least-loaded algorithm causes the maximum number
of nodes specified in the partition to be spun up and each loaded with N
Hi,
Is there a way to use task affinity on a per-partition basis? We
couldn't find anything in the docs that describes doing this, and our
attempts to specify it on a per-partition basis failed.
Thanks,
Herc
Hi,
The slurmctld.log shows (for this node):
...
[2021-05-25T00:12:27.481] sched: Allocate JobId=3402729
NodeList=gpu-t4-4x-ondemand-44 #CPUs=1 Partition=gpu-t4-4x-ondemand
[2021-05-25T00:12:27.482] sched: Allocate JobId=3402730
NodeList=gpu-t4-4x-ondemand-44 #CPUs=1 Partition=gpu-t4-4x-ondem
Hi,
We have a cluster (on Google Cloud Platform) with a few partitions set up to
auto-scale and one partition set up not to auto-scale. The desired
state is for all of the nodes in this non-autoscaled partition
(SuspendExcParts=gpu-t4-4x-ondemand) to continue running uninterrupted.
However, we
:27 PM, mercan wrote:
Hi;
Prolog and TaskProlog are different parameters and run different scripts.
You should use the TaskProlog script to set environment variables.
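As a rough sketch of the TaskProlog approach: slurmd reads the standard
output of the TaskProlog script, and a line of the form "export NAME=value"
sets NAME in the environment of the task being launched. The function name
emit_task_env and the variable MY_SCRATCH_DIR below are made up for
illustration and are not from the original prolog.

```shell
#!/bin/bash
# Hypothetical TaskProlog script; names and values are illustrative only.
# slurmd parses this script's standard output line by line:
#   "export NAME=value"  sets NAME in the task's environment
#   "print some text"    writes "some text" to the job's standard output

emit_task_env() {
    # MY_SCRATCH_DIR is an assumed example variable, not a Slurm variable.
    echo "export MY_SCRATCH_DIR=/tmp/job_${SLURM_JOB_ID:-unknown}"
    echo "print TaskProlog set MY_SCRATCH_DIR"
}

emit_task_env
```

The script would be referenced from slurm.conf with a TaskProlog= line
pointing at wherever it is installed on the compute nodes (the path is
site-specific and assumed here).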
Regards;
Ahmet M.
On 13.02.2021 at 00:12, Herc Silverstein wrote:
Hi,
I have a prolog script that is being run via the slurm.conf Prolog=
setting.
Hi,
I have a prolog script that is being run via the slurm.conf Prolog=
setting. I've verified that it's being executed on the compute node.
My problem is that I cannot get environment variables that I set in this
prolog to be set/seen in the job. For example the prolog:
#!/bin/bash
...