Hi Gérard,
Happy New Year!
Slurm 21.08 is EOL and has many open CVEs. I would start right away with a
more current base, for example Ubuntu 24.04.
Otherwise I could also offer a PPA for 22.04:
https://launchpad.net/~staeglis/+archive/ubuntu/slurm-backports-24.05
Best,
Stefan
On Thursday, 9
Hi Xaver,
we also had a similar problem with Slurm 21.08 (see thread "error: power_save
module disabled, NULL SuspendProgram").
Fortunately, we have not yet observed this since the upgrade to 23.02. But the
time period (about a month) is still too short to know if the problem is
really fixed.
Hi,
we have been using Slurm's suspend/resume support for about half a year. It
works quite well, but sometimes it breaks and no nodes are suspended or
resumed anymore.
In this case we see the following message in the log:
error: power_save module disabled, NULL SuspendProgram
A restart of slurmctld
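For reference, the power-save machinery is driven by the SuspendProgram-related settings in slurm.conf; the error above is slurmctld reporting that it no longer sees a valid SuspendProgram. A minimal fragment might look like this (paths and timings are illustrative, not taken from the thread):

```
# slurm.conf (fragment) -- illustrative values only
SuspendProgram=/usr/local/sbin/slurm_suspend.sh   # powers idle nodes down
ResumeProgram=/usr/local/sbin/slurm_resume.sh     # powers nodes back up
SuspendTime=600          # seconds of idle before a node is suspended
SuspendTimeout=120
ResumeTimeout=300
```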
using a UnkillableStepProgram?
Thank you :)
Best,
Stefan
On Friday, 20 January 2023, 05:59:19 CET, Christopher Samuel wrote:
> On 1/19/23 5:01 am, Stefan Staeglich wrote:
> > Hi,
>
> Hiya,
>
> > I'm wondering where the UnkillableStepProgram is actually executed.
>
Hi,
I'm wondering where the UnkillableStepProgram is actually executed. According
to Mike it has to be available on all of the compute nodes. This makes sense
only if it is executed there.
But the slurm.conf man page of 21.08.x states:
UnkillableStepProgram
Must be execut
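For context, a minimal UnkillableStepProgram could be a sketch like the one below, which records processes stuck in uninterruptible sleep on the node. Everything here is illustrative: the log path is made up, and the assumption that Slurm exports SLURM_JOB_ID/SLURM_STEP_ID to the program should be verified against the slurm.conf man page of your version.

```shell
#!/bin/sh
# Illustrative UnkillableStepProgram sketch. Assumes Slurm exports
# SLURM_JOB_ID and SLURM_STEP_ID to the program (verify for your version).
LOG="/tmp/unkillable-${SLURM_JOB_ID:-unknown}.log"
{
  echo "unkillable step ${SLURM_STEP_ID:-?} of job ${SLURM_JOB_ID:-?} on $(hostname)"
  # Processes in uninterruptible sleep (state D) are the usual culprits
  # behind unkillable steps (stuck I/O, hung NFS mounts, ...).
  ps -eo pid,stat,comm | awk '$2 ~ /D/'
} > "$LOG" 2>&1
```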
Hi Karl,
have you found a solution?
Best,
Stefan
On Friday, 8 January 2021, 23:14:34 CEST, Karl Lovink wrote:
> Hi Luke,
>
> Thanks, it’s working now. One last question: is it possible to create
> a non-expiring token? Yes, I know it is not secure
> Sincerely yours,
> Karl
>
>
Hi,
we want to allow specific users to drain nodes. This feature seems to be
implemented in the nonstop plugin, but using that plugin only for this one
feature seems to be overkill.
Is there any other plugin that implements this feature?
Best,
Stefan
--
Stefan Stäglich, Universität Freiburg, Institut f
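One plugin-free workaround for the question above is a narrow sudoers rule, so that selected users can run exactly one scontrol command. The group name and node pattern below are hypothetical, for illustration only:

```
# /etc/sudoers.d/drain (illustrative; group and node pattern are made up)
%node-drainers ALL=(root) NOPASSWD: /usr/bin/scontrol update NodeName=node[0-9]* State=DRAIN Reason=*
```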
Hi Mike,
thank you very much :)
Stefan
On Monday, 7 February 2022, 16:50:54 CET, Michael Robbert wrote:
> They moved Arbiter2 to Github. Here is the new official repo:
> https://github.com/CHPC-UofU/arbiter2
>
> Mike
>
> On 2/7/22, 06:51, "slurm-users" wrote:
> Hi,
>
> I've just noticed tha
Hi Diego,
do you have any new insights regarding this issue?
Best,
Stefan
On Monday, 26 October 2020, 14:48:17 CET, Diego Zuccato wrote:
> On 22/10/20 12:56, Diego Zuccato wrote:
> > 2) Is the shared memory accounted as belonging to the process and
> > enforced accordingly by cgroups?
>
> Acc
Hi,
I've just noticed that the repository https://gitlab.chpc.utah.edu/arbiter2
seems to be down. Does someone know more?
Thank you!
Best,
Stefan
On Tuesday, 27 April 2021, 17:35:35 CET, Prentice Bisbal wrote:
> I think someone asked this same exact question a few weeks ago. The best
> solutio
Hi Prentice,
thanks for the hint. I'm evaluating this too.
It seems that Arbiter doesn't distinguish between RAM that is actually used
and RAM that is used only as cache. Or is my impression wrong?
Best,
Stefan
On Tuesday, 27 April 2021, 17:35:35 CEST, Prentice Bisbal wrote:
> I think someone ask
Hi,
for our monitoring system I want to query the account hierarchy. Is there a
better approach than to parse the output of
sacctmgr list account withassoc -nP
?
Something like
sacctmgr list account parent=bla withassoc -nP
doesn't work.
Best,
Stefan
--
Stefan Stäglich, Universität Freiburg
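One approach to the question above is to dump account/parent pairs with sacctmgr and rebuild the tree from them. Whether ParentName is available as a format field depends on the sacctmgr version, so treat the command in the comment as an assumption; the sample data is invented purely for illustration:

```shell
# Hypothetical dump command (verify the format fields on your version):
#   sacctmgr -nP list associations format=Account,ParentName
# The pipe-separated output can then be turned into parent -> child edges:
parse_hierarchy() {
  awk -F'|' '$2 != "" { print $2 " -> " $1 }'
}

# Invented sample output, for illustration only:
printf 'root|\nphysics|root\nchemistry|root\nlab1|physics\n' | parse_hierarchy
# prints:
#   root -> physics
#   root -> chemistry
#   physics -> lab1
```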
Hello,
is there a best practice for activating this feature (setting
ConstrainDevices=yes)? Do I have to restart the slurmds? Does this affect
running jobs?
We are using Slurm 19.05.
Best,
Stefan
On Tuesday, 25 August 2020, 17:24:41 CEST, Christoph Brüning wrote:
> Hello,
>
> we're using cgroup
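For reference, ConstrainDevices lives in cgroup.conf, not slurm.conf. A minimal fragment is below; note that the cgroup setup only applies if the cgroup task plugin is loaded, and (as far as I understand, please verify against the docs for your version) changes take effect for newly started jobs, while already-running jobs keep the cgroup setup they started with.

```
# cgroup.conf (fragment)
ConstrainDevices=yes

# slurm.conf must already load the cgroup task plugin for this to apply:
# TaskPlugin=task/cgroup
```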
Hi Sven,
I think it makes more sense to adjust the config file
/etc/slurm-llnl/slurm.conf
rather than the systemd units:
SlurmctldPidFile=/run/slurmctld.pid
SlurmdPidFile=/run/slurmd.pid
Best,
Stefan
On Wednesday, 17 March 2021, 19:16:38 CET, Sven Duscha wrote:
> Hi,
>
> I experience with SLURM s
Hi,
what's the current status of the checkpointing support in SLURM? There was a
CRIU plugin mentioned:
https://slurm.schedmd.com/SLUG16/ciemat-cr.pdf
But it doesn't exist in SLURM 19.05.5 on Ubuntu 20.04, and the manual page
mentions only an OpenMPI plugin.
Best,
Stefan
--
Stefan Stäglich,
Hi,
everything except /etc/ssl/certs/ca-certificates.crt is ignored, so I've
copied the certificate to /usr/local/share/ca-certificates/ and ran
update-ca-certificates.
Now it's working :)
Best,
Stefan
On Friday, 14 August 2020, 11:42:04 CEST, Stefan Staeglich wrote:
> Hi,
>
>
Hi,
I'm trying to set up the acct_gather plugin ProfileInfluxDB. Unfortunately,
our InfluxDB server only has a self-signed certificate:
[2020-08-14T09:54:30.007] [46.0] error: acct_gather_profile/influxdb
_send_data: curl_easy_perform failed to send data (discarded). Reason: SSL
peer certificate or SS
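For context, the InfluxDB profiling plugin is wired up roughly as below. The host and database names are invented, and the option names should be checked against the acct_gather.conf man page of your version:

```
# slurm.conf (fragment)
AcctGatherProfileType=acct_gather_profile/influxdb

# acct_gather.conf (fragment) -- values are illustrative
ProfileInfluxDBHost=influxdb.example.org:8086
ProfileInfluxDBDatabase=slurm_profiling
ProfileInfluxDBUser=slurm
ProfileInfluxDBPass=secret
ProfileInfluxDBDefault=ALL
```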
Hi Will,
in this case it should be no problem to upgrade directly to Ubuntu 20.04?
It ships 19.05; there is no 19.11.
Best,
Stefan
On Monday, 16 March 2020, 15:41:56 CET, Will Dennis wrote:
> Hi Stefan,
>
> I have not been able to find any 18.08.x PPAs; I myself have backported the
> latest Debi
Hi Chris,
I'm not sure how this works. I'm not very experienced in QoS objects.
Do I have to create two QoS objects a and b with UsageThreshold=0.1,Flags=
EnforceUsageThreshold / UsageThreshold=0.9? And do I need two different
accounts A and B like Daniel suggested, or can I use a single account?
Al
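For what it's worth, the two-QoS variant from the question above would be created roughly like this. The QoS and account names are the placeholders from the thread, and the exact semantics of EnforceUsageThreshold should be double-checked against the sacctmgr documentation:

```
# Command sketch -- names a/b/A/B are the placeholders from the thread
sacctmgr add qos a set UsageThreshold=0.1 Flags=EnforceUsageThreshold
sacctmgr add qos b set UsageThreshold=0.9 Flags=EnforceUsageThreshold
sacctmgr modify account A set QOS=a DefaultQOS=a
sacctmgr modify account B set QOS=b DefaultQOS=b
```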
Hi,
we have some compute nodes paid by different project owners. 10% are owned by
project A and 90% are owned by project B.
We want to enforce the following policy within each time period
(e.g. two weeks):
- Project A doesn't use more than 10% of the cluster in this time period
-