Hello again,
Angel de Vicente via slurm-users
writes:
> [...] I don't understand is why the first three submissions
> below do get stopped by sbatch while the last one happily goes through?
>
>>> ,
>>> | $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000
Hello,
Brian Andrus via slurm-users
writes:
> Unless you are using cgroups and constraints, there is no limit
> imposed.
[...]
> So your request did not exceed what slurm sees as available (1 cpu
> using 4GB), so it is happy to let your script run. I suspect if you
> look at the usage, you will
Hello,
we found an issue with Slurm 24.05.1 and the MaxMemPerNode
setting. Slurm is installed on a single workstation, so the number of
nodes is just 1.
The relevant sections in slurm.conf read:
,
| EnforcePartLimits=ALL
| PartitionName=short Nodes=. State=UP Default=YES Max
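For anyone hitting the same thing, this is roughly the shape of the
configuration involved; the node name, CPU/memory counts and limits below
are placeholders, not our actual values:

```
# slurm.conf -- illustrative fragment, all values are placeholders
EnforcePartLimits=ALL
NodeName=ws01 CPUs=76 RealMemory=256000
PartitionName=short Nodes=ws01 State=UP Default=YES MaxMemPerNode=128000
```

With EnforcePartLimits=ALL, sbatch is supposed to reject at submission
time any job whose request exceeds the partition limits, which is why the
behaviour described above was surprising.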
Hello Loris,
"Loris Bennett" writes:
> Did you ever find an example or write your own plugin which you could
> provide as a example?
I'm afraid not (though I didn't persevere, because for the moment we are
trying to encourage our users not to waste resources with a different
approach).
But, in
Hi Will,
Will Furnell - STFC UKRI writes:
> That does sound like an interesting solution – yes please would you be
> able to send me (or us if you’re willing to share it to the list)
> through some more information please?
>
> And thank you everyone else that has replied to my email – there’s
>
Hello Cristobal,
Cristóbal Navarro writes:
> Hello Angel and Community,
> I am facing a similar problem with a DGX A100 with DGX OS 6 (Based on
> Ubuntu 22.04 LTS) and Slurm 23.02.
> When I start the `slurmd` service, its status shows as failed with the
> information below.
> As of today, w
Hello,
Angel de Vicente writes:
> From my limited tests today, somehow in the interactive queue all seems
> OK now, but not so in the 'batch' queue. For example, I just submitted
> three jobs with different numbers of CPUs per job (4, 8 and 16 processes
> respectively).
Hello,
Matthew Brown writes:
> Minimum memory required per allocated CPU. ... Note that if the job's
> --mem-per-cpu value exceeds the configured MaxMemPerCPU, then the
> user's limit will be treated as a memory limit per task
Ah, thanks, I should've read the documentation more carefully.
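My reading of that paragraph, with made-up numbers (assuming
MaxMemPerCPU=4000 in slurm.conf), is that Slurm rescales the request
rather than rejecting it:

```shell
# Illustrative arithmetic only, not a real submission. Assumption:
# MaxMemPerCPU=4000 (MB) and a job asking for --mem-per-cpu=8000.
# Per the sbatch docs, the 8000 MB is treated as a per-task limit and the
# CPU count is raised so cpus-per-task * mem-per-cpu covers the request.
max_mem_per_cpu=4000
requested_mem_per_cpu=8000
# new cpus-per-task = ceiling(requested / max)
cpus_per_task=$(( (requested_mem_per_cpu + max_mem_per_cpu - 1) / max_mem_per_cpu ))
echo "cpus-per-task=${cpus_per_task} mem-per-cpu=${max_mem_per_cpu}"
# prints: cpus-per-task=2 mem-per-cpu=4000
```

Which would explain why an over-limit --mem-per-cpu request can sail
through submission without an error.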
Hello,
I'm trying to get Slurm to control the memory used per CPU, but it does
not seem to enforce the MaxMemPerCPU option in slurm.conf
This is running in Ubuntu 22.04 (cgroups v2), Slurm 23.02.3.
Relevant configuration options:
,cgroup.conf
| AllowedRAMSpace=100
| ConstrainCores=yes
| Con
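For comparison, a cgroup.conf that actually constrains memory needs the
RAM-constraining option enabled too; a minimal sketch (not my full file,
values illustrative):

```
# cgroup.conf -- illustrative sketch
ConstrainCores=yes
ConstrainRAMSpace=yes
AllowedRAMSpace=100
```

and slurm.conf must also select the cgroup plugins
(TaskPlugin=task/cgroup, ProctrackType=proctrack/cgroup) and track memory
in the selector (e.g. SelectTypeParameters=CR_Core_Memory), otherwise
memory is never accounted for at all.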
Hello Lorenzo,
Lorenzo Bosio writes:
> I'm developing a job submit plugin to check if some conditions are met before
> a job runs.
> I'd need a way to notify the user about the plugin's actions (i.e. why
> their job was killed and what to do), but after a lot of research I could only write to
>
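For the submit-time part at least, a Lua job_submit plugin can talk back
to the user: slurm.log_user() prints its message on the submitting user's
terminal. A minimal sketch (the account check is only a placeholder
condition, not something from this thread):

```lua
-- job_submit.lua -- sketch; the condition is a made-up placeholder
function slurm_job_submit(job_desc, part_list, submit_uid)
   -- placeholder condition: require an account to be set
   if job_desc.account == nil then
      -- log_user() text is shown on the submitting user's terminal,
      -- so it can explain *why* the job was refused and what to do
      slurm.log_user("Job refused: please submit with -A <account>.")
      return slurm.ERROR
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```

For actions taken after the job is already running (e.g. killing it)
there is no equivalent channel as far as I know, which I suspect is the
harder part of the question.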
Hello,
I want to experiment with writing our own site factor plugin. In the
documentation I found the API details
(https://slurm.schedmd.com/site_factor.html), but it would be much
easier for me if I had some example site factor plugin to start with.
Do you know of any examples that can set me in
Hello,
trying to get some stats about a running job, I've realized that one of
the jobs is consistently failing with:
,
| sstat: error: slurm_receive_msgs: [[]:6818] failed: Socket timed out on send/recv operation
| sstat: error: slurm_job_step_stat: unknown return given from .ll.ia
Hello,
Angel de Vicente writes:
> Any idea what could be going on or how to debug this?
As a follow-up, I found that this was due to the "smail" script (bundled
with the "seff" contributed package). I had to do a small modification
and mails are now being delivered normally.
Hello,
recently I updated our Slurm version to 23.02.3 and I have now noticed
that jobs having the "mail-type" option as:
#SBATCH --mail-type=BEGIN,END
only send mail notification for the BEGIN step. This was previously
working for both BEGIN and END notifications (I believe it was OK with
versi
Hello,
David Gauchard writes:
> slurm-23.02 on ubuntu-20.04,
>
> seff is not working anymore:
perhaps it is something specific to 20.04? I'm on Ubuntu 22.04 and
slurm-23.02.1 here and no problems with seff, except that the memory
efficiency part seems broken (I always seem to get 0.00% efficiency).
Hello,
Bjørn-Helge Mevik writes:
>> A solution was suggested in
>> https://serverfault.com/questions/1090689/how-can-i-set-up-interactive-job-only-or-batch-job-only-partition-on-a-slurm-clu
>>> Interactive jobs have no script and job_desc.script will be empty /
>>> not set.
>>
>> So maybe somethi
Hi,
Bjørn-Helge Mevik writes:
> Wouldn't it be simpler to just refuse too long interactive jobs in
> job_submit.lua?
Yes, I guess. I proposed the idea of having different partitions because
then the constraints are at the level of the partition, which is
probably easier to handle than modifying
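A sketch of what such a refusal in job_submit.lua might look like, using
the fact (quoted earlier in this thread) that interactive jobs carry no
batch script; the 60-minute cap is an arbitrary placeholder:

```lua
-- job_submit.lua -- sketch only; the cap value is a placeholder
local MAX_INTERACTIVE_MIN = 60

function slurm_job_submit(job_desc, part_list, submit_uid)
   -- interactive jobs (salloc/srun without a batch script) have no script
   if job_desc.script == nil or job_desc.script == '' then
      -- note: an unset time limit arrives as slurm.NO_VAL, which this
      -- naive check would also reject; a real plugin must handle that
      if job_desc.time_limit > MAX_INTERACTIVE_MIN then
         slurm.log_user("Interactive jobs are capped at 60 minutes here.")
         return slurm.ERROR
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end
```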
Hi Marko,
Marko Markoc writes:
> Quick question. Is there a way to limit the runtime on a partition
> only for salloc ? I would like for batch jobs to have a default max
> runtime of the partition but interactive jobs to have shortened
> allowed runtime.
I'm also interested in this (in my case
Hello,
Angel de Vicente writes:
> ,
> | slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are:
> | 5:freezer:/
> | 3:cpuacct:/
> `
in the end I learnt that despite Ubuntu 22.04 reporting that it uses
only cgroup v2, it was also using v1 and creating those mounts.
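For reference, the usual way to switch those v1 mounts off entirely is to
boot in pure unified mode; a sketch for GRUB-based systems (these are the
standard systemd/kernel parameters, not something specific to this
thread; run update-grub and reboot after editing):

```
# /etc/default/grub -- illustrative fragment
GRUB_CMDLINE_LINUX_DEFAULT="systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all"
```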
Hello,
Angel de Vicente writes:
> And hence my question.. because as I was saying in a previous mail,
> reading the documentation I understand that this is the standard way to
> do it, but right now I got it working the other way: in each cluster I
> have one slurmdbd daemon that c
Hello,
Ole Holm Nielsen writes:
> Some people have found my Slurm Wiki page helpful:
https://wiki.fysik.dtu.dk/Niflheim_system/SLURM/
me being one of
Hello Ole,
Ole Holm Nielsen writes:
> As Brian wrote:
>
>> On a technical note: slurm keeps the detailed accounting data for each
>> cluster
>> in separate TABLES within a single database.
>
> In the Federation page
> https://slurm.schedmd.com/federation.html
Hello,
Ole Holm Nielsen writes:
> If I read Brian's comments correctly, he's saying that Slurm already has a
> well-tested and documented solution for multi-cluster sites: Federated
> clusters.
Thanks Ole. Don't get me wrong, I have nothing against using Federated
clusters, and I guess I will
Hello,
This is the first time that I'm installing Slurm, so things are not very
clear to me yet (even more so for multi-cluster operation).
Brian Andrus writes:
> You can do it however you like. You asked if there was a good or existing way
> to
> do it easily, that was provided. Up to you if
Hello,
Brian Andrus writes:
> Ole is spot on with his federated suggestion. That is exactly what fits the
> bill
> for you, given your requirements. You can have everything you want, but you
> don't get to have it how you want (separate databases).
> When/If you looked deeper into it, you will
Hi Ole,
Ole Holm Nielsen writes:
> Maybe you want to use Slurm federated clusters with a single database
thanks for the links, but federated clusters are not what I need. I want
to have separate clusters, with different users, job IDs, etc., and the
only thing that I want to aggregate is their data.
Hello,
I'm setting Slurm in a number of machines and (at least for the moment)
we don't plan to let users submit across machines, so the initial plan
was to install Slurm+slurmdbd+mysql in every machine.
But in order to get stats for all the machines and to simplify things a
bit, I'm planning now
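For what it's worth, the layout I understand to be standard for this is a
single slurmdbd+MySQL host with every cluster pointing its accounting
there; sketched, with placeholder hostnames:

```
# slurm.conf on each cluster -- sketch, hostnames are placeholders
ClusterName=cluster1            # must be unique per cluster
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=dbnode.example.org
```

Each cluster then shows up separately in `sacctmgr list cluster`, while
sreport can report across all of them.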
Hello,
mohammed shambakey writes:
> I appreciate your help. Actually, it is built from the source repo
> (and I'm using Ubuntu 22.04). It is solved another way: after the
> regular building using configure, make, make install, I changed the
> directory to the sview folder (/src/sview), then ran
Hello,
Angel de Vicente writes:
> Do you know how I could fix this while keeping the cgroup plugin? My
> intuition tells me that I should probably get the latest version of
> Slurm and compile it myself, but I thought I would ask here before going
> that route.
I followed my intui
Hello,
Michael Gutteridge writes:
> Does this link help?
>
>> Debian and derivatives (e.g. Ubuntu) usually exclude the memory and
>> memsw (swap) cgroups by default. To include them, add the following
>> parameters to the kernel command line: cgroup_enable=memory swapaccount=1
In the old mac
Hello,
Hermann Schwärzler writes:
> which version of cgroups does Ubuntu 22.04 use?
I'm a cgroups noob, but my understanding is that both v2 and v1 coexist
in Ubuntu 22.04
(https://manpages.ubuntu.com/manpages/jammy/man7/cgroups.7.html). I have
another machine with Ubuntu 18.04, which also has
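A quick way to see which cgroup setup a machine is actually running is to
check the filesystem type mounted on /sys/fs/cgroup (plain Linux, nothing
Slurm-specific):

```shell
# cgroup2fs => pure v2 (unified); tmpfs => v1 or hybrid layout
stat -fc %T /sys/fs/cgroup/
# listing both mount types makes hybrid setups visible explicitly
mount -t cgroup,cgroup2
```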
Hello,
I've installed Slurm in a workstation (this is a single-node install)
with Ubuntu 22.04, and have installed Slurm version 21.08.5 (I didn't
compile it myself, just installed it with "apt install").
In the slurm.conf file I have:
,
| ProctrackType=proctrack/cgroup
| TaskPlugin=task/aff