[slurm-users] Re: Bug? sbatch not respecting MaxMemPerNode setting

2024-09-05 Thread Angel de Vicente via slurm-users
Hello again, Angel de Vicente via slurm-users writes: > [...] I don't understand is why the first three submissions > below do get stopped by sbatch while the last one happily goes through? > >>> , >>> | $ sbatch -N 1 -n 1 -c 76 -p short --mem-per-cpu=4000

[slurm-users] Re: Bug? sbatch not respecting MaxMemPerNode setting

2024-09-05 Thread Angel de Vicente via slurm-users
Hello, Brian Andrus via slurm-users writes: > Unless you are using cgroups and constraints, there is no limit > imposed. [...] > So your request did not exceed what slurm sees as available (1 cpu > using 4GB), so it is happy to let your script run. I suspect if you > look at the usage, you wil

[slurm-users] Bug? sbatch not respecting MaxMemPerNode setting

2024-09-04 Thread Angel de Vicente via slurm-users
Hello, we found an issue with Slurm 24.05.1 and the MaxMemPerNode setting. Slurm is installed on a single workstation, so the number of nodes is just 1. The relevant sections in slurm.conf read: , | EnforcePartLimits=ALL | PartitionName=short Nodes=. State=UP Default=YES Max

Re: [slurm-users] Site factor plugin example?

2023-10-13 Thread Angel de Vicente
Hello Loris, "Loris Bennett" writes: > Did you ever find an example or write your own plugin which you could > provide as a example? I'm afraid not (though I didn't persevere, because for the moment we are trying to encourage our users not to waste resources with a different approach). But, in

Re: [slurm-users] Tracking efficiency of all jobs on the cluster (dashboard etc.)

2023-09-07 Thread Angel de Vicente
Hi Will, Will Furnell - STFC UKRI writes: > That does sound like an interesting solution – yes please would you be > able to send me (or us if you’re willing to share it to the list) > through some more information please? > > And thank you everyone else that has replied to my email – there’s >

Re: [slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-09-07 Thread Angel de Vicente
Hello Cristobal, Cristóbal Navarro writes: > Hello Angel and Community, > I am facing a similar problem with a DGX A100 with DGX OS 6 (Based on > Ubuntu 22.04 LTS) and Slurm 23.02. > When I execute `slurmd` service, it status shows failed with the > following information below. > As of today, w

Re: [slurm-users] MaxMemPerCPU not enforced?

2023-07-25 Thread Angel de Vicente
Hello, Angel de Vicente writes: > From my limited tests today, somehow in the interactive queue all seems > OK now, but not so in the 'batch' queue. For example, I just submitted > three jobs with different amount of CPUs per job (4, 8 and 16 processes > respectively).

Re: [slurm-users] MaxMemPerCPU not enforced?

2023-07-24 Thread Angel de Vicente
Hello, Matthew Brown writes: > Minimum memory required per allocated CPU. ... Note that if the job's > --mem-per-cpu value exceeds the configured MaxMemPerCPU, then the > user's limit will be treated as a memory limit per task Ah, thanks, I should've read the documentation more carefully.
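
The behaviour quoted from the documentation above can be illustrated with a little arithmetic. The following is only a rough sketch, not Slurm's actual code: the function name is hypothetical, it assumes the job originally asked for one CPU per task, and the rounding Slurm really applies may differ. The idea is that when --mem-per-cpu exceeds MaxMemPerCPU, the request is treated as memory per task, and the number of CPUs per task is raised until the per-CPU share fits under the limit.

```python
import math


def adjust_for_max_mem_per_cpu(mem_per_cpu_mb, max_mem_per_cpu_mb):
    """Illustrative only: return (cpus_per_task, mem_per_cpu_mb) after adjustment.

    Assumes the job originally requested one CPU per task, so the
    requested --mem-per-cpu value is also the memory wanted per task.
    """
    if mem_per_cpu_mb <= max_mem_per_cpu_mb:
        # Request is within the limit: nothing changes.
        return 1, mem_per_cpu_mb

    # Treat the request as memory per task and spread it over enough
    # CPUs that each CPU's share is no larger than the limit.
    mem_per_task_mb = mem_per_cpu_mb
    cpus_per_task = math.ceil(mem_per_task_mb / max_mem_per_cpu_mb)
    new_mem_per_cpu_mb = math.ceil(mem_per_task_mb / cpus_per_task)
    return cpus_per_task, new_mem_per_cpu_mb


if __name__ == "__main__":
    # Hypothetical numbers: a limit of 4000 MB per CPU.
    for requested in (2000, 8000, 10000):
        cpus, mem = adjust_for_max_mem_per_cpu(requested, 4000)
        print(f"--mem-per-cpu={requested}: cpus-per-task={cpus}, mem-per-cpu={mem}")
```

Under these assumptions a job is not rejected for exceeding MaxMemPerCPU; it simply ends up with more CPUs per task, which may look as if the limit is not being enforced.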

[slurm-users] MaxMemPerCPU not enforced?

2023-07-24 Thread Angel de Vicente
Hello, I'm trying to get Slurm to control the memory used per CPU, but it does not seem to enforce the MaxMemPerCPU option in slurm.conf This is running in Ubuntu 22.04 (cgroups v2), Slurm 23.02.3. Relevant configuration options: ,cgroup.conf | AllowedRAMSpace=100 | ConstrainCores=yes | Con

Re: [slurm-users] Notify users about job submit plugin actions

2023-07-19 Thread Angel de Vicente
Hello Lorenzo, Lorenzo Bosio writes: > I'm developing a job submit plugin to check if some conditions are met before > a job runs. > I'd need a way to notify the user about the plugin actions (i.e. why its jobs > was killed and what to do), but after a lot of research I could only write to >

[slurm-users] Site factor plugin example?

2023-07-12 Thread Angel de Vicente
Hello, I want to experiment with writing our own site factor plugin. In the documentation I found the API details (https://slurm.schedmd.com/site_factor.html), but it would be much easier for me if I had some example site factor plugin to start with. Do you know of any examples that can set me in

[slurm-users] sstat -a: Socket timed out on send/recv operation

2023-07-11 Thread Angel de Vicente
Hello, trying to get some stats about a running job, I've realized that one of the jobs is consistently failing with: , | sstat: error: slurm_receive_msgs: [[]:6818] failed: Socket timed out on send/recv operation | sstat: error: slurm_job_step_stat: unknown return given from .ll.ia

Re: [slurm-users] END Mail notifications not being sent?

2023-07-03 Thread Angel de Vicente
Hello, Angel de Vicente writes: > Any idea what could be going on or how to debug this? As a follow-up, I found that this was due to the "smail" script (bundled with the "seff" contributed package). I had to do a small modification and mails are now being delivered norma

[slurm-users] END Mail notifications not being sent?

2023-07-03 Thread Angel de Vicente
Hello, recently I updated our Slurm version to 23.02.3 and I have now noticed that jobs having the "mail-type" option as: #SBATCH --mail-type=BEGIN,END only send mail notification for the BEGIN step. This was previously working for both BEGIN and END notifications (I believe it was OK with versi

Re: [slurm-users] seff in slurm-23.02

2023-05-25 Thread Angel de Vicente
Hello, David Gauchard writes: > slurm-23.02 on ubuntu-20.04, > > seff is not working anymore: perhaps it is something specific to 20.04? I'm on Ubuntu 22.04 and slurm-23.02.1 here and no problems with seff, except that the memory efficiency part seems broken (I always seem to get 0.00% efficien

Re: [slurm-users] Limit run time of interactive jobs

2023-05-08 Thread Angel de Vicente
Hello, Bjørn-Helge Mevik writes: >> A solution was suggested in >> https://serverfault.com/questions/1090689/how-can-i-set-up-interactive-job-only-or-batch-job-only-partition-on-a-slurm-clu >>> Interactive jobs have no script and job_desc.script will be empty / >> not set. >> >> So maybe somethi

Re: [slurm-users] Limit run time of interactive jobs

2023-05-08 Thread Angel de Vicente
Hi, Bjørn-Helge Mevik writes: > Wouldn't it be simpler to just refuse too long interactive jobs in > job_submit.lua? Yes, I guess. I proposed the idea of having different partitions because then the constraints are at the level of the partition, which is probably easier to handle than modifying

Re: [slurm-users] Limit run time of interactive jobs

2023-05-06 Thread Angel de Vicente
Hi Marko, Marko Markoc writes: > Quick question. Is there a way to limit the runtime on a partition > only for salloc ? I would like for batch jobs to have a default max > runtime of the partition but interactive jobs to have shortened > allowed runtime. I'm also interested in this (in my case

Re: [slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-05-03 Thread Angel de Vicente
Hello, Angel de Vicente writes: > , > | slurmd: fatal: Hybrid mode is not supported. Mounted cgroups are: > | 5:freezer:/ > | 3:cpuacct:/ > ` in the end I learnt that despite Ubuntu 22.04 reporting to be using only cgroup V2, it was also using V1 and creating those mo

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-05-03 Thread Angel de Vicente
Hello, Angel de Vicente writes: > And hence my question.. because as I was saying in a previous mail, > reading the documentation I understand that this is the standard way to > do it, but right now I got it working the other way: in each cluster I > have one slurmdbd daemon that c

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-05-01 Thread Angel de Vicente
Hello, Ole Holm Nielsen writes: > Some people have found my Slurm Wiki page helpful: > https://wiki.fysik.dtu.dk/Niflheim_system/SLURM/ me being one of

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-05-01 Thread Angel de Vicente
Hello Ole, Ole Holm Nielsen writes: > As Brian wrote: > >> On a technical note: slurm keeps the detailed accounting data for each >> cluster >> in separate TABLES within a single database. > > In the Federation page > https://slurm.schedmd.com/federation.html

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-05-01 Thread Angel de Vicente
Hello, Ole Holm Nielsen writes: > If I read Brian's comments correctly, he's saying that Slurm already has a > well-tested and documented solution for multi-cluster sites: Federated > clusters. Thanks Ole. Don't get me wrong, I have nothing against using Federated clusters, and I guess I will

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-05-01 Thread Angel de Vicente
Hello, This is the first time that I'm installing Slurm, so things are not very clear to me yet (even more so for multi-cluster operation). Brian Andrus writes: > You can do it however you like. You asked if there was a good or existing way > to > do it easily, that was provided. Up to you if

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-04-30 Thread Angel de Vicente
Hello, Brian Andrus writes: > Ole is spot on with his federated suggestion. That is exactly what fits the > bill > for you, given your requirements. You can have everything you want, but you > don't get to have it how you want (separate databases). > When/If you looked deeper into it, you will

Re: [slurm-users] Several slurmdbds against one mysql server?

2023-04-29 Thread Angel de Vicente
Hi Ole, Ole Holm Nielsen writes: > Maybe you want to use Slurm federated clusters with a single database thanks for the links, but federated clusters is not what I need. I want to have separate clusters, with different users, job IDs, etc. and the only thing that I want to aggregate is their da

[slurm-users] Several slurmdbds against one mysql server?

2023-04-29 Thread Angel de Vicente
Hello, I'm setting up Slurm on a number of machines and (at least for the moment) we don't plan to let users submit across machines, so the initial plan was to install Slurm+slurmdbd+mysql on every machine. But in order to get stats for all the machines and to simplify things a bit, I'm planning now

Re: [slurm-users] sview not installed

2023-04-23 Thread Angel de Vicente
Hello, mohammed shambakey writes: > I appreciate your help. Actually, it is built from the source repo > (and I'm using Ubuntu 22.04). It is solved another way: after the > regular building using configure, make, make install, I changed the > directory to the sview folder (/src/sview), then ran

Re: [slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-04-22 Thread Angel de Vicente
Hello, Angel de Vicente writes: > Do you know how I could fix this while keeping the cgroup plugin? My > intuition tells me that I should probably get the latest version of > Slurm and compile it myself, but I thought I would ask here before going > that route. I followed my intui

Re: [slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-04-21 Thread Angel de Vicente
Hello, Michael Gutteridge writes: > Does this link help? > >> Debian and derivatives (e.g. Ubuntu) usually exclude the memory and >> memsw (swap) cgroups by default. To include them, add the following >> parameters to the kernel command line: cgroup_enable=memory swapaccount=1 In the old mac

Re: [slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-04-21 Thread Angel de Vicente
Hello, Hermann Schwärzler writes: > which version of cgroups does Ubuntu 22.04 use? I'm a cgroups noob, but my understanding is that both v2 and v1 coexist in Ubuntu 22.04 (https://manpages.ubuntu.com/manpages/jammy/man7/cgroups.7.html). I have another machine with Ubuntu 18.04, which also has

[slurm-users] Problem with cgroup plugin in Ubuntu22.04 and slurm 21.08.5

2023-04-21 Thread Angel de Vicente
Hello, I've installed Slurm in a workstation (this is a single-node install) with Ubuntu 22.04, and have installed Slurm version 21.08.5 (I didn't compile it myself, just installed it with "apt install"). In the slurm.conf file I have: , | ProctrackType=proctrack/cgroup | TaskPlugin=task/aff