Re: [slurm-users] [External] Re: PropagateResourceLimits

2021-04-27 Thread Diego Zuccato
On 27/04/2021 17:31, Prentice Bisbal wrote: I don't think PAM comes into play here. Since Slurm is starting the processes on the compute nodes as the user, etc., PAM is being bypassed. Then maybe slurmd somehow goes through the PAM stack another way, since limits on the frontend got propa

Re: [slurm-users] Extract job information after completion

2021-04-27 Thread O'Grady, Paul Christopher
On Apr 27, 2021, at 10:44 AM, slurm-users-requ...@lists.schedmd.com wrote: In slurm.conf set: EpilogSlurmctld=/etc/slurm/slurm.epilogslurmctld Which does a number of things, including the following: root@pople01:/etc/slurm # tail -6 slurm.epilog

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-04-27 Thread Sid Young
Hi David, I use SaltStack to push out the slurm.conf file to all nodes and do a "scontrol reconfigure" of the slurmd; this makes management much easier across the cluster. You can also do service restarts from one point etc. Avoid NFS mounts for the config; if the mount locks up, you're screwed. htt

[slurm-users] Slurm version 20.11.6 is now available

2021-04-27 Thread Tim Wickberg
We are pleased to announce the availability of Slurm version 20.11.6. This includes a number of minor-to-moderate severity fixes, as well as improvements to the recently added job_container/tmpfs plugin. Slurm can be downloaded from https://www.schedmd.com/downloads.php . - Tim -- Tim Wickbe

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-04-27 Thread Max Voit
On Tue, 27 Apr 2021 11:35:18 -0700 David Henkemeyer wrote: > - Can I create a symlink that points /slurm.conf to a > slurm.conf file on an NFS mount point, which is mounted on all the > nodes? This way, I would only need to update a single file, then > restart Slurm across the entire cluster. Yo

Re: [slurm-users] [External] Re: What is an easy way to prevent users run programs on the master/login node.

2021-04-27 Thread Prentice Bisbal
But won't that first process be able to use 100% of a core? What if enough users do this such that every core is at 100% utilization? Or, what if the application is MPI + OpenMP? In that case, that one process on the login node could spawn multiple threads that use the remaining cores on the lo

Re: [slurm-users] [External] Re: What is an easy way to prevent users run programs on the master/login node.

2021-04-27 Thread Prentice Bisbal
This is not a good approach. There are plenty of jobs you can run that will hog a system's resources without using MPI. MATLAB and Mathematica both support parallel computation, and don't need to use MPI to do so. Then there's OpenMP and other threaded applications that don't need mpirun/mpiexec t

Re: [slurm-users] [External] Re: What is an easy way to prevent users run programs on the master/login node.

2021-04-27 Thread Prentice Bisbal
Using limits.conf is not a very good approach. Limits in /etc/security/limits.conf apply to each individual shell, so an individual user can still abuse a login node by running tasks in multiple shells. Cgroups, which are implemented in the kernel and take a system-wide view of resource usage i

Re: [slurm-users] [External] What is an easy way to prevent users run programs on the master/login node.

2021-04-27 Thread Prentice Bisbal
I think someone asked this exact same question a few weeks ago. The best solution I know of is to use Arbiter, which was created exactly for this situation. It uses cgroups to limit resource usage, but it adjusts those limits based on login node utilization and each user's behavior ("bad" users

Re: [slurm-users] [External] Re: PropagateResourceLimits

2021-04-27 Thread Prentice Bisbal
I don't think PAM comes into play here. Since Slurm is starting the processes on the compute nodes as the user, etc., PAM is being bypassed. Prentice On 4/22/21 10:55 AM, Ryan Novosielski wrote: My recollection is that this parameter is talking about “ulimit” parameters, and doesn’t have to

Re: [slurm-users] [External] Re: safe to delete old QOSes?

2021-04-27 Thread Prentice Bisbal
I've just done that on one of my test systems - and it's not deleting a no longer used QoS, but 'renaming' the most used one. So plenty of test jobs that used it in the database :) Can you elaborate on what you mean by "renaming"? -- Prentice On 4/19/21 8:55 AM, Tina Friedrich wrote: Hi P

[slurm-users] Extract job information after completion

2021-04-27 Thread O'Grady, Paul Christopher
Sometimes when a slurm job fails I want to see what a user did, getting the command/workdir/stdout/stderr information. I can see that with "scontrol show job ". However, after the job is done that command doesn't seem to work anymore, saying "invalid job id". I try to use sacct, which seems t

Re: [slurm-users] Questions about adding new nodes to Slurm

2021-04-27 Thread Paul Edmon
1. Part of the communications for slurm is hierarchical.  Thus nodes need to know about other nodes so they can talk to each other and forward messages to the slurmctld. 2. Yes, this is what we do.  We have our slurm.conf shared via NFS from our slurm master and then we just update that single

[slurm-users] Questions about adding new nodes to Slurm

2021-04-27 Thread David Henkemeyer
Hello, I'm new to Slurm (coming from PBS), and so I will likely have a few questions over the next several weeks, as I work to transition my infrastructure from PBS to Slurm. My first question has to do with *adding nodes to Slurm*. According to the FAQ (and other articles I've read), you need t

Re: [slurm-users] What is an easy way to prevent users run programs on the master/login node.

2021-04-27 Thread Cristóbal Navarro
Many thanks to all, I will try to set cgroups properly best On Mon, Apr 26, 2021 at 2:04 AM Marcus Wagner wrote: > Hi, > > we also have a wrapper script, together with a number of "MPI-Backends". > If mpiexec is called on the login nodes, only the first process is started > on the login node, th

Re: [slurm-users] Extract job information after completion

2021-04-27 Thread Sean McGrath
Hi, On Tue, Apr 27, 2021 at 03:14:04PM +, O'Grady, Paul Christopher wrote: > Sometimes when a slurm job fails I want to see what a user did, getting the > command/workdir/stdout/stderr information. I can see that with "scontrol > show job ". However, after the job is done that command doe

[slurm-users] QOS or Priority Access to GPU/GRES?

2021-04-27 Thread Jason Simms
Hello all, As usual, I have a super basic question, so thank you for your patience. I want to verify the correct syntax to configure a GPU for priority preempt access via a QOS, much like we are currently doing for a specified number of cores. When I have created a QOS in the past, I've so far onl

[slurm-users] Alias for SlurmctldHost

2021-04-27 Thread Rupert Madden-Abbott
Hi, Is it possible to set SlurmctldHost to something other than the hostname? For example, can I configure /etc/hosts or DNS in some way to allow me to use an alias? I have tried both but on starting slurmctld I get this: slurmctld: error: This host (hostname/hostname) not a valid controller Wh

Re: [slurm-users] Slurm reservation for migrating user home directories

2021-04-27 Thread Ole Holm Nielsen
On 4/16/21 4:21 PM, Ole Holm Nielsen wrote: I'm thinking of a reservation something like this: scontrol create reservation starttime=...  duration=12:00:00 ReservationName=migrate_physics nodes=ALL Accounts=-physics For the record: The idea of creating a Slurm reservation for excluding spec