Hi Mark, Chris,
On Mon, Jul 15, 2019 at 01:23:20PM -0400, Mark Hahn wrote:
> > Could it be a RHEL7 specific issue?
>
> no - centos7 systems here, and pam_slurm_adopt works.
Can you show what your /etc/pam.d/sshd looks like?
Kind regards,
-- Andy
Our site has been going through the process of upgrading Slurm on our primary
cluster, which was delivered to us with Slurm 16.05 by Bright Computing.
We're currently at 17.02.13-2 and working to get to 17.11 and then 18.08.
We've run into an issue with 17.11 and switching effective GID on a
Is it possible to set a cluster level limit of GPUs per user? We'd like
to implement a limit of how many GPUs a user may use across multiple
partitions at one time.
I tried this, but it obviously isn't correct:
# sacctmgr modify cluster slurm_cluster set MaxTRESPerUser=gres/gpu=2
Unknown o
On 7/17/19 12:26 AM, Chris Samuel wrote:
On 16/7/19 11:43 am, Will Dennis wrote:
[2019-07-16T09:36:51.464] error: slurmdbd: agent queue is full (20140),
discarding DBD_STEP_START:1442 request
So it looks like your slurmdbd cannot keep up with the rate of these incoming
steps and is having to discard requests.
Unfortunately, I think you're stuck with setting it at the account level with
sacctmgr. You could also set that limit as part of a QOS and then attach
the QOS to the partition. But I think that's as granular as you can get for
limiting TRES.
HTH!
David
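David's QOS suggestion could be sketched roughly as below; the QOS name `gpu2` and partition name `gpu` are made-up examples, so check the exact flags against your sacctmgr version before running anything:

```shell
# Create a QOS that caps each user at 2 GPUs within jobs using that QOS
# ("gpu2" is an example name)
sacctmgr add qos gpu2 set MaxTRESPerUser=gres/gpu=2

# Attach the QOS to the relevant partition(s) in slurm.conf, e.g.:
#   PartitionName=gpu Nodes=gpu[01-04] QOS=gpu2 ...
# then reload the config:
scontrol reconfigure
```

As David notes, this is per-partition-attached QOS rather than a true cluster-wide per-user cap, which seems to be as granular as the limits go.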
On Wed, Jul 17, 2019 at 10:11 AM Mike Harvey wrote:
I don't think the server (which runs both the Slurm controller daemon and
the DB) is the issue... It's a Dell PowerEdge R430 platform, with dual Intel
Xeon E5-2640v3 CPUs and 256GB memory, and a RAID-1 array of 1TB SATA disks.
top - 09:29:26 up 101 days, 14:57, 3 users, load average: 0.06,
OK, as it turns out, it was a problem like this bug:
https://bugs.schedmd.com/show_bug.cgi?id=3819 (cf.
https://bugs.schedmd.com/show_bug.cgi?id=2741 as well)
Back in May, I posted the following thread:
https://lists.schedmd.com/pipermail/slurm-users/2019-May/003372.html - to which
I never got a reply.
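For anyone hitting the same "agent queue is full" symptom: the bugs linked above point at MariaDB/MySQL tuning for the slurmdbd database. The Slurm accounting documentation suggests InnoDB settings along these lines (the sizes here are examples; adjust to the DB host's memory):

```ini
# e.g. /etc/my.cnf.d/innodb.cnf (path varies by distro)
[mysqld]
innodb_buffer_pool_size=1024M
innodb_log_file_size=64M
innodb_lock_wait_timeout=900
```

Per those docs, after changing innodb_log_file_size you need to shut mysqld down cleanly and remove the old ib_logfile* files before restarting, or mysqld will refuse to start.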
Hi Andy,
We have RHEL7, and pam_slurm_adopt is working for us as well, with memory
constraints working.
pam.d/sshd:
#%PAM-1.0
auth required pam_sepermit.so
auth substack password-auth
auth include postlogin
# Used with polkit to reauthorize users in remote sessions
-
On 7/17/19 4:05 AM, Andy Georges wrote:
Can you show what your /etc/pam.d/sshd looks like?
For us it's actually here:
---
# cat /etc/pam.d/common-account
#%PAM-1.0
#
# This file is autogenerated by pam-config. All changes
# will be overwritten when pam-config is called.
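For reference, adopting pam_slurm_adopt typically means appending an account entry after the stacks shown above; a minimal sketch following the pam_slurm_adopt documentation (exact placement, file, and module options vary by distro and site policy):

```
# end of the account stack in /etc/pam.d/sshd (or common-account)
account    sufficient   pam_slurm_adopt.so
```

The module then adopts the incoming ssh process into the user's job cgroup, which is what makes the memory constraints mentioned earlier apply to ssh sessions.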