Hi,
On 05/10/2021 08:45, Kevin Buckley wrote:
Trying to get my head around the extremely useful addition, from 21.08
onwards, of storing the batch scripts in the accounting database,
Are you aware of the two existing solutions to this that do not involve the
Slurm accounting DB?
https:/
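If it helps, here is a minimal sketch of the 21.08 mechanism itself, with option names as I understand them (please check the slurm.conf and sacct man pages for your exact release):

    # slurm.conf: ask the accounting DB to keep the submitted batch script (21.08+)
    AccountingStoreFlags=job_script

    # later, retrieve the stored script for a given job
    sacct -j <jobid> --batch-script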
Hi,
On Tue, Mar 10, 2020 at 05:49:07AM +, Rundall, Jacob D wrote:
> I need to update the configuration for the nodes in a cluster and I’d like to
> let jobs keep running while I do so. Specifically I need to add
> RealMemory= to the node definitions (NodeName=). Is it safe to do this
> for
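For what it is worth, a hypothetical node definition with RealMemory added; the values are made up, and whether `scontrol reconfigure` is sufficient or slurmctld/slurmd restarts are needed depends on the Slurm release:

    # slurm.conf (hypothetical values)
    NodeName=node[001-016] CPUs=48 RealMemory=191000 State=UNKNOWN

    # push the change out; older releases may need daemon restarts instead
    scontrol reconfigure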
Hi Brian,
On Mon, Oct 28, 2019 at 10:42:59AM -0700, Brian Andrus wrote:
> Ok, I had been planning on getting around to it, so this prompted me to do
> so.
>
> Yes, I can get slurm 19.05.3 to build (and package) under CentOS 8.
>
> There are some caveats, however since many repositories and package
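For reference, the usual tarball-based RPM build is along these lines, assuming the build dependencies (munge-devel, pam-devel, readline-devel, ...) are already installed:

    rpmbuild -ta slurm-19.05.3.tar.bz2
    # the resulting RPMs end up under ~/rpmbuild/RPMS/<arch>/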
Hi Tony,
On Mon, Oct 21, 2019 at 01:52:21AM +, Tony Racho wrote:
> Hi:
>
> We are planning to upgrade our Slurm cluster; however, we plan on NOT doing it
> in one go.
>
> We are on 18.08.7 at the moment (db, controller, clients)
>
> We'd like to do it in a phased approach.
>
> Stop communicat
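The generally documented order for a phased upgrade is slurmdbd first, then slurmctld, then the slurmd/client side, which may lag the controller by up to two major releases. A rough sketch, assuming the standard systemd unit names:

    systemctl stop slurmdbd  && <upgrade packages> && systemctl start slurmdbd
    systemctl stop slurmctld && <upgrade packages> && systemctl start slurmctld
    # then roll slurmd across the compute nodes in batches, e.g. per rack or partition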
Hi Jose,
On Thu, Sep 12, 2019 at 04:23:11PM +0200, Jose A wrote:
> Dear all,
>
> In the expansion of our cluster we are considering installing SLURM within a
> virtual machine in order to simplify updates and reconfigurations.
>
> Do any of you have experience running SLURM in VMs? I would rea
Hi Mark, Chris,
On Mon, Jul 15, 2019 at 01:23:20PM -0400, Mark Hahn wrote:
> > Could it be a RHEL7 specific issue?
>
> no - centos7 systems here, and pam_adopt works.
Can you show what your /etc/pam.d/sshd looks like?
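For comparison, the line the pam_slurm_adopt documentation asks for in /etc/pam.d/sshd is roughly the following; its placement relative to the other account modules is site-specific:

    # /etc/pam.d/sshd (fragment)
    account    required     pam_slurm_adopt.so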
Kind regards,
-- Andy
Hi Juergen,
On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote:
> Dear all,
>
> I have configured pam_slurm_adopt in our Slurm test environment by
> following the corresponding documentation:
>
> https://slurm.schedmd.com/pam_slurm_adopt.html
>
> I've set `PrologFlags=contain` in slurm.
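For context, the slurm.conf side of a typical pam_slurm_adopt setup looks something like this (a sketch, not a drop-in configuration):

    # slurm.conf (sketch)
    PrologFlags=contain
    TaskPlugin=task/cgroup
    ProctrackType=proctrack/cgroup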
Hi,
> So here's something funny. One user submitted a job that requested 60 CPUs
> and 40M of memory. Our largest nodes in that partition have 72 CPUs and
> 256G of memory. So when a user requests 400G of RAM, what would be good
> behavior? I would like to see slurm reject the job, "job i
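If the goal is to have such jobs rejected at submission time rather than left pending forever, the slurm.conf option that usually comes up is EnforcePartLimits; see the slurm.conf man page for the exact semantics of YES vs ALL on your release:

    # slurm.conf
    EnforcePartLimits=ALL   # reject jobs that exceed the requested partition's limits at submit time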
Hi,
On Wed, Jul 03, 2019 at 03:49:44PM +, David Baker wrote:
> Hello,
>
>
> A few of our users have asked about running longer jobs on our cluster.
> Currently our main/default compute partition has a time limit of 2.5 days.
> Potentially, a handful of users need jobs to run up to 5 days. R
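One common pattern is a separate partition (or QOS) with a longer limit, restricted to the users who need it. A hypothetical sketch, with names and limits made up:

    # slurm.conf (hypothetical)
    PartitionName=long Nodes=node[001-016] MaxTime=5-00:00:00 AllowGroups=longjobs Default=NO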
Hi,
> On 22 Aug 2018, at 16:27, Christian Peter wrote:
>
> hi,
>
> we observed a strange behavior of pam_slurm_adopt regarding the involved
> cgroups:
>
> when we start a shell as a new Slurm job using "srun", the process has
> freezer, cpuset and memory cgroups setup as e.g.
> "/slurm/u
Hi Chris,
> On 24 Aug 2018, at 10:57, Christian Peter wrote:
>
> hi,
>
> thank you patrick, thank you kilian for identifying a systemd issue here!
>
> for a quick test, we disabled and masked systemd-logind. the "memory" cgroup
> now works as expected. great!
>
> we're now watching out fo
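For anyone wanting to reproduce the quick test described above, it amounts to the following; note that masking systemd-logind has side effects on regular login sessions, so treat it as a diagnostic rather than a fix:

    systemctl stop systemd-logind
    systemctl mask systemd-logind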
Hello,
We are migrating away from a Torque/Moab setup. For user convenience, we’re
trying to make the differences minimal.
I am wondering if there is a way to set the job walltime in the job environment
(to set $PBS_WALLTIME). It’s unclear to me how this information can be
retrieved on the wo
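One way to do this is a TaskProlog: any line it prints of the form `export NAME=value` is added to the task environment. A sketch, assuming PBS_WALLTIME should be in seconds and that scontrol reports TimeLimit as [days-]hours:minutes:seconds:

    #!/bin/bash
    # TaskProlog sketch (hypothetical): inject PBS_WALLTIME into the job environment
    limit=$(scontrol show job "$SLURM_JOB_ID" | tr ' ' '\n' | awk -F= '/^TimeLimit=/{print $2}')
    # split off an optional day part, then convert H:M:S to seconds
    # (jobs with TimeLimit=UNLIMITED would need special-casing)
    if [[ $limit == *-* ]]; then d=${limit%%-*}; hms=${limit#*-}; else d=0; hms=$limit; fi
    IFS=: read -r h m s <<< "$hms"
    echo "export PBS_WALLTIME=$(( (10#$d * 24 + 10#$h) * 3600 + 10#$m * 60 + 10#${s:-0} ))"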
Hi,
> On 3 Oct 2018, at 16:51, Andy Georges wrote:
>
> Hi all,
>
>> On 15 Sep 2018, at 14:47, Chris Samuel wrote:
>>
>> On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote:
>>
>>> Another way would be to make all your Linux users and
Hi all,
> On 15 Sep 2018, at 14:47, Chris Samuel wrote:
>
> On Thursday, 13 September 2018 3:10:19 AM AEST Paul Edmon wrote:
>
>> Another way would be to make all your Linux users and then map that into
>> Slurm using sacctmgr.
>
> At ${JOB} and ${JOB-1} we've wired user creation in Slurm int
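For reference, wiring existing Linux users into the Slurm database by hand boils down to sacctmgr calls along these lines (account and user names are made up; -i skips the confirmation prompt):

    sacctmgr -i add account projecta Description="Project A" Organization=dept
    sacctmgr -i add user alice Account=projecta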
Hi,
As per the guidelines on the slurmdbd.conf and sacct manual pages, I have set
PrivateData=jobs (amongst others) in slurmdbd.conf.
However, at this point no job information is available anymore when running
sacct, it just does not provide any job related output:
vsc40075@gligar03 (SLURM_
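With PrivateData=jobs a non-privileged user should still see their own jobs, so a useful first check is whether the user actually has an association in the database, and what sacct returns when the user and a time window are given explicitly (a diagnostic sketch):

    sacctmgr show user $USER withassoc
    sacct -u $USER -S now-7days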
> On 30 Apr 2018, at 22:37, Nate Coraor wrote:
>
> Hi Shawn,
>
> I'm wondering if you're still seeing this. I've recently enabled task/cgroup
> on 17.11.5 running on CentOS 7 and just discovered that jobs are escaping
> their cgroups. For me this is resulting in a lot of jobs ending in
> OU
Hi,
For some reason I am seeing memory cgroups disappear on the nodes:
[root@node3108 memory]# file $PWD/slurm
/sys/fs/cgroup/memory/slurm: cannot open (No such file or directory)
There is however a job running and other cgroups are still present:
[root@node3108 memory]# ls /sys/fs/cgroup/cp
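For completeness, the pieces that have to agree for per-job memory cgroups to be created and enforced are roughly these; this is a sketch of a typical cgroup v1 setup, not our exact configuration:

    # slurm.conf
    ProctrackType=proctrack/cgroup
    TaskPlugin=task/cgroup

    # cgroup.conf
    CgroupAutomount=yes
    ConstrainCores=yes
    ConstrainRAMSpace=yes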
Hello,
We are transitioning from Moab/Torque to Slurm.
I was wondering if there is a way to have Slurm also create the stdout (and
stderr) file for the job on the node (by default), rather than on the shared FS.
We sometimes have users who write a lot of stuff to stdout from their job
script
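As far as I know there is no single switch for this, but one workaround is to point the output at a node-local path and copy it back afterwards. The filename patterns %x and %j expand to job name and job id; the paths below are made up:

    #SBATCH --output=/tmp/%x-%j.out
    #SBATCH --error=/tmp/%x-%j.err
    # ... job commands ...
    # copy the node-local output back to the shared filesystem, e.g. at the end
    # of the job script or from an epilog
    cp /tmp/${SLURM_JOB_NAME}-${SLURM_JOB_ID}.* /shared/results/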
Hi,
> On 9 Mar 2018, at 21:58, Nicholas McCollum wrote:
>
> Connection refused makes me think a firewall issue.
>
> Assuming this is a test environment, could you try on the compute node:
>
> # iptables-save > iptables.bak
> # iptables -F && iptables -X
>
> Then test to see if it works. To
Hi all,
Cranked up the debug level a bit
Job was not started when using:
vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun --pty bash -i
salloc: Granted job allocation 42
salloc: Waiting for resource configuration
salloc: Nodes node2801 are ready for job
For comparison purposes, runn
cessary.
>
> The most simple command that I typically use is:
>
> srun -N1 -n1 --pty bash -i
>
> Mike
>
>> On 3/9/18 10:20 AM, Andy Georges wrote:
>> Hi,
>>
>>
>> I am trying to get interactive jobs to work from the machine we use as a
>>
Hi,
I am trying to get interactive jobs to work from the machine we use as a login
node, i.e., where the users of the cluster log into and from where they
typically submit jobs.
I submit the job as follows:
vsc40075@test2802 (banette) ~> /bin/salloc -N1 -n1 /bin/srun bash -i
salloc: Granted
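On the releases current at the time (pre-20.11), the usual way to have salloc drop the user into a shell on the allocated node was SallocDefaultCommand in slurm.conf; newer releases use LaunchParameters=use_interactive_step instead. A sketch of the old-style setting, along the lines of the example that used to be in the Slurm FAQ:

    # slurm.conf (pre-20.11 style; adjust the srun options to taste)
    SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"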