Re: [slurm-users] Multi-node job failure

2019-12-12 Thread Chris Samuel
On 11/12/19 8:05 am, Chris Woelkers - NOAA Federal wrote: Partial progress. The scientist who developed the model took a look at the output and found that, instead of one model run being run in parallel, srun had run multiple instances of the model, one per thread, which for this test was 110 t

Re: [slurm-users] Is that possible to submit jobs to a Slurm cluster right from a developer's PC

2019-12-12 Thread Chris Samuel
On 12/12/19 7:38 am, Ryan Cox wrote: Be careful with this approach.  You also need the same munge key installed everywhere.  If the developers have root on their own system, they can submit jobs and run Slurm commands as any user. I would echo Ryan's caution on this and add that as root they

Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Chris Samuel
On 12/12/19 8:14 am, Dean Schulze wrote: configure:5021: gcc -o conftest -I/usr/include/mysql -g -O2   conftest.c -L/usr/lib/x86_64-linux-gnu -lmysqlclient -lpthread -lz -lm -lrt -latomic -lssl -lcrypto -ldl  >&5 /usr/bin/ld: cannot find -lssl /usr/bin/ld: cannot find -lcrypto collect2: error:

Re: [slurm-users] Maxjobs to accrue age priority points

2019-12-12 Thread Chris Samuel
Hi Chris, On 12/12/19 3:16 pm, Christopher Benjamin Coffey wrote: What am I missing? It's just a setting on the QOS, not the user: csamuel@cori01:~> sacctmgr show qos where name=regular_1 format=MaxJobsAccruePerUser MaxJobsAccruePU --- 2 So any user in that QOS c

Re: [slurm-users] Maxjobs to accrue age priority points

2019-12-12 Thread Christopher Benjamin Coffey
Hmm, after trying this out I'm confused. I don't see the limit placed on the qos. In fact, I see that the qos header is missing some other options that are available in the man page. Maybe I'm missing an option that enables some of the options. [ddd@siris /home/ddd]$ sacctmgr update qos name=bil

Re: [slurm-users] Maxjobs to accrue age priority points

2019-12-12 Thread Christopher Benjamin Coffey
Ahh hah! Thanks Kilian! Best, Chris -- Christopher Coffey High-Performance Computing Northern Arizona University 928-523-1167 On 12/12/19, 3:03 PM, "slurm-users on behalf of Kilian Cavalotti" wrote: Hi Chris, On Thu, Dec 12, 2019 at 10:47 AM Christopher Benjamin Coffey

Re: [slurm-users] Maxjobs to accrue age priority points

2019-12-12 Thread Kilian Cavalotti
Hi Chris, On Thu, Dec 12, 2019 at 10:47 AM Christopher Benjamin Coffey wrote: > I believe I heard recently that you could limit the number of users' jobs that > accrue age priority points. Yet, I cannot find this option in the man pages. > Anyone have an idea? Thank you! It's the *JobsAccrue*

Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Dean Schulze
Thanks for mentioning the config.log file. It has dozens of errors in it, yet ./configure completes and doesn't report any errors. Here's what got me past the problem with the mysql plugin. A test program that needed -lssl and -lcrypto on the make command line was failing. The solution was sud

[slurm-users] Maxjobs to accrue age priority points

2019-12-12 Thread Christopher Benjamin Coffey
Hi, I believe I heard recently that you could limit the number of users' jobs that accrue age priority points. Yet, I cannot find this option in the man pages. Anyone have an idea? Thank you! Best, Chris -- Christopher Coffey High-Performance Computing Northern Arizona University 928-523-116

Re: [slurm-users] sched

2019-12-12 Thread Alex Chekholko
Hey Steve, I think it doesn't just "power down" the nodes but deletes the instances. So then when you need a new node, it creates one, then provisions the config, then updates the slurm cluster config... That's how I understand it, but I haven't tried running it myself. Regards, Alex On Thu, De

Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Dean Schulze
There's a mysql test failure in config.log. It looks like a couple of missing libraries. The config.log also shows errors because g++ isn't present, and dozens of errors because of failed includes. I must need g++ packages on my Ubuntu instance. But ./configure completes successfully in spite o

[slurm-users] pkgconfig conflict

2019-12-12 Thread William Brown
Version 19.05.3-2 CentOS 7.7 I wanted to install the slurm-devel RPM that I had built, but I get this transaction check error: $ sudo yum localinstall /home/apps/slurm/19.05/RPMS/slurm-devel-19.05.3-2.el7.x86_64.rpm . . Transaction check error: file /usr/lib64/pkgconfig from install of slu

Re: [slurm-users] Is that possible to submit jobs to a Slurm cluster right from a developer's PC

2019-12-12 Thread Ryan Cox
Be careful with this approach.  You also need the same munge key installed everywhere.  If the developers have root on their own system, they can submit jobs and run Slurm commands as any user. ssh sounds significantly safer.  A quick and easy way to make sure that users don't abuse the system

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2019-12-12 Thread Marcus Wagner
Hi Beatrice and Bjørn-Helge, I can confirm that it works with 18.08.7. We additionally use TRESBillingWeights together with PriorityFlags=MAX_TRES. For example: TRESBillingWeights="CPU=1.0,Mem=0.1875G,gres/gpu=12.0" We use the billing factor for our external accounting. We do this to do a fair a

Re: [slurm-users] Need help with controller issues

2019-12-12 Thread Gennaro Oliva
Hi Dean, On Wed, Dec 11, 2019 at 04:04:44PM -0700, Dean Schulze wrote: > I tried again with a completely new system (virtual machine). I used the > latest source, I used mysql instead of mariadb, and I installed all the > client and dev libs (below). I still get the same error. It doesn't > bui

[slurm-users] slurm cpu allocation

2019-12-12 Thread Ricardo Gregorio
Hi all, Wondering whether someone could help me with the following: I am new and struggling a bit with some slurm concepts. We are running an old version, 17.02.11, soon upgrading to 18.X. In our slurm.conf: "SelectType=select/cons_res" https://slurm.schedmd.com/cpu_management.html#Step1 21x compu

[slurm-users] sched

2019-12-12 Thread Steve Brasier
Hi, I'm hoping someone can shed some light on the SchedMD-provided example here https://github.com/SchedMD/slurm-gcp for an autoscaling cluster on Google Cloud Platform (GCP). I understand that slurm autoscaling uses the power saving interface to create/remove nodes and the example suspend.py and r

Re: [slurm-users] Need help with controller issues

2019-12-12 Thread William Brown
I looked back in the list to November when I had the same problem building with MariaDB: On 11-11-2019 21:23, William Brown wrote: > I have in fact found the answer by looking harder. > > The config.log clearly showed that the build of the test MySQL > program failed, w

Re: [slurm-users] Is that possible to submit jobs to a Slurm cluster right from a developer's PC

2019-12-12 Thread Nguyen Dai Quy
On Thu, Dec 12, 2019 at 5:53 AM Ryan Novosielski wrote: > Sure; they’ll need to have the appropriate part of SLURM installed and the > config file. This is similar to having just one login node per user. > Typically login nodes don’t run either daemon. > > Hi, It's interesting! Do you have any l

Re: [slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

2019-12-12 Thread Bjørn-Helge Mevik
Beatrice Charton writes: > Hi, > > We have a strange behaviour of Slurm after updating from 18.08.7 to > 18.08.8, for jobs using --exclusive and --mem-per-cpu. > > Our nodes have 128GB of memory, 28 cores. > $ srun --mem-per-cpu=3 -n 1 --exclusive hostname > => works in 18.08.7 > =>

Re: [slurm-users] cleanup script after timeout

2019-12-12 Thread Reuti
Hi, On 12.12.2019 at 03:06, Brian Andrus wrote: > You prompted me to dig even deeper into my epilog. I was trying to access a > semaphore file in the user's home directory. > > It seems that when the epilog is run the ~ is not expanded in any way. So I > can't even use ~${SLURM_JOB_USER} to