Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Chris Samuel
On 22/3/19 12:51 pm, Ole Holm Nielsen wrote: The web page explains how the weight mask is defined: Each digit in the mask defines a node property. Please read the example given. I don't think that's what José is asking for; he wants the weights for a node to be different when being considere
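
For context, a node's Weight is a per-node attribute in slurm.conf, which is why the same node cannot directly carry different weights in two partitions. A minimal sketch of the syntax, with Ole's digit-per-property mask shown purely as a site convention (node names and digit meanings are hypothetical):

    # slurm.conf -- hypothetical sketch
    # Site convention for the weight digits: <CPU generation><memory class><GPU class>
    NodeName=node[001-016] Weight=121
    NodeName=node[017-032] Weight=211
    # The scheduler prefers nodes with the lowest weight when allocating.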

Re: [slurm-users] Database Tuning w/SLURM

2019-03-22 Thread Ryan Novosielski
On Mar 22, 2019, at 4:22 AM, Ole Holm Nielsen wrote: > On 3/21/19 6:56 PM, Ryan Novosielski wrote: >> On Mar 21, 2019, at 12:21 PM, Loris Bennett wrote: >>> Our last cluster only hit around 2.5 million jobs after around 6 years, so database conversion was never an issue.

Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Ole Holm Nielsen
On 22-03-2019 17:22, Jose A wrote: Dear Ole, Thanks for your fast reply. I really appreciate that. I had a look at your website and googled about “weight masks” but still have some questions. From your example I see that the mask definition is commented out. How to define what the mask mean

Re: [slurm-users] Can one specify attributes on a GRES resource?

2019-03-22 Thread Will Dennis
I am wondering where Slurm is getting the "0" value from that I see in the error... I went spelunking in the 16.05 source for "count too low", and I found the string in ./src/common/gres.c beginning at line 2045: if ((fast_schedule < 2) && (gres_data->gres_cnt_found < gres_
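
That check compares the GRES count declared for the node in slurm.conf against the count actually found on the node via gres.conf, so a "0" typically means no matching gres.conf entry was found. A minimal sketch of a consistent pair, assuming a two-GPU node (device paths hypothetical):

    # slurm.conf
    NodeName=node01 Gres=gpu:2
    # gres.conf on node01
    Name=gpu File=/dev/nvidia[0-1]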

Re: [slurm-users] X11 forwarding and VNC?

2019-03-22 Thread Christopher Benjamin Coffey
Loris, Glad you've made some progress. We finally got it working as well, and have two findings: 1. the login node FQDN must be the same as the compute nodes'; 2. --x11 is not required to be added to srun, and actually causes it to fail for some reason for us. Very odd, anyone have thoughts? - No

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-22 Thread Prentice Bisbal
This is the first place I've had regularly scheduled maintenance, too, and boy does it make life easier. In most of my previous jobs, it was a small enough environment that it wasn't necessary. On 3/22/19 1:57 PM, Christopher Samuel wrote: On 3/22/19 10:31 AM, Prentice Bisbal wrote: Most HP

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Bill Barth
Slurm is almost certainly calling execve() with the path to a copy of this script as an argument eventually, so yes, the Linux kernel will notice tcsh on the first line and invoke it to handle the contents. Slurm doesn’t have to honor it since the kernel will. Slurm usually makes a pass th
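
A minimal demonstration of the point, i.e. that the kernel itself honors the #! line when a script is exec'd (file name hypothetical):

    cat > whichshell.csh <<'EOF'
    #!/bin/tcsh
    # The kernel reads the #! line and invokes /bin/tcsh on this file,
    # regardless of which shell the caller is running.
    echo "interpreter: $shell"
    EOF
    chmod +x whichshell.csh
    ./whichshell.csh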

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-22 Thread Christopher Samuel
On 3/22/19 10:31 AM, Prentice Bisbal wrote: Most HPC centers have scheduled downtime on a regular basis. That's not my experience before now; where I've worked in Australia, we scheduled maintenance for when we absolutely had to do it, but there could be delays if there were critica

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Thomas M. Payerle
Does the GUI run as the user (e.g. the user starts the GUI, so the submitting process is owned by the user), or is the GUI running as a daemon (in which case, is it submitting jobs as the user, and if so, how)? And is the default shell of the user submitting the job tcsh (like the shebang in the script)

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-22 Thread Prentice Bisbal
Rafael, Most HPC centers have scheduled downtime on a regular basis. Typically it's one day a month, but I know that at Argonne National Lab, which is a DOE Leadership Computing Facility that houses some of the largest supercomputers in the world for use by a large number of scientists, they

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Prentice Bisbal
Chris, I use that -x switch all the time in other situations. Don't know why I didn't think of using it in this one. Thanks for reminding me of that. Prentice On 3/22/19 1:18 PM, Christopher Samuel wrote: On 3/21/19 3:43 PM, Prentice Bisbal wrote: #!/bin/tcsh Old school script debugging

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Prentice Bisbal
Thomas, The GUI app writes the script to the file slurm_script.sh in the cwd. I did exactly what you suggested as my first step in debugging: I checked the Command= value in the output of 'scontrol show job' to see which script was actually submitted, and it was the slurm_script.sh in the cwd.
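
For reference, the check described here can be done like so (job ID hypothetical):

    scontrol show job 12345 | grep -o 'Command=.*'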

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Christopher Samuel
On 3/21/19 3:43 PM, Prentice Bisbal wrote: #!/bin/tcsh Old school script debugging trick - make that line: #!/bin/tcsh -x and then you'll see everything the script is doing. All the best, Chris -- Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA
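
A sketch of the pattern Chris describes, applied to a batch script (the #SBATCH options and program name are made up for illustration):

    #!/bin/tcsh -x
    #SBATCH --ntasks=4
    # With -x, tcsh echoes each command after expansion, so the job's
    # output file shows exactly what the script executes.
    which mpirun
    mpirun -np 4 ./my_app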

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Prentice Bisbal
On 3/22/19 12:40 PM, Reuti wrote: On 22.03.2019 at 16:20, Prentice Bisbal wrote: On 3/21/19 6:56 PM, Reuti wrote: On 21.03.2019 at 23:43, Prentice Bisbal wrote: Slurm-users, My users here have developed a GUI application which serves as a GUI interface to various physics codes they use.

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Thomas M. Payerle
Assuming the GUI-produced script is as you indicated (I am not sure where you got the script you showed, but if it is not the actual script used by a job, it might be worthwhile to examine the Command= file from scontrol show job to verify), then the only thing that should be different from a GUI su

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Reuti
> On 22.03.2019 at 16:20, Prentice Bisbal wrote: > > On 3/21/19 6:56 PM, Reuti wrote: >> On 21.03.2019 at 23:43, Prentice Bisbal wrote: >> >>> Slurm-users, >>> >>> My users here have developed a GUI application which serves as a GUI >>> interface to various physics codes they use. From thi

Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Jose A
Dear Ole, Thanks for your fast reply. I really appreciate that. I had a look at your website and googled about “weight masks” but still have some questions. From your example I see that the mask definition is commented out. How does one define what the mask means? If it helps, I’ll put an easy examp

Re: [slurm-users] Changing node weights in partitions

2019-03-22 Thread Ole Holm Nielsen
On 3/22/19 4:15 PM, José A. wrote: Dear all, I would like to create two partitions, A and B, in which node1 has a certain weight in partition A and a different one in partition B. Does anyone know how to implement it? Some pointers to documentation of this and a practical example are in my W

Re: [slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app

2019-03-22 Thread Prentice Bisbal
On 3/21/19 6:56 PM, Reuti wrote: On 21.03.2019 at 23:43, Prentice Bisbal wrote: Slurm-users, My users here have developed a GUI application which serves as a GUI interface to various physics codes they use. From this GUI, they can submit jobs to Slurm. On Tuesday, we upgraded Slurm from 18.

[slurm-users] Changing node weights in partitions

2019-03-22 Thread José A.
Dear all, I would like to create two partitions, A and B, in which node1 has a certain weight in partition A and a different one in partition B. Does anyone know how to implement it? Thanks very much for the help! Cheers, José

[slurm-users] Easybuild et al. (was: SLURM heterogeneous jobs, a little help needed plz)

2019-03-22 Thread Loris Bennett
Hi Daniel, Daniel Letai writes: > Hi Loris, > On 3/21/19 6:21 PM, Loris Bennett wrote: >> Chris, maybe you should look at EasyBuild (https://easybuild.readthedocs.io/en/latest/). That way you can install all the dependencies (such as zlib) as modules and be pretty much independent of
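
As an illustration of the approach, a typical EasyBuild invocation that installs a dependency as a module might look like this (the exact easyconfig name is an assumption):

    # --robot resolves and builds any missing dependencies as modules too
    eb zlib-1.2.11-GCCcore-8.2.0.eb --robot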

Re: [slurm-users] X11 forwarding and VNC?

2019-03-22 Thread Loris Bennett
Hi Nathan, Thanks, this indeed works and I've certainly seen hackier workarounds 😅 However, I assume there is also some other way, as the VNC users didn't have the problem on our old system, where we used the external X11 plugin, although Luca points to there being an unresolved bug, so maybe not

Re: [slurm-users] X11 forwarding and VNC?

2019-03-22 Thread Luca Capello
Hi there, On 3/22/19 2:44 PM, Loris Bennett wrote: > Does anyone have any ideas whether this can be made to work and, if so, > how? At the University of Geneva (Switzerland) we experience the same issues, which are already described in the upstream bug tracker, with no solution though:

Re: [slurm-users] X11 forwarding and VNC?

2019-03-22 Thread Nathan Harper
The hacky workaround is to 'ssh -X' back into the node before running the srun, so you appear as if you are remote even if you aren't. E.g. 'ssh -X loginnode01', then: 'srun --x11=first bash'. On Fri, 22 Mar 2019 at 13:48, Loris Bennett wrote: > Hi, > > I'm using 18.08.6-2 and

[slurm-users] X11 forwarding and VNC?

2019-03-22 Thread Loris Bennett
Hi, I'm using 18.08.6-2 and have got X11 forwarding working using the in-built mechanism. This works fine for users who log in with 'ssh -X' and then do 'srun --x11 --pty bash'. However, I have users who start a VNC session on the login node and when they run the srun command above from an xterm

Re: [slurm-users] SLURM heterogeneous jobs, a little help needed plz

2019-03-22 Thread Frava
Hi all, I think it's not that easy to keep SLURM up to date in a cluster of more than 3k nodes with a lot of users. I mean, that cluster is only a little more than two years old, and my submission today got JOBID 12711473; the queue has 9769 jobs (squeue | wc -l). In two years there were only

[slurm-users] instant slurm

2019-03-22 Thread Christian Goll
Hello List, someone came up with the question of whether it's possible to use srun in the context of a fat and more granular numactl. As srun only works in combination with slurmctld and slurmd, I have created a small script which creates a minimal standalone slurm.conf and gres.conf, which is attached to
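
The attached script is not reproduced here, but a minimal standalone slurm.conf of the kind described might look like the following (all values are hypothetical placeholders):

    ClusterName=standalone
    SlurmctldHost=localhost
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core
    NodeName=localhost CPUs=8 RealMemory=16000 State=UNKNOWN
    PartitionName=debug Nodes=localhost Default=YES State=UP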

Re: [slurm-users] Very large job getting starved out

2019-03-22 Thread David Baker
Hello, Running the command "squeue -j 359323 --start" gives me the following output:

  JOBID   PARTITION  NAME   USER  ST  START_TIME  NODES  SCHEDNODES  NODELIST(REASON)
  359323  batch      batch  jwk   PD  N/A            27  (null)      (Resources)

Bes

Re: [slurm-users] Database Tuning w/SLURM

2019-03-22 Thread Ole Holm Nielsen
On 3/21/19 6:56 PM, Ryan Novosielski wrote: On Mar 21, 2019, at 12:21 PM, Loris Bennett wrote: Our last cluster only hit around 2.5 million jobs after around 6 years, so database conversion was never an issue. For sites with higher throughput, things may be different, but I would hope that
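
On the tuning question itself, the MariaDB/MySQL settings commonly recommended for slurmdbd are along these lines (sizes illustrative; consult the Slurm accounting documentation for current guidance):

    [mysqld]
    innodb_buffer_pool_size=1024M
    innodb_log_file_size=64M
    innodb_lock_wait_timeout=900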