On Tuesday, 5 May 2020 11:00:27 PM PDT Maria Semple wrote:
> Is there no way to achieve what I want then? I'd like the first and last job
> steps to always be able to run, even if the second step requests more
> resources than the cluster can provide.
That should just work.
#!/bin/bash
#SBATCH -c 2
#
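The preview cuts the reply's script off here; presumably it continues with the
same three srun steps from the original post, roughly like this (a sketch, not
the verbatim reply; ./middle_step is a placeholder for the real command):
srun --cpus-per-task=1 --mem=1 echo "Starting..."
srun --cpus-per-task=4 --mem=250 --exclusive ./middle_step
srun --cpus-per-task=1 --mem=1 echo "Finished."
Inside the 2-CPU allocation the 4-CPU step cannot be satisfied, so that srun
should fail while the 1-CPU steps before and after it still run, which seems
to be the point of "that should just work" (exact messages vary by version).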
Hi Chris,
Thanks for the tip about the memory units, I'll double check that I'm using
them.
Is there no way to achieve what I want then? I'd like the first and last
job steps to always be able to run, even if the second step requests more
resources than the cluster can provide.
As a side note, do you
On Tuesday, 5 May 2020 3:21:45 PM PDT Dustin Lang wrote:
> Since this happens on a fresh new database, I just don't understand how I
> can get back to a basic functional state. This is exceedingly frustrating.
I have to say that if you're seeing this with 17.11, 18.08 and 19.05 and this
only st
On Tuesday, 5 May 2020 4:47:12 PM PDT Maria Semple wrote:
> I'd like to set different resource limits for different steps of my job. A
> sample script might look like this (e.g. job.sh):
>
> #!/bin/bash
> srun --cpus-per-task=1 --mem=1 echo "Starting..."
> srun --cpus-per-task=4 --mem=250 --exclusive
On Tuesday, 5 May 2020 3:48:22 PM PDT Sean Crosby wrote:
> sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4
Also don't forget you need to tell Slurm to enforce QOS limits with:
AccountingStorageEnforce=safe,qos
in your Slurm configuration ("safe" is good to set, and turns on enforcement
of limits as well).
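A minimal slurm.conf sketch for that, assuming accounting already goes through
slurmdbd (restart slurmctld after changing it):
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageEnforce=safe,qos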
Hi!
I'd like to set different resource limits for different steps of my job. A
sample script might look like this (e.g. job.sh):
#!/bin/bash
srun --cpus-per-task=1 --mem=1 echo "Starting..."
srun --cpus-per-task=4 --mem=250 --exclusive
srun --cpus-per-task=1 --mem=1 echo "Finished."
Then I woul
Hi Michael,
I get the gist of everything you mentioned, but now I feel even more
overwhelmed. Can I not get JupyterHub up and running without all of those
modules pieces? I was hoping to have a base kernel in JupyterHub that
contained a lot of the data science packages.
of the data science packages. I'm going to have some devel
Hi Thomas,
That value should be
sacctmgr modify qos gpujobs set MaxTRESPerUser=gres/gpu=4
Sean
--
Sean Crosby | Senior DevOpsHPC Engineer and HPC Team Lead
Research Computing Services | Business Services
The University of Melbourne, Victoria 3010 Australia
On Wed, 6 May 2020 at 04:53, Theis,
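A quick way to confirm the limit actually landed (a sketch; the QOS name
follows the example above, <username> is a placeholder):
sacctmgr show qos gpujobs
sacctmgr show assoc where user=<username> format=user,account,qos
Note the limit only applies to jobs that actually run under that QOS, e.g.
submitted with --qos=gpujobs or picked up as the default/partition QOS.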
Hi,
I've just upgraded to slurm 19.05.5.
With either my old database, OR creating an entirely new database, I am
unable to create a new 'cluster' entry in the database -- slurmdbd is
segfaulting!
# sacctmgr add cluster test3
Adding Cluster(s)
Name = test3
Would you like to commit changes?
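One way to see where it dies is to run slurmdbd in the foreground and
reproduce the "add cluster" from another shell (a sketch; flags per the
slurmdbd man page):
slurmdbd -D -vvv
# or run it under gdb and grab a backtrace at the segfault
gdb --args slurmdbd -D
(gdb) run
(gdb) bt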
I tried upgrading Slurm to 18.08.9 and I am still getting this Segmentation
Fault!
On Tue, May 5, 2020 at 2:39 PM Dustin Lang wrote:
> Hi,
>
> Apparently my colleague upgraded the mysql client and server, but, as far
> as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
> release notes I don't see anything that looks suspicious there...
Hey Killian,
I tried to limit the number of GPUs a user can run on at a time by adding
MaxTRESPerUser = gres:gpu4 to both the user and the QOS. I restarted the
Slurm control daemon and unfortunately I am still able to run on all the GPUs
in the partition. Any other ideas?
Thomas Theis
From: slur
Hi,
Apparently my colleague upgraded the mysql client and server, but, as far
as I can tell, this was only 5.7.29 to 5.7.30, and checking the mysql
release notes I don't see anything that looks suspicious there...
cheers,
--dustin
On Tue, May 5, 2020 at 1:37 PM Dustin Lang wrote:
> Hi,
>
> W
Hi,
We're running Slurm 17.11.12. Everything has been working fine, and then
suddenly slurmctld is crashing and slurmdbd is crashing.
We use fair-share as part of the queuing policy, and previously set up
accounts with sacctmgr; that has been working fine for months.
If I run slurmdbd in debug
Aside from any Slurm configuration, I’d recommend setting up a modules [1 or 2]
folder structure for CUDA and other third-party software. That handles
LD_LIBRARY_PATH and other similar variables, reduces the chances for library
conflicts, and lets users decide their environment on a per-job basis.
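For example, once a CUDA modulefile exists under the site's modulefiles tree,
users pick their environment per job like this (paths and versions below are
placeholders):
module use /apps/modulefiles
module avail cuda
module load cuda/10.2
echo $LD_LIBRARY_PATH   # should now include the CUDA library directory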
Thanks Guy, I did find that there was a jupyterhub_slurmspawner log in my home
directory. That enabled me to find out that it could not find the path for
batchspawner-singleuser.
So I added this to jupyter_config.py
export PATH=/opt/rh/rh-python36/root/bin:$PATH
That seemed to now allow the
Haven’t done it yet myself, but it’s on my todo list.
But I’d assume that if you use the FlexLM or RLM parts of that documentation,
Slurm would query the remote license server periodically and hold the job
until the necessary licenses were available.
> On May 5, 2020, at 8:37 AM, navin sri
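Based on that page, the remote-license case is registered through the
accounting database rather than slurm.conf; a sketch with placeholder names
and counts:
sacctmgr add resource name=myapp server=flexhost servertype=flexlm count=4 percentallowed=100 cluster=mycluster
sbatch -L myapp@flexhost job.sh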
Thanks Michael,
Yes, I have gone through it, but the licenses are remote licenses and they
will be used outside of Slurm as well, not only inside it.
So basically I am interested in how we can update the database dynamically
to get the exact value at that point in time.
I mean, query the license server and update Slurm accordingly.
Have you seen https://slurm.schedmd.com/licenses.html already? If the software
is just for use inside the cluster, one Licenses= line in slurm.conf plus users
submitting with the -L flag should suffice. You should be able to set that
license value to 4 if it’s licensed per node and you can run on up to 4 nodes.
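A minimal sketch of that cluster-local variant (the license name is a
placeholder):
# slurm.conf
Licenses=myapp:4
# job submission
sbatch -L myapp:1 job.sh
Jobs that ask for a license that isn't currently free stay pending rather
than failing, which is the queueing behaviour asked about in this thread.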
Hi Team,
We have an application whose licenses are limited; it scales up to 4
nodes (~80 cores).
So if all 4 nodes are full, a job on a 5th node fails.
We want to put a restriction in place so that the application can't execute
beyond 4 nodes; instead of failing, the job should stay in the queue.
I do not
Hi,
Please also post the stdout/stderr of job 7117.
What I don't see in your config, and what I do have in mine, is:
c.SlurmSpawner.hub_connect_ip = '192.168.1.1'  # the IP where the Slurm job
will try to connect to JupyterHub.
Also check whether port 8081 is reachable from the compute nodes.
--
josef
On 05
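A quick reachability check run on a compute node (IP and port as in the
config above; a sketch):
srun -N1 bash -c 'timeout 5 bash -c "</dev/tcp/192.168.1.1/8081" && echo reachable || echo unreachable'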
Hi Lisa,
Below is my JupyterHub Slurm config. It uses profiles, which allow you to
spawn different-sized jobs. I found the most useful thing for debugging is to
make sure that the --output option is being honoured; any Jupyter Python
errors will end up there, and to explicitly set the pyt
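For reference, the --output line in the spawner-generated batch script
usually looks something like this (a sketch; the exact template and path
vary by setup):
#SBATCH --output=/home/%u/jupyterhub_slurmspawner_%j.log
Here %u and %j are sbatch filename patterns for the user name and job id, so
the log lands in the user's home directory as mentioned earlier in the thread.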