[slurm-users] Re: Print Slurm Stats on Login

2024-08-27 Thread Simon Andrews via slurm-users
if you’re doing that for thousands of jobs. If anyone knows how to get this to show up correctly in squeue/sacct that would be super helpful. Simon. From: Davide DelVento Sent: 21 August 2024 00:14 To: Kevin Broch; Simon Andrews Cc: slurm-users@lists.schedmd.com Subject: Re: [slurm-users

[slurm-users] Re: Print Slurm Stats on Login

2024-08-20 Thread Simon Andrews via slurm-users
Possibly a bit more elaborate than you want but I wrote a web based monitoring system for our cluster. It mostly uses standard slurm commands for job monitoring, but I've also added storage monitoring which requires a separate cron job to run every night. It was written for our cluster, but pr

[slurm-users] Jobs being denied for GrpCpuLimit despite having enough resource

2024-03-14 Thread Simon Andrews via slurm-users
Our cluster has developed a strange intermittent behaviour where jobs are being put into a pending state because they aren't passing the AssocGrpCpuLimit, even though the user submitting has enough cpus for the job to run. For example: $ squeue -o "%.6i %.9P %.8j %.8u %.2t %.10M %.7m %.7c %.20R

Re: [slurm-users] R jobs crashing when run in parallel

2021-03-30 Thread Simon Andrews
le handles. William On Mon, 29 Mar 2021, 17:36 Patrick Goetz <pgo...@math.utexas.edu> wrote: Could this be a function of the R script you're trying to run, or are you saying you get this error running the same script which works at other times? On 3/29/21 7:47 AM, Simon Andrew

[slurm-users] R jobs crashing when run in parallel

2021-03-29 Thread Simon Andrews
I've got a weird problem on our slurm cluster. If I submit lots of R jobs to the queue then as soon as I've got more than about 7 of them running at the same time I start to get failures, saying: /bi/apps/R/4.0.4/lib64/R/bin/exec/R: error while loading shared libraries: libpcre2-8.so.0: cannot

[slurm-users] QOS cutting off users before CPU limit is reached

2020-04-27 Thread Simon Andrews
I'm trying to use QoS limits to dynamically change the number of CPUs a user is allowed to use on our cluster. As far as I can see I'm setting the appropriate GrpTRES=cpu value and I can read that back, but then jobs are being stopped before the user has reached that limit. In squeue I see loa

Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-27 Thread Simon Andrews
"/etc/ssh/ssh_host_rsa_key"; static char *hostkey_pub = "/etc/ssh/ssh_host_rsa_key.pub"; static char *priv_format = "%s/.ssh/id_rsa"; static char *pub_format = "%s/.ssh/id_rsa.pub"; On Jan 27, 2020, at 09:34 , Simon Andrews mailto:simon.andr...@babraham.a

Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-27 Thread Simon Andrews
den configuration files for in their home directory? William On Fri, 24 Jan 2020 at 16:05, Simon Andrews <simon.andr...@babraham.ac.uk> wrote: I have a weird problem which I can’t get to the bottom of. We have a cluster which allows users to start interactive sessions which f

Re: [slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-27 Thread Simon Andrews
r in their home directory? William On Fri, 24 Jan 2020 at 16:05, Simon Andrews <simon.andr...@babraham.ac.uk> wrote: I have a weird problem which I can’t get to the bottom of. We have a cluster which allows users to start interactive sessions which forward any X11 sessions they gen

[slurm-users] Srun not setting DISPLAY with --x11 for one account

2020-01-24 Thread Simon Andrews
I have a weird problem which I can't get to the bottom of. We have a cluster which allows users to start interactive sessions which forward any X11 sessions they generated on the head node. This generally works fine, but on the account of one user it doesn't work. The X11 connection to the he