We recently ran into an issue where a user was submitting job arrays of
0-500 one after another and would receive a message that the QOS limit had
been reached when "squeue --qos hepx --noheader |wc -l" would only print
~3000. The GrpSubmitJobs value for that QOS is 5000.
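One thing that may be worth checking (an assumption on my part, not something
I have confirmed): squeue can collapse a pending job array into a single line,
while GrpSubmitJobs counts every array task. If your squeue supports the
-r/--array option, counting one line per array element gives a fairer
comparison:

  squeue --qos hepx --noheader --array | wc -l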
I looked at the code of
We aren't using QOS values right now, just basic node limits on accounts,
but I'm guessing they shouldn't be hard to adapt. I'd certainly appreciate
it if you can send them along.
Thanks!
Jared
On Fri, Mar 27, 2015 at 6:00 AM, Bill Wichser wrote:
>
> Jared,
>
>
> I have a few scripts to show Q
Shawn,
We observed the same behavior with our upgrade. I believe --export was
not an option for srun prior to 14.11.0-pre4. From the NEWS file
included with SLURM:
=
-- Added srun --export option to set/export specific environment
variables.
=
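For example (a sketch only - exact semantics may differ between versions, and
MYVAR is just a placeholder name):

  srun --export=PATH,HOME,MYVAR=42 -N1 hostname

propagates only the listed variables to the launched tasks, while --export=ALL
keeps the old behaviour of exporting the full environment.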
According to the man page, t
Thanks very much for these suggestions - I've set a value for max_rpc_cnt and
we should see soon if this helps.
Cheers
Stuart
On 27/03/15 14:09, Paul Edmon wrote:
>
> So we have had the same problem, usually due to the scheduler receiving tons
> of requests. Usually
> this is fixed by havi
So we have had the same problem, usually due to the scheduler receiving
tons of requests. Usually this is fixed by having the scheduler slow
itself down by using the defer or max_rpc_cnt options. We in particular
use max_rpc_cnt=16. I actually did a test yesterday where I removed
this and
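For reference, both knobs live in SchedulerParameters in slurm.conf; a sketch
of the relevant line (either option can be used on its own, and the values are
site-specific):

  SchedulerParameters=max_rpc_cnt=16
  # or, to also defer scheduling attempts at submit time:
  # SchedulerParameters=defer,max_rpc_cnt=16

followed by "scontrol reconfigure" so that slurmctld picks up the change.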
Greetings,
Our cluster has both non-slurm-controlled interactive jobs and slurm-controlled
jobs being run on it. In general we would like to prioritize the non-slurm-controlled
interactive jobs by having slurm jobs niced to a level higher than the default.
Is this possible?
Regards,
Brian
Jared,
I have a few scripts to show QOS values and what is running under each.
Users can use these to see how many resources are left. Something like this:
# qos
Name       Priority  GrpNodes  GrpCPUs  MaxCPUsPU  MaxJobsPU  MaxNodesPU  MaxSubmit
---------  --------  --------  -------  ---------  ---------  ----------  ---------
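A rough sketch of the idea behind such a script (not the exact one we use, and
sacctmgr format field names vary between Slurm versions, so adjust to taste):

  #!/bin/bash
  # For each QOS, show its configured limits next to how many jobs are
  # currently running and pending under it.
  sacctmgr -n -P show qos format=Name,Priority,GrpNodes,GrpCPUs |
  while IFS='|' read -r name prio gnodes gcpus; do
      running=$(squeue --qos="$name" --noheader -t R  | wc -l)
      pending=$(squeue --qos="$name" --noheader -t PD | wc -l)
      printf '%-12s prio=%-6s GrpNodes=%-6s GrpCPUs=%-6s running=%-5s pending=%s\n' \
             "$name" "$prio" "$gnodes" "$gcpus" "$running" "$pending"
  done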
Hi,
Using gdb you can retrieve which thread own the locks on the slurmctld
internal structures (and block all the others).
Then it will be easier to understand what is happening.
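For example (one way of doing it; readable backtraces require slurmctld to have
been built with debug symbols):

  gdb -p $(pidof slurmctld) -batch -ex 'thread apply all bt' > slurmctld-threads.txt

and then look through the backtraces for the thread that holds the lock the
others are waiting on.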
On 27/03/2015 12:24, Stuart Rankin wrote:
> Hi,
>
> I am running slurm 14.11.4 on an 800-node RHEL6.6 general-purpo
The VPN will only guarantee that no-one can sniff the traffic between nodes.
It will not help you if one node is compromised: the attacker can use
the VPN to communicate with the rest of the cluster.
On 27/03/2015 12:22, Simon Michnowicz wrote:
> Re: [slurm-dev] Re: Slurm and MUNGE security
>
Hi,
I am running slurm 14.11.4 on an 800-node RHEL6.6 general-purpose university
cluster. Since upgrading from 14.03.3 we have been seeing the following problem
and I'd appreciate any advice (maybe it's a bug but maybe I'm missing something
obvious).
Occasionally the number of slurmctld threads
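For context, the thread count can be watched with sdiag (which reports a
"Server thread count" line) or by asking ps for the number of threads:

  sdiag | grep -i 'server thread'
  ps -o nlwp= -p $(pidof slurmctld)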
Mehdi,
Thanks for the response. Even though I am not sure how a MUNGE key could be
compromised (if you became root on a box you could equally take the ssh keys),
prudence would dictate that SLURM traffic go via a VPN, so that one bad node
does not affect others?
regards
Simon
On 27 March 2015 at 2
Hi Simon,
As far as I know, munge allows communications to be authenticated, but they
are not encrypted.
If the key is compromised, you may be able to send RPCs to the slurm daemons
pretending you are the slurm controller (and that the user requesting the job
is root).
So yes, in theory you should be able to exec
Hi all,
Has anybody seen a similar issue? I want to keep norequeue as the default, but
allow users to override it from the job script or the command line during submission.
After setting JobRequeue=0 in slurm.conf, jobs are not getting requeued -
neither using:
sbatch --requeue jobscript.slurm
or us
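For reference, the in-script form of the override (the script body here is just
a placeholder) would be:

  #!/bin/bash
  #SBATCH --requeue
  #SBATCH --job-name=requeue-test
  srun sleep 600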