On Fri, 2021-12-17 at 13:03:32 +0530, Sudeep Narayan Banerjee wrote:
Hello All: Can we please restrict one GPU job on one GPU node?
That is,
a) we submit a GPU job on an empty node (say gpu2) requesting 16 cores, as
that gives the best performance on the GPU.
b) Then another user floods the CPU cores on gpu2, sharing the GPU
resources...
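A minimal sketch of one way to get that effect from the user side (not taken
from this thread; the partition name, script name and counts are made up):
request the node exclusively, so no other job can land on gpu2's remaining
cores.

# request the whole node so nobody else can flood its CPU cores
sbatch --partition=gpu --gres=gpu:1 --cpus-per-task=16 --exclusive gpu_job.sh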
...and you shouldn't be able to do this with a QoS (at least not in the way I
think you want to), as "grptresrunmins" applies to the aggregate of everything
using that QoS.
On Thu, Dec 16, 2021 at 6:12 PM Fulcomer, Samuel
wrote:
I've not parsed your message very far, but...

for i in `cat limit_users` ; do
    # set the limit on each user's association in account "bar", partition "foo"
    sacctmgr modify user where user=$i partition=foo account=bar set grptresrunmins=cpu=Nlimit
done
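To check what ended up on each association, something along these lines should
work (the user name is made up; the format fields are standard sacctmgr ones):

# show the per-association limit that was just set
sacctmgr show assoc where user=alice format=user,account,partition,grptresrunmins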
On Thu, Dec 16, 2021 at 6:01 PM Ross Dickson
wrote:
I would like to impose a time limit stricter than the partition limit on a
certain subset of users. I should be able to do this with a QOS, but I
can't get it to work. What am I missing?
At https://slurm.schedmd.com/resource_limits.html it says,
"Slurm's hierarchical limits are enforced in the...
There's no clear answer to this. It depends a bit on how you've segregated
your resources.
In our environment, GPU and bigmem nodes are in their own partitions.
There's nothing to prevent a user from specifying a list of potential
partitions in the job submission, so there would be no need for the...
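For illustration, a partition list in the submission looks like this (the
partition names and the script are made up); Slurm starts the job in whichever
of the listed partitions can run it first:

# let the scheduler pick among several partitions
sbatch --partition=batch,bigmem,gpu job.sh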
Indeed. We use this and BELIEVE that it works, lol!
Bill
function slurm_job_modify ( job_desc, job_rec, part_list, modify_uid )
    -- always allow modifications made by root
    if modify_uid == 0 then
        return 0
    end
    -- non-root update that tries to set a QOS: reject it
    if job_desc.qos ~= nil then
        return 1
    end
    return 0
end
As far as I remember, you can use the job_submit lua plugin to prevent any
changes to jobs.
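A minimal sketch of wiring that up, assuming the Lua function shown above is
saved as job_submit.lua in the same directory as slurm.conf:

# slurm.conf: load the Lua job submit/modify plugin
JobSubmitPlugins=lua
# then tell the daemons to re-read the configuration
scontrol reconfigure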
On Thu, 16 Dec 2021 at 21:47, Bernstein, Noam CIV USN NRL (6393) Washington
DC (USA) wrote:
Is there a meaningful difference between using "scontrol update" and just
killing the job and resubmitting with those resources already requested?
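For context, the kind of in-place modification being discussed looks roughly
like this (the job id and partition name are made up):

# move a pending job to another partition without resubmitting it
scontrol update JobId=123456 Partition=bigmem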
Hi everyone,
I was wondering if there is a way to prevent users from updating their jobs
with "scontrol update job".
Here is the justification.
A hypothetical user submits a job requesting a regular node, but
he/she realises that the large memory nodes or the GPU nodes are idle.
Using the previo...
> One of the open problems is a way to provide the password for
> mounting the encrypted directory inside a slurm-job. But this should be
> solvable.

I'd be really interested to hear more about the mechanism to distribute
credentials across compute nodes in a secure way, especially if we're
using f...
@list,
is there any experience with recent versions of Slurm and kerberized NFS
on compute nodes?
I saw older (~201x) tutorials and slide decks describing auks, but after
checking its GitHub project I feel like it is a non-mainstream solution.
Is my understanding correct that using kerberize...