Hello,

*Issue 1:*
I am using Slurm 24.05.1. My slurmd has a single node to which I attach
multiple GRES by enabling the oversubscribe feature. The advance reservation
of the GRES only takes effect when the GRES is referenced by its type name
(tres=gres/gpu:SYSTEM12).


That is, during the reservation period, if another user submits a job
requesting SYSTEM12, Slurm places the job in the queue:

user1@host$ srun --gres=gpu:SYSTEM12:1 hostname
srun: job 333 queued and waiting for resources

But when another user submits a job without any type name, the job runs on
that GRES immediately, even though it is reserved:

user1@host$ srun --gres=gpu:1 hostname
mylinux.wbi.com


I can also see the GPU marked as busy in GresUsed in the output of
"scontrol show node -d", so the job really is running on the GRES/GPU and
not only on the CPUs.


Likewise, jobs submitted with a feature constraint (Feature "rev1" in my
case) also go through even though the node is reserved for other users;
this happens in my Slurm setup with multiple partitions.

*Snippet of slurm.conf:*
NodeName=cluster01 NodeAddr=cluster Port=6002 CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 Feature="rev1" Gres=gpu:SYSTEM12:1 RealMemory=64171 State=IDLE
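A missing space between two parameters (for example Port=6002CPUs=8
instead of Port=6002 CPUs=8) is easy to overlook in a long NodeName line.
As a rough sketch for anyone checking a similar config, the hypothetical
POSIX shell helper below (check_conf_line is an illustrative name, not a
Slurm tool) flags tokens that contain "=" more than once. It assumes
unquoted values without embedded spaces.

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: report every whitespace-separated
# token of one slurm.conf line that contains "=" more than once. In a
# NodeName line this usually means two key=value pairs ran together.
# Limitation: quoted values containing spaces are split into several tokens.
check_conf_line() (
  set -f                 # subshell body: disable globbing only inside here
  for tok in $1; do      # default IFS word splitting yields the tokens
    case "$tok" in
      *=*=*) echo "suspicious token: $tok" ;;
    esac
  done
)

# Example: a NodeName line with the missing space between Port and CPUs.
check_conf_line 'NodeName=cluster01 NodeAddr=cluster Port=6002CPUs=8 Boards=1'
```

Running it prints one line, "suspicious token: Port=6002CPUs=8"; a line
with the space restored produces no output.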

*Issue 2:*

During execution, Slurm prints some extra error messages in the srun output:

user1@host$ srun --gres=gpu:1 hostname
srun: error: extract_net_cred: net_cred not provided
srun: error: Malformed RPC of type RESPONSE_NODE_ALIAS_ADDRS(3017) received
srun: error: slurm_unpack_received_msg: [[inv1715771615.nxdi.us-aus01.nxp.com]:41242] Header lengths are longer than data received
mylinux.wbi.com

Regards,
MS
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
