Just a correction. We use

sacctmgr modify user=<username> set qos+=gpu-rtx6000-2
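Commands like the one above add a QoS to user associations. To check which users actually carry a given QoS, the parsable (-p) output of sacctmgr can be filtered. Below is a minimal Python sketch. It assumes the text was captured from something like "sacctmgr show assoc -p format=Cluster,Account,User,QOS", meaning a header row, '|'-separated fields, and one trailing '|' per line. The column names are an assumption, so adjust them to your format= list. The helpers parse_assoc_qos and users_with_qos are hypothetical and not part of Slurm. The sample rows are also hypothetical: they are modelled on the association output quoted later in this thread, and the "default" account for user test is assumed.

```python
from __future__ import annotations


def _split_parsable(line: str) -> list[str]:
    """Split one 'sacctmgr -p' line; each line ends with exactly one '|'."""
    if line.endswith("|"):
        line = line[:-1]
    return line.split("|")


def parse_assoc_qos(text: str) -> dict[tuple[str, str, str], set[str]]:
    """Map (cluster, account, user) -> QoS names listed on that association."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # Assumed header names: Cluster, Account, User, QOS (matched case-insensitively).
    header = [h.lower() for h in _split_parsable(lines[0])]
    col = {name: i for i, name in enumerate(header)}
    result: dict[tuple[str, str, str], set[str]] = {}
    for row in lines[1:]:
        fields = _split_parsable(row)

        def get(name: str) -> str:
            i = col[name]
            return fields[i] if i < len(fields) else ""

        key = (get("cluster"), get("account"), get("user"))
        # The QOS column is a comma-separated list; empty means none set.
        result[key] = {q for q in get("qos").split(",") if q}
    return result


def users_with_qos(text: str, qos_name: str) -> list[str]:
    """Sorted user names whose association includes qos_name."""
    return sorted(
        user
        for (_cluster, _account, user), qos in parse_assoc_qos(text).items()
        if user and qos_name in qos  # skip account-level rows (empty user)
    )


# Hypothetical sample modelled on the associations quoted in this thread.
SAMPLE = """Cluster|Account|User|QOS|
uea_cluster|default|cjr13geu|gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
uea_cluster|default|test|gpu-rtx|
"""

print(users_with_qos(SAMPLE, "gpu-rtx-reserved"))  # ['cjr13geu']
print(users_with_qos(SAMPLE, "gpu-rtx"))           # ['cjr13geu', 'test']
```

The parser keys on header names rather than column positions, so reordering the format= list does not break it.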
Amjad

On Tue, Aug 31, 2021 at 10:17 AM Amjad Syed <amjad...@gmail.com> wrote:
> Hi Sean
>
> We have been adding it with the following command:
>
> sacctmgr modify user set qos+=gpu-rtx-reserved
>
> We have a single account that is associated with all our users, plus the
> root account for admin.
>
> Is that the issue? Do we need to associate users with an account?
>
> On Tue, Aug 31, 2021 at 9:38 AM Sean Crosby <scro...@unimelb.edu.au> wrote:
>
>> Hi Amjad,
>>
>> AccountingStorageUser is the user used to connect to the accounting
>> database. If you have it defined in slurm.conf, it is ignored.
>>
>> From the output you showed, it says the user cjr13geu in the cluster
>> uea_cluster has access to the QoS.
>>
>> How are you adding the QoS to other users? The way you would do it is
>>
>> sacctmgr modify account <accountname> user=<username> set qos+=gpu-rtx-reserved
>>
>> or
>>
>> sacctmgr modify account <accountname> set qos+=gpu-rtx-reserved
>>
>> if you want to give it to every user in <accountname>.
>>
>> Sean
>> ------------------------------
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
>> Amjad Syed <amjad...@gmail.com>
>> Sent: Tuesday, 31 August 2021 17:46
>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] User association with partition and Qos
>> ------------------------------
>> Hi Sean
>>
>> Here is the output for the gpu-rtx-reserved QoS:
>>
>> sacctmgr show account withassoc -p | grep gpu-rtx-reserved
>>
>> default|default|default|uea_cluster||cjr13geu|1|||||||||||||||gpu,gpu-k40-1,gpu-rtx,gpu-rtx-reserved,hmem,ht,uea_def_qos|
>>
>> scontrol show part gpu-rtx6000-2
>>
>> PartitionName=gpu-rtx6000-2
>>    AllowGroups=ALL AllowAccounts=ALL AllowQos=gpu-rtx,gpu-rtx-reserved,jakeuea
>>    AllocNodes=ALL Default=NO QoS=N/A
>>    DefaultTime=1-00:00:00 DisableRootJobs=NO ExclusiveUser=NO
>>    GraceTime=0 Hidden=NO
>>    MaxNodes=9 MaxTime=7-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
>>    Nodes=g[15-29]
>>    PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
>>    OverTimeLimit=NONE PreemptMode=GANG,SUSPEND
>>    State=UP TotalCPUs=720 TotalNodes=15 SelectTypeParameters=NONE
>>    JobDefaults=(null)
>>    DefMemPerCPU=3996 MaxMemPerNode=UNLIMITED
>>
>> On a different note, we have the following in slurm.conf:
>>
>> AccountingStorageUser=slurm
>>
>> But we have been adding QoS and assigning users as root. Can this be an
>> issue?
>>
>> Amjad
>>
>> On Tue, Aug 31, 2021 at 8:22 AM Sean Crosby <scro...@unimelb.edu.au> wrote:
>>
>> What does sacctmgr show for the user you added to have access to the QoS,
>> and what does Slurm show for the partition config?
>>
>> sacctmgr show account withassoc -p
>> scontrol show part gpu-rtx6000-2
>>
>> Sean
>> ------------------------------
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
>> Amjad Syed <amjad...@gmail.com>
>> Sent: Tuesday, 31 August 2021 17:03
>> To: Slurm User Community List <slurm-users@lists.schedmd.com>
>> Subject: Re: [slurm-users] [EXT] User association with partition and Qos
>> ------------------------------
>> Hello, me again.
>>
>> I just found out that when our slurmctld restarts, all QoS are gone.
>>
>> I mean that users who have an association with the QoS cannot submit jobs
>> with sbatch; they get this error:
>>
>> sbatch: error: Batch job submission failed: Invalid qos specification
>>
>> Do we need to make any more changes in slurm.conf so that the QoS becomes
>> permanent?
>>
>> Amjad
>>
>> On Fri, Aug 27, 2021 at 3:32 PM Amjad Syed <amjad...@gmail.com> wrote:
>>
>> Hi Sean,
>>
>> Thanks for the suggestion, it seems to work now.
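The suggestion that made this work (Sean's 27 August reply) was to make sure AccountingStorageEnforce includes qos. Below is a small Python sketch for auditing slurm.conf text for that setting. It assumes plain KEY=VALUE tokens with # comments. Parameter names in slurm.conf are case-insensitive, and the "all" option enables every option except nojobs and nosteps, so it also turns on qos. The helpers accounting_enforce_options and qos_is_enforced are hypothetical, not part of Slurm. The "before" value in the example is hypothetical; the "after" value is Sean's example.

```python
from __future__ import annotations


def accounting_enforce_options(conf_text: str) -> set[str]:
    """Lower-cased AccountingStorageEnforce options found in slurm.conf text.

    slurm.conf parameter names are case-insensitive; if the parameter appears
    more than once, this sketch keeps the last occurrence.
    """
    options: set[str] = set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        for token in line.split():
            key, sep, value = token.partition("=")
            if sep and key.lower() == "accountingstorageenforce":
                options = {o.strip().lower() for o in value.split(",") if o.strip()}
    return options


def qos_is_enforced(conf_text: str) -> bool:
    """True if jobs may only use QoS listed on the user's association."""
    opts = accounting_enforce_options(conf_text)
    # 'all' enables every option except nojobs and nosteps, so it implies qos.
    return "qos" in opts or "all" in opts


before = "AccountingStorageEnforce=associations,limits\n"  # hypothetical old value
after = "AccountingStorageEnforce=associations,limits,qos,safe\n"  # Sean's example
print(qos_is_enforced(before), qos_is_enforced(after))  # False True
```

Note that "safe" implies associations and limits, but not qos, so qos has to be listed explicitly (or via all).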
>>
>> Majid
>>
>> On Fri, Aug 27, 2021 at 12:56 PM Sean Crosby <scro...@unimelb.edu.au> wrote:
>>
>> Hi Amjad,
>>
>> Make sure you have qos in the config entry AccountingStorageEnforce,
>>
>> e.g.
>>
>> AccountingStorageEnforce=associations,limits,qos,safe
>>
>> Sean
>>
>> ------------------------------
>> From: slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf of
>> Amjad Syed <amjad...@gmail.com>
>> Sent: Friday, 27 August 2021 20:28
>> To: slurm-us...@schedmd.com <slurm-us...@schedmd.com>
>> Subject: [EXT] [slurm-users] User association with partition and Qos
>> ------------------------------
>> Hello all
>>
>> We are having an issue understanding user associations and partitions.
>>
>> Currently we have a partition with 30 GPU cards.
>>
>> We have defined a QoS gpu-rtx that allows a user to reserve 2 cards:
>>
>> sacctmgr show qos gpu-rtx format=MaxTRESPU%60
>>
>> MaxTRESPU
>> -----------------------------------------------------
>> cpu=96,gres/gpu=2
>>
>> We have defined a user test that is associated with this QoS:
>>
>> sacctmgr show assoc user=test format=user,qos
>>
>> Qos
>> gpu-rtx
>>
>> Now we define another QoS, gpu-rtx-reserved, that allows gpu=8:
>>
>> sacctmgr show qos gpu-rtx-reserved format=MaxTRESPU%60
>>
>> MaxTRESPU
>> -----------------------------------------------------
>> cpu=192,gres/gpu=8
>>
>> User test is not associated with the gpu-rtx-reserved QoS, so he should
>> not be able to use more than gpu=2.
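The two MaxTRESPU limits quoted above (cpu=96,gres/gpu=2 for gpu-rtx and cpu=192,gres/gpu=8 for gpu-rtx-reserved) can be illustrated with a simplified Python model of a per-user TRES cap: the TRES of the user's running jobs under the QoS, plus the new request, must not exceed any listed count. This is a sketch under stated assumptions, not Slurm's scheduler logic. It handles integer counts only (no unit suffixes such as 16G) and ignores every other limit. The helpers parse_tres and fits_max_tres_per_user are hypothetical, and the 8-GPU request is hypothetical.

```python
from __future__ import annotations


def parse_tres(spec: str) -> dict[str, int]:
    """Parse a TRES string such as 'cpu=96,gres/gpu=2' into {name: count}.

    Assumes plain integer counts; unit suffixes such as mem=16G are not handled.
    """
    tres: dict[str, int] = {}
    for item in spec.split(","):
        name, _, count = item.partition("=")
        if name.strip():
            tres[name.strip()] = int(count)
    return tres


def fits_max_tres_per_user(running: list[dict[str, int]],
                           request: dict[str, int],
                           limit: dict[str, int]) -> bool:
    """Would `request` keep one user's total TRES within a per-user cap?

    `running` lists the TRES of that user's jobs already running under the
    same QoS. Resources not named in `limit` are treated as unlimited.
    """
    for name, cap in limit.items():
        used = sum(job.get(name, 0) for job in running)
        if used + request.get(name, 0) > cap:
            return False
    return True


gpu_rtx = parse_tres("cpu=96,gres/gpu=2")
gpu_rtx_reserved = parse_tres("cpu=192,gres/gpu=8")
job = {"cpu": 48, "gres/gpu": 8}  # hypothetical 8-GPU request

print(fits_max_tres_per_user([], job, gpu_rtx))           # False
print(fits_max_tres_per_user([], job, gpu_rtx_reserved))  # True
```

The same 8-GPU request is refused under gpu-rtx and accepted under gpu-rtx-reserved. That is why it matters which QoS a user is allowed to request.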
>>
>> Both of these QoS are now in slurm.conf for the partition:
>>
>> PartitionName=gpu-rtx6000-2 State=UP Nodes=g[15-29] MaxNodes=9 MaxTime=168:00:00 DefMemPerCPU=3996 AllowQos=gpu-rtx,gpu-rtx-reserved
>>
>> But we found that even though the user is not associated with
>> gpu-rtx-reserved, if the user requests gpu-rtx-reserved in his Slurm
>> script, he can reserve 8 GPU cards.
>>
>> So our question is: can users associated with one partition QoS use the
>> other QoS in the partition even if they are not associated with it? Or, in
>> other words, can we only define one QoS per partition and not more than
>> one?
>>
>> I hope I was able to explain.
>>
>> Any advice? We want the partition to use more than one QoS with different
>> limits, and users associated with one QoS should not be able to use the
>> other QoS.
>>
>> Majid
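The replies in this thread point to two separate checks. The first is the partition's AllowQos list. The second is the QoS list on the user's association, which applies only when AccountingStorageEnforce includes qos. Below is a simplified Python model of that decision. It deliberately ignores default QoS, DenyQos, partition QoS and all limits. The helper may_submit is hypothetical, not a Slurm API.

```python
from __future__ import annotations


def may_submit(requested_qos: str,
               partition_allow_qos: set[str] | None,
               assoc_qos: set[str],
               enforce_qos: bool) -> bool:
    """Simplified model of whether a job's --qos request is accepted.

    partition_allow_qos: the partition's AllowQos list, or None for ALL.
    assoc_qos: QoS names on the submitting user's association.
    enforce_qos: True when AccountingStorageEnforce includes qos (or all).
    """
    # Check 1: the partition must accept this QoS at all.
    if partition_allow_qos is not None and requested_qos not in partition_allow_qos:
        return False
    # Check 2: only when qos is enforced must the QoS be on the association.
    if enforce_qos and requested_qos not in assoc_qos:
        return False
    return True


allow_qos = {"gpu-rtx", "gpu-rtx-reserved", "jakeuea"}  # from scontrol show part
test_assoc = {"gpu-rtx"}  # user 'test' from the original question

print(may_submit("gpu-rtx-reserved", allow_qos, test_assoc, enforce_qos=False))  # True
print(may_submit("gpu-rtx-reserved", allow_qos, test_assoc, enforce_qos=True))   # False
print(may_submit("gpu-rtx", allow_qos, test_assoc, enforce_qos=True))            # True
```

In this model, a partition can list several QoS. When enforcement is off, user test can still pick gpu-rtx-reserved because the partition allows it. Once qos is enforced, the same request is refused. This matches what the thread reports after the AccountingStorageEnforce change.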