Dear Nousheen,
I guess something is missing in your installation, probably in your
slurm.conf?
Do you have logging enabled for slurmctld? If so, what do you see in
that log?
Or what do you get if you run slurmctld manually like this:
/usr/local/sbin/slurmctld -D
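The two checks suggested above (read the controller log, or run slurmctld in the foreground) can be wrapped in a pair of shell helpers. This is a minimal sketch: `show_ctld_log` and `run_ctld_foreground` are hypothetical names, the default log path is the SlurmctldLogFile from the slurm.conf quoted further down (/var/log/slurmctld.log), and the journal fallback assumes slurmctld was started through systemd.

```shell
#!/bin/sh
# show_ctld_log [LOG]
# Print the last 50 lines of the controller log (default: the
# SlurmctldLogFile from the posted slurm.conf). If that file is missing
# or unreadable, fall back to the systemd journal for the unit.
show_ctld_log() {
  log=${1:-/var/log/slurmctld.log}
  if [ -r "$log" ]; then
    tail -n 50 "$log"
  else
    echo "$log not readable; showing the systemd journal instead" >&2
    journalctl -u slurmctld.service -n 50 --no-pager
  fi
}

# run_ctld_foreground
# Start slurmctld attached to the terminal (-D) with extra verbosity
# (each -v raises the log level). Run as root; stop with Ctrl-C.
run_ctld_foreground() {
  /usr/local/sbin/slurmctld -D -vvv
}
```

Run both as root, `show_ctld_log` first and then `run_ctld_foreground`. With -D the log messages are also written to the terminal, so the error that makes the service exit with status 1 should show up directly.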
Regards,
Hermann
On 1/31/22 6:08 AM, Nousheen wrote:
Dear Jeffrey,
Thank you for your response. I have followed the steps as instructed.
After copying the files to their respective locations, the "systemctl
status slurmctld.service" command gives me the following error:
(base) [nousheen@exxact system]$ systemctl daemon-reload
(base) [nousheen@exxact system]$ systemctl enable slurmctld.service
(base) [nousheen@exxact system]$ systemctl start slurmctld.service
(base) [nousheen@exxact system]$ systemctl status slurmctld.service
● slurmctld.service - Slurm controller daemon
Loaded: loaded (/etc/systemd/system/slurmctld.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2022-01-31 10:04:31 PKT; 3s ago
Process: 18114 ExecStart=/usr/local/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS (code=exited, status=1/FAILURE)
Main PID: 18114 (code=exited, status=1/FAILURE)
Jan 31 10:04:31 exxact systemd[1]: Started Slurm controller daemon.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service: main process exited, code=exited, status=1/FAILURE
Jan 31 10:04:31 exxact systemd[1]: Unit slurmctld.service entered failed state.
Jan 31 10:04:31 exxact systemd[1]: slurmctld.service failed.
Kindly guide me. Thank you so much for your time.
Best Regards,
Nousheen Parvaiz
On Thu, Jan 27, 2022 at 8:25 PM Jeffrey R. Lang <jrl...@uwyo.edu> wrote:
The missing file error has nothing to do with Slurm. The systemctl
command is part of the system's service management.

The error message indicates that you haven't copied the
slurmd.service file on your compute node to /etc/systemd/system or
/usr/lib/systemd/system. /etc/systemd/system is usually used when a
user adds a new service to a machine.

Depending on your version of Linux, you may also need to run
systemctl daemon-reload to activate slurmd.service within
systemd.

Once slurmd.service is copied over, the systemctl command should
work just fine.

Remember:
slurmd.service - only on compute nodes
slurmctld.service - only on your cluster management node
slurmdbd.service - only on your cluster management node
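The split above can be expressed as a small lookup, with one refinement taken from the slurm.conf quoted below: slurmdbd is only needed when AccountingStorageType is accounting_storage/slurmdbd, and that file uses accounting_storage/none. A sketch; `units_for_role` and the role labels compute and management are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# units_for_role ROLE CONF
# Print the Slurm unit files a node of ROLE ("compute" or "management")
# should have, reading CONF to decide whether slurmdbd is in use.
units_for_role() {
  case $1 in
    compute)
      echo "slurmd.service"
      ;;
    management)
      echo "slurmctld.service"
      # slurmdbd only matters when accounting is stored through it.
      if grep -Eiq '^[[:space:]]*AccountingStorageType[[:space:]]*=[[:space:]]*accounting_storage/slurmdbd' "$2"; then
        echo "slurmdbd.service"
      fi
      ;;
    *)
      echo "units_for_role: unknown role '$1'" >&2
      return 1
      ;;
  esac
}
```

For example, `units_for_role management /usr/local/etc/slurm.conf` prints only slurmctld.service for the posted configuration (the slurm.conf path is a guess).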

From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Nousheen
Sent: Thursday, January 27, 2022 3:54 AM
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] systemctl enable slurmd.service Failed to execute operation: No such file or directory

Hello everyone,

I am installing Slurm on CentOS 7 following this tutorial:
https://www.slothparadise.com/how-to-install-slurm-on-centos-7-cluster/

I am at the step where Slurm is started, but it gives me the following
error:

[root@exxact slurm-21.08.5]# systemctl enable slurmd.service
Failed to execute operation: No such file or directory
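systemctl prints "No such file or directory" here because it cannot find a unit file named slurmd.service in its search path. A sketch of a check that reports whether the unit is installed in one of the two directories mentioned in the reply above and, if not, whether ./configure produced it under the source tree's etc/ directory (an assumption about this build). `diagnose_unit` is a hypothetical helper name.

```shell
#!/bin/sh
# diagnose_unit UNIT SRC_DIR [SYSTEMD_DIR...]
# Report whether UNIT is installed in a systemd unit directory; if not,
# say whether a generated copy exists under SRC_DIR/etc.
diagnose_unit() {
  unit=$1; src=$2; shift 2
  [ "$#" -gt 0 ] || set -- /etc/systemd/system /usr/lib/systemd/system
  for dir in "$@"; do
    if [ -f "$dir/$unit" ]; then
      echo "installed: $dir/$unit"
      return 0
    fi
  done
  # $1 is still the first directory searched; suggest copying there.
  if [ -f "$src/etc/$unit" ]; then
    echo "not installed; copy $src/etc/$unit to $1/"
  else
    echo "not installed and $src/etc/$unit not found"
  fi
  return 1
}
```

Usage: `diagnose_unit slurmd.service /path/to/slurm-21.08.5`. `systemctl cat slurmd.service` gives systemd's own answer to the same question.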

I have run the following command to check whether Slurm is configured
properly:

[root@exxact slurm-21.08.5]# slurmd -C
NodeName=exxact CPUs=12 Boards=1 SocketsPerBoard=1 CoresPerSocket=6 ThreadsPerCore=2 RealMemory=31889
UpTime=19-16:06:00
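`slurmd -C` prints the node's hardware in the same format slurm.conf uses for a node definition, so its first line can be set next to the NodeName entries in slurm.conf. In the file quoted below, the only entry is NodeName=linux[1-32] CPUs=11, whereas this machine reports NodeName=exxact with CPUs=12; this thread does not confirm whether that mismatch causes the failures. A sketch, with `conf_node_lines` and `compare_nodes` as hypothetical helper names:

```shell
#!/bin/sh
# conf_node_lines CONF
# Print the active (uncommented) NodeName lines from slurm.conf.
conf_node_lines() {
  grep -E '^[[:space:]]*NodeName=' "$1" | sed 's/^[[:space:]]*//'
}

# compare_nodes CONF
# Show what this host reports next to what slurm.conf declares,
# for a visual side-by-side check.
compare_nodes() {
  echo "slurmd -C on $(hostname -s):"
  slurmd -C | head -n 1
  echo "NodeName lines in $1:"
  conf_node_lines "$1"
}
```

Run `compare_nodes /usr/local/etc/slurm.conf` on the node (the path is a guess) and compare the two outputs by eye.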

I am new to this and unable to understand the problem. Kindly help
me resolve this.

My slurm.conf file is as follows:

# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ClusterName=cluster194
SlurmctldHost=192.168.60.194
#SlurmctldHost=
#
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#EpilogSlurmctld=
#FirstJobId=1
#MaxJobId=67043328
#GresTypes=
#GroupUpdateForce=0
#GroupUpdateTime=600
#JobFileAppend=0
#JobRequeue=1
#JobSubmitPlugins=lua
#KillOnBadExit=0
#LaunchType=launch/slurm
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=10000
#MaxStepCount=40000
#MaxTasksPerNode=512
MpiDefault=none
#MpiParams=ports=#-#
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/cgroup
#Prolog=
#PrologFlags=
#PrologSlurmctld=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
#RebootProgram=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=nousheen
#SlurmdUser=root
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/home/nousheen/Documents/SILICS/slurm-21.08.5/slurmctld
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/affinity
#TaskProlog=
#TopologyPlugin=topology/tree
#TmpFS=/tmp
#TrackWCKey=no
#TreeWidth=
#UnkillableStepProgram=
#UsePAM=0
#
#
# TIMERS
#BatchStartTimeout=10
#CompleteWait=0
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
KillWait=30
#MessageTimeout=10
#ResvOverRun=0
MinJobAge=300
#OverTimeLimit=0
SlurmctldTimeout=120
SlurmdTimeout=300
#UnkillableStepTimeout=60
#VSizeFactor=0
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
#MaxMemPerCPU=0
#SchedulerTimeSlice=30
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
#
#
# JOB PRIORITY
#PriorityFlags=
#PriorityType=priority/basic
#PriorityDecayHalfLife=
#PriorityCalcPeriod=
#PriorityFavorSmall=
#PriorityMaxAge=
#PriorityUsageResetPeriod=
#PriorityWeightAge=
#PriorityWeightFairshare=
#PriorityWeightJobSize=
#PriorityWeightPartition=
#PriorityWeightQOS=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
#AccountingStoreFlags=
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#SlurmSchedLogFile=
#SlurmSchedLogLevel=
#DebugFlags=
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#SuspendTimeout=
#ResumeTimeout=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=linux[1-32] CPUs=11 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
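slurmctld runs as SlurmUser (nousheen in this file), so the paths it writes to are worth checking: StateSaveLocation must be writable by SlurmUser, and the log and PID files under /var may need the same. A minimal sketch that prints each path with its current owner, assuming GNU stat as on CentOS 7; `slurm_conf_value` and `report_paths` are hypothetical helper names, and keys are matched case-sensitively for simplicity.

```shell
#!/bin/sh
# slurm_conf_value KEY CONF
# Print the value of KEY from CONF; commented-out lines never match
# because the pattern requires KEY at the start of the line.
slurm_conf_value() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" |
    sed 's/[[:space:]]*$//' | head -n 1
}

# report_paths CONF
# For each path slurmctld writes, print the path, its owner (or
# "(missing)"), and the configured SlurmUser for comparison.
report_paths() {
  user=$(slurm_conf_value SlurmUser "$1")
  for key in StateSaveLocation SlurmctldLogFile SlurmctldPidFile; do
    path=$(slurm_conf_value "$key" "$1")
    if [ -e "$path" ]; then
      owner=$(stat -c %U "$path")
    else
      owner="(missing)"
    fi
    echo "$key=$path owner=$owner SlurmUser=$user"
  done
}
```

Usage: `report_paths /usr/local/etc/slurm.conf` (the path is a guess). Any line whose owner differs from SlurmUser, or shows (missing), is worth a closer look.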

Best Regards,
Nousheen Parvaiz