[slurm-users] Re: Slurm not running on a warewulf node

2024-12-03 Thread Jeffrey R. Lang via slurm-users
Steve, try running the failing process from the command line with the -D option. Per the man page: "Run slurmd in the foreground. Error and debug messages will be copied to stderr." Jeffrey R. Lang Advanced Research Computing Center University of Wyoming, Information Technology Center 1000 E.
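
As a rough illustration of the suggestion above, the following sketch starts slurmd in the foreground on the affected node. The -D option is the one quoted from the man page; the -vvv verbosity flags, the SLURMD_BIN variable, and the fallback message are assumptions added here for illustration and are not part of the original message.

```shell
#!/bin/sh
# Sketch: run slurmd in the foreground so startup errors appear on stderr.
# SLURMD_BIN is a hypothetical override for this example, not a Slurm setting.
SLURMD_BIN="${SLURMD_BIN:-slurmd}"

if command -v "$SLURMD_BIN" >/dev/null 2>&1; then
    # -D: stay in the foreground; -vvv: extra verbosity (assumed useful here).
    # Stop the daemon with Ctrl-C once the error has been captured.
    "$SLURMD_BIN" -D -vvv
    status=$?
else
    echo "slurmd not found in PATH; run this on the affected node" >&2
    status=127
fi
```

slurmd normally needs to run as root, so on a real node this would likely be run from a root shell, with any already-running slurmd service stopped first.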

[slurm-users] Re: [EXT] RE: [EXT] RE: [EXT] RE: [EXT] RE: Nodes required for job are down, drained or reserved

2024-04-09 Thread Jeffrey R. Lang via slurm-users
Alison I’m glad I was able to help. Good luck. Jeff From: Alison Peterson Sent: Tuesday, April 9, 2024 4:09 PM To: Jeffrey R. Lang Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved Than

[slurm-users] Re: [EXT] RE: [EXT] RE: [EXT] RE: Nodes required for job are down, drained or reserved

2024-04-09 Thread Jeffrey R. Lang via slurm-users
Alison, in your case, since you are using head as both a Slurm management node and a compute node, you’ll need to set up slurmd on the head node. Once slurmd is running, use “sinfo” to see what the status of the node is. Most likely it will be down, hopefully without an asterisk. If that’s the case then

[slurm-users] Re: [EXT] RE: [EXT] RE: Nodes required for job are down, drained or reserved

2024-04-09 Thread Jeffrey R. Lang via slurm-users
Alison, the sinfo output shows that your head node is down due to some configuration error. Are you running slurmd on the head node? If slurmd is running, find its log file and pass along the entries from it. Can you also redo the scontrol command? “node name” should be “nodename”, one word.

[slurm-users] Re: [EXT] RE: Nodes required for job are down, drained or reserved

2024-04-09 Thread Jeffrey R. Lang via slurm-users
Alison, can you provide the output of the following commands: * sinfo * scontrol show node name=head and the job command that you’re trying to run? From: Alison Peterson Sent: Tuesday, April 9, 2024 3:03 PM To: Jeffrey R. Lang Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [
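
A minimal sketch of collecting the requested diagnostics in one go, assuming the node is named head as in the thread. The NODE variable and the -N -l options to sinfo are additions for illustration, and the bare-name form `scontrol show node head` is used here instead of the keyword form quoted in the message.

```shell
#!/bin/sh
# Sketch: gather node state for a list post.
# NODE is a hypothetical variable for this example; "head" comes from the thread.
NODE="${NODE:-head}"

if command -v sinfo >/dev/null 2>&1 && command -v scontrol >/dev/null 2>&1; then
    # Node-oriented, long listing: one line per node with its state and reason.
    sinfo -N -l
    # Full record for the node in question, including State and Reason fields.
    scontrol show node "$NODE"
    have_slurm=yes
else
    echo "sinfo/scontrol not found in PATH; run this on a Slurm host" >&2
    have_slurm=no
fi
```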

[slurm-users] Re: Nodes required for job are down, drained or reserved

2024-04-09 Thread Jeffrey R. Lang via slurm-users
Alison, the error message indicates that there are no resources to execute jobs. Since you haven’t defined any compute nodes, you will get this error. I would suggest that you create at least one compute node. Once you do that, this error should go away. Jeff From: Alison Peterson via slurm-