Hi,

Thanks. DNS seems to be broken on my Warewulf nodes: for some reason they can't find the controller's IP.
I had already switched to NodeName=node[1-7], and that works OK, though I have not tested it much yet.

Thanks for the heads-up about MB; I had assumed GB. So I assume the value should be 48 x 1024?

This is a PoC (proof of concept), so it is some VM nodes with some old nodes thrown in, just to test whether I can build an HPC.

regards

Steven

________________________________
From: Christian Goll via slurm-users <slurm-users@lists.schedmd.com>
Sent: Thursday, 5 December 2024 3:35 am
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Re: Slurm not running on a warewulf node

On 04.12.24 11:30, Hermann Schwärzler via slurm-users <slurm-users@lists.schedmd.com> wrote:
> Hi Steven,
>
> yes, you have the syntax a bit wrong. If you consult the documentation
> (or the man-page) of slurm.conf you find this in the "NODE
> CONFIGURATION" section (in the paragraph about "NodeName"):
>
> Note that if the short form of the hostname is not used, it may prevent
> use of hostlist expressions (the numeric portion in brackets must be at
> the end of the string)
>
> So the respective part in your slurm.conf should be
>
> NodeName=node[1-7] ...
>
> and you have to configure your name resolution (default domain?) such
> that these short names are resolvable to IP-addresses.
>
> If that's not feasible you might have to use e.g. something like this
>
> NodeName=DEFAULT CPUs=20 RealMemory=48
> NodeName=node1.ods.vuw.ac.nz
> NodeName=node2.ods.vuw.ac.nz
> ...
>
>
> BTW: do your nodes only have 48 MB of memory? The unit in which
> "RealMemory" has to be specified is megabytes.
>
> Regards,
> Hermann
>
>
> On 12/4/24 01:47, Steven Jones via slurm-users wrote:
> > I guess I have the syntax wrong,
> >
> > [root@node1 slurm]# /usr/sbin/slurmd -D
> > slurmd: fatal: Unable to create NodeAddr list from
> > node[1-7].ods.vuw.ac.nz
> > [root@node1 slurm]# tail /etc/slurm/slurm.conf
> > #ResumeRate=
> > #SuspendExcNodes=
> > #SuspendExcParts=
> > #SuspendRate=
> > #SuspendTime=
> > #
> > #
> > # COMPUTE NODES
> > NodeName=node[1-7].ods.vuw.ac.nz CPUs=20 RealMemory=48 State=UNKNOWN
> > PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
> > [root@node1 slurm]#
> >
> >
> > regards
> >
> > Steven
> >
> >
> > ------------------------------------------------------------------------
> > *From:* Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
> > *Sent:* Wednesday, 4 December 2024 1:28 pm
> > *To:* slurm-us...@schedmd.com <slurm-us...@schedmd.com>
> > *Subject:* [slurm-users] Re: Slurm not running on a warewulf node
> > Well that is a start, TY.
> >
> > [root@node1 slurm]# /usr/sbin/slurmd -D
> > slurmd: fatal: Unable to determine this slurmd's NodeName
> >
> > Where is this set?
> >
> > regards
> >
> > Steven
> >
> >
> > ------------------------------------------------------------------------
> > *From:* Jeffrey R. Lang <jrl...@uwyo.edu>
> > *Sent:* Wednesday, 4 December 2024 1:17 pm
> > *To:* Steven Jones <steven.jo...@vuw.ac.nz>; slurm-us...@schedmd.com
> > <slurm-us...@schedmd.com>
> > *Subject:* RE: Slurm not running on a warewulf node
> >
> > Steve
> >
> > Try running the failing process from the command line and use
> > the -D option.
> >
> > Per man page: Run slurmd in the foreground. Error and debug messages
> > will be copied to stderr.
> >
> > *Jeffrey R. Lang*
> >
> > Advanced Research Computing Center
> >
> > University of Wyoming, Information Technology Center
> >
> > 1000 E. University Ave
> >
> > Laramie, WY 82071
> >
> > Email: jrl...@uwyo.edu
> >
> > Work: 307.766.3381
> >
> > *From:* Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
> > *Sent:* Tuesday, December 3, 2024 5:39 PM
> > *To:* slurm-us...@schedmd.com
> > *Subject:* [slurm-users] slurm not running on a warewulf node
> >
> > Hi,
> >
> > I have set a log creation/location in slurm.conf as,
> >
> > SlurmdLogFile=/var/log/slurm/slurmd.log
> >
> > But it is 0 length.
> >
> > Slurm will not run, what else do I need to do to log why it's failing pls?
> >
> > regards
> >
> > Steven
> >
> >

Hello Steven,

if it's Warewulf v4 you can have a look at
https://github.com/warewulf/warewulf/blob/main/etc/examples/slurm.conf.ww
which is a template for a slurm.conf created from the available nodes in Warewulf.

kind regards,
Christian

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
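
For the "48 x 1024" question raised in the thread, here is a minimal sketch of the arithmetic and of the resulting NodeName line. It assumes the nodes really have 48 GiB each, which the thread does not confirm. The helper names gib_to_mb and node_line are hypothetical and are not part of Slurm. In practice the usable figure is usually somewhat below the installed total, so the value that `slurmd -C` prints on a node is a safer source than this calculation.

```python
def gib_to_mb(gib: int) -> int:
    """Convert a GiB figure to the megabyte value RealMemory expects.

    Assumption: 1 GiB is counted as 1024 MB, as in the "48 x 1024" estimate.
    """
    return gib * 1024


def node_line(prefix: str, first: int, last: int, cpus: int, mem_gib: int) -> str:
    """Render one slurm.conf NodeName line using short hostnames.

    The bracketed numeric range is placed at the end of the name, which is
    what the hostlist expression syntax quoted in the thread requires.
    """
    return (
        f"NodeName={prefix}[{first}-{last}] CPUs={cpus} "
        f"RealMemory={gib_to_mb(mem_gib)} State=UNKNOWN"
    )


if __name__ == "__main__":
    # Values taken from the slurm.conf excerpt in the thread, with the
    # memory figure reinterpreted as 48 GiB (an assumption).
    print(node_line("node", 1, 7, cpus=20, mem_gib=48))
```

Run as a script, this prints a line that could replace the NodeName entry in the excerpt above, provided the short names node1 to node7 resolve on every host.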