Hello Véronique,
slurmd uses statvfs or statfs (the choice is made at build time) to get the
size of the TmpFS filesystem (defaulting to /tmp if TmpFS is not set).
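To illustrate, here is a minimal sketch of the statvfs variant (not the actual
Slurm code; the path and the MB conversion are my assumptions):

#include <stdio.h>
#include <sys/statvfs.h>

int main(void)
{
    /* TmpFS path as configured in slurm.conf (slurmd falls back to /tmp
     * when TmpFS is unset). */
    const char *tmp_fs = "/local/scratch";
    struct statvfs buf;

    if (statvfs(tmp_fs, &buf) != 0) {
        perror("statvfs");
        return 1;
    }

    /* Total filesystem size in MB, the unit TmpDisk is expressed in. */
    unsigned long long tmp_disk_mb =
        (unsigned long long)buf.f_blocks * buf.f_frsize / (1024 * 1024);
    printf("TmpDisk=%llu\n", tmp_disk_mb);
    return 0;
}

Note that if /local/scratch is not mounted when the call runs, statvfs simply
reports whatever filesystem sits underneath the mount point.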
I think Uwe is right, it could be a filesystem mount timing problem.
You can test it easily (example commands below):
- starting from a correct state, stop slurmd
- unmount /local/scratch
- start slurmd
- check slurmd.log
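For example (using the init script and log file from the messages quoted
below; adjust paths to your setup):

sudo /etc/init.d/slurm stop
sudo umount /local/scratch
sudo /etc/init.d/slurm start
sudo grep TmpDisk slurmd.log      # a wrong value here would confirm the theory
sudo mount /local/scratch         # restore the mount (assuming an fstab entry)
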
Regarding slurmd -C, I checked the source (17.02.6): the size of /tmp is
returned (the path is hardcoded), and I am not sure whether that should be
considered a bug.
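A quick way to see the hardcoding from the outside (only slurmd -C and df are
assumed here):

slurmd -C | grep -o 'TmpDisk=[0-9]*'
df -BM /tmp /local/scratch

On your nodes both report TmpDisk=500, which matches the 500M /tmp rather than
the 200G /local/scratch.
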
Regards,
Pierre-Marie Le Biot
-----Original Message-----
From: Uwe Sauter [mailto:[email protected]]
Sent: Wednesday, October 11, 2017 4:18 PM
To: slurm-dev <[email protected]>
Subject: [slurm-dev] RE: Node always going to DRAIN state with reason=Low
TmpDisk
What distribution are you using? If it is using systemd, then it is possible
that slurmd gets started before /local/scratch is mounted. You'd need to add a
dependency to the slurmd service so that it is only started once /local/scratch
is mounted.
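Something along these lines should do it (untested sketch; the drop-in file
name is just an example):

# /etc/systemd/system/slurmd.service.d/local-scratch.conf
[Unit]
RequiresMountsFor=/local/scratch

After creating the file, run systemctl daemon-reload and restart slurmd.
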
On 11.10.2017 at 15:38, Véronique LEGRAND wrote:
> Hello Pierre-Marie,
>
>
>
> I stopped the slurmd daemon on tars-XXX then restarted it in the foreground
> with:
>
>
>
> sudo /my/path/to/slurmd -vvvvvvvv -D -d /opt/slurm/sbin/slurmstepd
>
>
>
> and got:
>
> slurmd: Gres Name=disk Type=(null) Count=204000
>
> and also:
>
> slurmd: debug3: CPUs=12 Boards=1 Sockets=2 Cores=6 Threads=1
> Memory=258373 TmpDisk=204699 Uptime=162294 CPUSpecList=(null)
> FeaturesAvail=(null) FeaturesActive=(null)
>
> in the output.
>
> So, the value this time was correct.
>
>
>
> In slurmctld.log, I have:
>
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: Node
> *tars-113* now responding
>
> 2017-10-11T12:24:01+02:00 tars-master slurmctld[120352]: node
> *tars-113* returned to service
>
>
>
> I waited 2 hours and did not get any "error: Node tars-XXX has low
> tmp_disk size (129186 < 204000)" message.
>
> So, I stopped it and started it again in the usual way:
>
>
>
> sudo /etc/init.d/slurm start   (at 2:38 pm)
>
>
>
> I got no error message in slurmctld.log and no erroneous value in slurmd.log.
>
>
>
> At 2:49 pm, I rebooted the machine, and here is what I got in slurmd.log:
>
>
>
> -sh-4.1$ sudo cat slurmd.log
>
> 2017-10-11T14:50:30.742049+02:00 tars-113 slurmd[18621]: Message
> aggregation enabled: WindowMsgs=24, WindowTime=200
>
> 2017-10-11T14:50:30.797696+02:00 tars-113 slurmd[18621]: CPU frequency
> setting not configured for this node
>
> 2017-10-11T14:50:30.797706+02:00 tars-113 slurmd[18621]: Resource
> spec: Reserved system memory limit not configured for this node
>
> 2017-10-11T14:50:30.986903+02:00 tars-113 slurmd[18621]: cgroup
> namespace 'freezer' is now mounted
>
> 2017-10-11T14:50:31.023900+02:00 tars-113 slurmd[18621]: cgroup
> namespace 'cpuset' is now mounted
>
> 2017-10-11T14:50:31.066430+02:00 tars-113 slurmd[18633]: slurmd
> version 16.05.9 started
>
> 2017-10-11T14:50:31.123213+02:00 tars-113 slurmd[18633]: slurmd
> started on Wed, 11 Oct 2017 14:50:31 +0200
>
> 2017-10-11T14:50:31.123493+02:00 tars-113 slurmd[18633]: CPUs=12
> Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null)
> FeaturesActive=(null)
>
>
>
> The erroneous value was back again!
>
>
>
> So, I did it again:
>
> sudo /etc/init.d/slurm stop
>
> sudo /etc/init.d/slurm start
>
>
>
> and the following lines were added to the log:
>
> 2017-10-11T14:51:29.707556+02:00 tars-113 slurmd[18633]: Slurmd
> shutdown completing
>
> 2017-10-11T14:51:51.496552+02:00 tars-113 slurmd[19047]: Message
> aggregation enabled: WindowMsgs=24, WindowTime=200
>
> 2017-10-11T14:51:51.555792+02:00 tars-113 slurmd[19047]: CPU frequency
> setting not configured for this node
>
> 2017-10-11T14:51:51.555803+02:00 tars-113 slurmd[19047]: Resource
> spec: Reserved system memory limit not configured for this node
>
> 2017-10-11T14:51:51.567003+02:00 tars-113 slurmd[19049]: slurmd
> version 16.05.9 started
>
> 2017-10-11T14:51:51.569174+02:00 tars-113 slurmd[19049]: slurmd
> started on Wed, 11 Oct 2017 14:51:51 +0200
>
> 2017-10-11T14:51:51.569533+02:00 tars-113 slurmd[19049]: CPUs=12
> Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=204699
> Uptime=155 CPUSpecList=(null) FeaturesAvail=(null)
> FeaturesActive=(null)
>
>
>
> The value for TmpDisk was correct again.
>
>
>
> So, my question is: where does slurmd read the value for the size of
> /local/scratch? Does it use "df" or another command? It seems that on
> startup, slurmd reads a value that is not yet set correctly…
>
>
>
> Thank you in advance for any help.
>
>
>
> Regards,
>
>
>
> Véronique
>
>
>
>
>
>
>
> --
>
> Véronique Legrand
>
> IT engineer – scientific calculation & software development
>
> https://research.pasteur.fr/en/member/veronique-legrand/
>
> Cluster and computing group
>
> IT department
>
> Institut Pasteur Paris
>
> Tel : 95 03
>
>
>
>
>
> *From: *"Le Biot, Pierre-Marie" <[email protected]>
> *Reply-To: *slurm-dev <[email protected]>
> *Date: *Tuesday, 10 October 2017 at 17:00
> *To: *slurm-dev <[email protected]>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Véronique,
>
>
>
> So that's the culprit:
>
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12
> Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null)
> FeaturesActive=(null)
>
>
>
> For a reason you will have to determine, when slurmd starts on tars-XXX it
> finds that the size of /local/scratch (assuming that this is the value
> of TmpFS in slurm.conf for this node) is 129186 MB. It sends this value to
> slurmctld, which compares it with the value recorded in slurm.conf, that is
> 204000 for that node.
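>
> A quick way to see both sides of that comparison (plain df plus scontrol;
> adapt the node name):
>
> df -BM /local/scratch
> scontrol show node tars-XXX | grep -o 'TmpDisk=[0-9]*'
>
> The first is what slurmd should be measuring, the second is the configured
> value that slurmctld checks the registration against.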
>
> By the way, 129186MB is very close to the size of /dev/shm…
>
>
>
> About the value returned by slurmd -C (500), it could be that /tmp is
> hardcoded somewhere.
>
>
>
> Regards,
>
> Pierre-Marie Le Biot
>
>
>
> *From:* Véronique LEGRAND [mailto:[email protected]]
> *Sent:* Tuesday, October 10, 2017 4:33 PM
> *To:* slurm-dev <[email protected]>
> *Subject:* [slurm-dev] RE: Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Pierre-Marie,
>
>
>
> Here is what I have in slurmd.log on tars-XXX
>
>
>
> -sh-4.1$ sudo cat slurmd.log
>
> 2017-10-09T17:09:57.538636+02:00 tars-XXX slurmd[18597]: Message
> aggregation enabled: WindowMsgs=24, WindowTime=200
>
> 2017-10-09T17:09:57.647486+02:00 tars-XXX slurmd[18597]: CPU frequency
> setting not configured for this node
>
> 2017-10-09T17:09:57.647499+02:00 tars-XXX slurmd[18597]: Resource
> spec: Reserved system memory limit not configured for this node
>
> 2017-10-09T17:09:57.808352+02:00 tars-XXX slurmd[18597]: cgroup
> namespace 'freezer' is now mounted
>
> 2017-10-09T17:09:57.844400+02:00 tars-XXX slurmd[18597]: cgroup
> namespace 'cpuset' is now mounted
>
> 2017-10-09T17:09:57.902418+02:00 tars-XXX slurmd[18640]: slurmd
> version 16.05.9 started
>
> 2017-10-09T17:09:57.957030+02:00 tars-XXX slurmd[18640]: slurmd
> started on Mon, 09 Oct 2017 17:09:57 +0200
>
> 2017-10-09T17:09:57.957336+02:00 tars-XXX slurmd[18640]: CPUs=12
> Boards=1 Sockets=2 Cores=6 Threads=1 Memory=258373 TmpDisk=129186
> Uptime=74 CPUSpecList=(null) FeaturesAvail=(null)
> FeaturesActive=(null)
>
>
>
>
>
> --
>
> Véronique Legrand
>
> IT engineer – scientific calculation & software development
>
> https://research.pasteur.fr/en/member/veronique-legrand/
>
> Cluster and computing group
>
> IT department
>
> Institut Pasteur Paris
>
> Tel : 95 03
>
>
>
>
>
> *From: *"Le Biot, Pierre-Marie" <[email protected]>
> *Reply-To: *slurm-dev <[email protected]>
> *Date: *Tuesday, 10 October 2017 at 15:20
> *To: *slurm-dev <[email protected]>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Véronique,
>
>
>
> This is not what I expected; I was thinking slurmd -C would return
> TmpDisk=204000, or more probably 129186 as seen in the slurmctld log.
>
>
>
> I suppose that you already checked the slurmd logs on tars-XXX?
>
>
>
> Regards,
>
> Pierre-Marie Le Biot
>
>
>
> *From:* Véronique LEGRAND [mailto:[email protected]]
> *Sent:* Tuesday, October 10, 2017 2:09 PM
> *To:* slurm-dev <[email protected]>
> *Subject:* [slurm-dev] RE: Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Hello Pierre-Marie,
>
>
>
> First, thank you for your hint.
>
> I just tried.
>
>
>
>>slurmd -C
>
> NodeName=tars-XXX CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6
> ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
>
> UpTime=0-20:50:54
>
>
>
> The value for TmpDisk is erroneous. I do not know what the cause of this
> can be, since the operating system's df command gives the right values.
>
>
>
> -sh-4.1$ df -hl
>
> Filesystem Size Used Avail Use% Mounted on
>
> slash_root 3.5G 1.6G 1.9G 47% /
>
> tmpfs 127G 0 127G 0% /dev/shm
>
> tmpfs 500M 84K 500M 1% /tmp
>
> /dev/sda1 200G 33M 200G 1% /local/scratch
>
>
>
>
>
> Could slurmd be mixing up tmpfs with /local/scratch?
>
>
>
> I tried the same thing on another similar node (tars-XXX-1)
>
>
>
> I got:
>
>
>
> -sh-4.1$ df -hl
>
> Filesystem Size Used Avail Use% Mounted on
>
> slash_root 3.5G 1.7G 1.8G 49% /
>
> tmpfs 127G 0 127G 0% /dev/shm
>
> tmpfs 500M 5.7M 495M 2% /tmp
>
> /dev/sda1 200G 33M 200G 1% /local/scratch
>
>
>
> and
>
>
>
> slurmd -C
>
> NodeName=tars-XXX-1 CPUs=12 Boards=1 SocketsPerBoard=2
> CoresPerSocket=6 ThreadsPerCore=1 RealMemory=258373 TmpDisk=500
>
> UpTime=101-21:34:14
>
>
>
>
>
> So, slurmd -C gives exactly the same answer, but this node doesn't go into
> DRAIN state; it works perfectly.
>
>
>
> Thank you again for your help.
>
>
>
> Regards,
>
>
>
> Véronique
>
>
>
>
>
>
>
> --
>
> Véronique Legrand
>
> IT engineer – scientific calculation & software development
>
> https://research.pasteur.fr/en/member/veronique-legrand/
>
> Cluster and computing group
>
> IT department
>
> Institut Pasteur Paris
>
> Tel : 95 03
>
>
>
>
>
> *From: *"Le Biot, Pierre-Marie" <[email protected]>
> *Reply-To: *slurm-dev <[email protected]>
> *Date: *Tuesday, 10 October 2017 at 13:53
> *To: *slurm-dev <[email protected]>
> *Subject: *[slurm-dev] RE: Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Hi Véronique,
>
>
>
> Did you check the result of slurmd -C on tars-XXX ?
>
>
>
> Regards,
>
> Pierre-Marie Le Biot
>
>
>
> *From:* Véronique LEGRAND [mailto:[email protected]]
> *Sent:* Tuesday, October 10, 2017 12:02 PM
> *To:* slurm-dev <[email protected]>
> *Subject:* [slurm-dev] Node always going to DRAIN state with
> reason=Low TmpDisk
>
>
>
> Hello,
>
>
>
> I have a problem with one node in our cluster. It is exactly like all the
> other nodes (200 GB of temporary storage).
>
>
>
> Here is what I have in slurm.conf:
>
>
>
> # COMPUTES
>
> TmpFS=/local/scratch
>
>
>
> # NODES
>
> GresTypes=disk,gpu
>
> ReturnToService=2
>
> NodeName=DEFAULT State=UNKNOWN Gres=disk:204000,gpu:0 TmpDisk=204000
>
> NodeName=tars-[XXX-YYY] Sockets=2 CoresPerSocket=6 RealMemory=254373
> Feature=ram256,cpu,fast,normal,long,specific,admin Weight=20
>
>
>
> The node that has the trouble is tars-XXX.
>
>
>
> Here is what I have in gres.conf:
>
>
>
> # Local disk space in MB (/local/scratch)
>
> NodeName=tars-[ZZZ-UUU] Name=disk Count=204000
>
>
>
> XXX is in the range [ZZZ, UUU].
>
>
>
> If I ssh to tars-XXX, here is what I get:
>
>
>
> -sh-4.1$ df -hl
>
> Filesystem Size Used Avail Use% Mounted on
>
> slash_root 3.5G 1.6G 1.9G 47% /
>
> tmpfs 127G 0 127G 0% /dev/shm
>
> tmpfs 500M 84K 500M 1% /tmp
>
> /dev/sda1 200G 33M 200G 1% /local/scratch
>
>
>
> /local/scratch is the directory for temporary storage.
>
>
>
> The problem is that when I do
>
> scontrol show node tars-XXX
>
>
>
> I get:
>
>
>
> NodeName=tars-XXX Arch=x86_64 CoresPerSocket=6
>
> CPUAlloc=0 CPUErr=0 CPUTot=12 CPULoad=0.00
>
> AvailableFeatures=ram256,cpu,fast,normal,long,specific,admin
>
> ActiveFeatures=ram256,cpu,fast,normal,long,specific,admin
>
> Gres=disk:204000,gpu:0
>
> NodeAddr=tars-113 NodeHostName=tars-113 Version=16.05
>
> OS=Linux RealMemory=254373 AllocMem=0 FreeMem=255087 Sockets=2
> Boards=1
>
> State=IDLE+DRAIN ThreadsPerCore=1 TmpDisk=204000 Weight=20
> Owner=N/A MCS_label=N/A
>
> BootTime=2017-10-09T17:08:43 SlurmdStartTime=2017-10-09T17:09:57
>
> CapWatts=n/a
>
> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
>
> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
>
> Reason=Low TmpDisk [slurm@2017-10-10T11:25:04]
>
>
>
>
>
> And in the slurmctld logs, I get the error message:
>
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error: Node
> tars-XXX has low tmp_disk size (129186 < 204000)
>
> 2017-10-10T08:35:57+02:00 tars-master slurmctld[120352]: error:
> _slurm_rpc_node_registration node=tars-XXX: Invalid argument
>
>
>
> I tried to reboot tars-XXX yesterday but the problem is still here.
>
> I also tried:
>
> scontrol update NodeName=ClusterNode0 State=Resume
>
> but the state went back to DRAIN after a while…
>
>
>
> Does anyone have an idea of what could cause the problem? My
> configuration files seem correct and there really are 200G free in
> /local/scratch on tars-XXX…
>
>
>
> I thank you in advance for any help.
>
>
>
> Regards,
>
>
>
>
>
> Véronique
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Véronique Legrand
>
> IT engineer – scientific calculation & software development
>
> https://research.pasteur.fr/en/member/veronique-legrand/
>
> Cluster and computing group
>
> IT department
>
> Institut Pasteur Paris
>
> Tel : 95 03
>
>
>