Hi Miguel,
this is because SchedMD changed the stats field: rss_max no longer
exists (cf. line 225 of seff).
You need to evaluate the field stats{tres_usage_in_max} and take the
value after '2='. Note that this is the memory value in bytes rather
than kilobytes, so it must additionally be divided by 1024.
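A minimal Python sketch of that extraction, not seff's actual Perl code.
It assumes the field is an "id=value,id=value" string in which TRES id 2
is memory in bytes, as described above; the helper names and the sample
string are hypothetical.

```python
from typing import Optional

# Assumption: TRES id 2 is memory, per the '2=' entry described above.
TRES_MEM = 2


def tres_count(tres_str: str, tres_id: int) -> Optional[int]:
    """Return the value for tres_id in an 'id=value,id=value' string, or None."""
    for item in tres_str.split(","):
        key, sep, value = item.partition("=")
        # Compare the whole key so that id 2 does not match id 12.
        if sep and key.strip() == str(tres_id):
            return int(value)
    return None


def max_mem_kib(tres_usage_in_max: str) -> Optional[float]:
    """Peak memory in KiB: the '2=' value is in bytes, so divide by 1024."""
    mem_bytes = tres_count(tres_usage_in_max, TRES_MEM)
    if mem_bytes is None:
        # No memory entry: report "unknown" instead of computing with an
        # undefined value.
        return None
    return mem_bytes / 1024


# Hypothetical sample string; 2621440000 bytes is roughly 2.44 GB.
print(max_mem_kib("1=99000,2=2621440000,6=0"))  # 2560000.0
```

Returning None for a missing entry lets the caller print "unknown"
instead of a misleading 0.00 MB.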
Best
Marcus
On 11/08/2018 11:06 AM, Miguel A. Sánchez wrote:
Hi, and thanks for all your answers; sorry for the delay in replying.
Yesterday I installed Slurm-18.08.3 on the controller machine to check
whether the seff command works with this latest release. The behavior
has improved, but I still receive an error message:
# /usr/local/slurm-18.08.3/bin/seff 1694112
Use of uninitialized value $lmem in numeric lt (<) at
/usr/local/slurm-18.08.3/bin/seff line 130, <DATA> line 624.
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 3.91 GB (3.91 GB/node)
[root@hydra ~]#
Because of this problem, every job shows 0.00 MB as memory utilized.
With slurm-17.11.1 it works fine:
# /usr/local/slurm-17.11.0/bin/seff 1694112
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 2.44 GB
Memory Efficiency: 62.57% of 3.91 GB
[root@hydra bin]#
Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)
Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanc...@upf.edu
On 11/06/2018 06:30 PM, Mike Cammilleri wrote:
Thanks for this. We'll try the workaround script. It is not
mission-critical, but our users have gotten accustomed to seeing these
metrics at the end of each run and it's nice to have. We are currently
doing this in a test VM environment, so perhaps the fix will be
available by the time we actually upgrade the cluster.
Mike Cammilleri
Systems Administrator
Department of Statistics | UW-Madison
1300 University Ave | Room 1280
608-263-6673 | mi...@stat.wisc.edu
------------------------------------------------------------------------
*From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
of Chris Samuel <ch...@csamuel.org>
*Sent:* Tuesday, November 6, 2018 5:03 AM
*To:* slurm-users@lists.schedmd.com
*Subject:* Re: [slurm-users] Seff error with Slurm-18.08.1
On 6/11/18 7:49 pm, Baker D.J. wrote:
> The good news is that I am assured by SchedMD that the bug has been
> fixed in v18.08.3.
Looks like it's fixed in this commit.
commit 3d85c8f9240542d9e6dfb727244e75e449430aac
Author: Danny Auble <d...@schedmd.com>
Date: Wed Oct 24 14:10:12 2018 -0600
Handle symbol resolution errors in the 18.08 slurmdbd.
Caused by b1ff43429f6426c when moving the slurmdbd agent internals.
Bug 5882.
> Having said that, we will probably live with this issue
> rather than disrupt users with another upgrade so soon.
An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though,
should it? We just flip a symlink and the users see the new binaries,
libraries, etc. immediately; we can then restart daemons as and when we
need to (in the right order, of course: slurmdbd, slurmctld and then
the slurmd daemons).
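The symlink flip and restart order described above can be sketched in
Python as follows. The link path, the systemd unit names and the helper
names are hypothetical, and the restart helper only builds the commands
rather than running them.

```python
import os
import uuid
from typing import List

# Order from the thread: slurmdbd first, then slurmctld, then slurmd.
DAEMON_RESTART_ORDER = ("slurmdbd", "slurmctld", "slurmd")


def flip_symlink(link: str, new_target: str) -> None:
    """Atomically repoint `link` at `new_target` (POSIX rename semantics)."""
    tmp = f"{link}.tmp-{uuid.uuid4().hex}"
    os.symlink(new_target, tmp)
    # os.replace() renames the new link over the old one in a single step,
    # so there is no moment at which the link is missing.
    os.replace(tmp, link)


def restart_commands() -> List[List[str]]:
    """Build (not run) restart commands in the required order.

    Assumes systemd units named after the daemons; slurmd has to be
    restarted on every compute node, not only on the controller.
    """
    return [["systemctl", "restart", name] for name in DAEMON_RESTART_ORDER]
```

For example, flip_symlink("/usr/local/slurm", "/usr/local/slurm-18.08.3")
would switch a hypothetical /usr/local/slurm link to the new release.
Creating a temporary link and renaming it avoids the brief gap that
deleting and recreating the link would cause.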
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
--
Marcus Wagner, Dipl.-Inf.
IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wag...@itc.rwth-aachen.de
www.itc.rwth-aachen.de