Hi, and thanks for all your answers. Sorry for the delay in replying. Yesterday I installed Slurm 18.08.3 on the controller machine to check whether the seff command works correctly with this latest release. The behavior has improved, but I still receive an error message:
# /usr/local/slurm-18.08.3/bin/seff 1694112
*Use of uninitialized value $lmem in numeric lt (<) at /usr/local/slurm-18.08.3/bin/seff line 130, <DATA> line 624.*
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 3.91 GB (3.91 GB/node)
[root@hydra ~]#

Because of this problem, every job is reported with 0.00 MB of memory utilized.

With slurm-17.11.1 it works fine:

# /usr/local/slurm-17.11.0/bin/seff 1694112
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 2.44 GB
Memory Efficiency: 62.57% of 3.91 GB
[root@hydra bin]#

Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)

Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanc...@upf.edu

On 11/06/2018 06:30 PM, Mike Cammilleri wrote:
>
> Thanks for this. We'll try the workaround script. It is not
> mission-critical, but our users have gotten accustomed to seeing these
> metrics at the end of each run and it's nice to have. We are currently
> doing this in a test VM environment, so by the time we actually do the
> upgrade to the cluster perhaps the fix will be available then.
>
> Mike Cammilleri
> Systems Administrator
> Department of Statistics | UW-Madison
> 1300 University Ave | Room 1280
> 608-263-6673 | mi...@stat.wisc.edu
>
> ------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on behalf
> of Chris Samuel <ch...@csamuel.org>
> *Sent:* Tuesday, November 6, 2018 5:03 AM
> *To:* slurm-users@lists.schedmd.com
> *Subject:* Re: [slurm-users] Seff error with Slurm-18.08.1
>
> On 6/11/18 7:49 pm, Baker D.J. wrote:
>
> > The good news is that I am assured by SchedMD that the bug has been
> > fixed in v18.08.3.
>
> Looks like it's fixed in this commit.
>
> commit 3d85c8f9240542d9e6dfb727244e75e449430aac
> Author: Danny Auble <d...@schedmd.com>
> Date:   Wed Oct 24 14:10:12 2018 -0600
>
>     Handle symbol resolution errors in the 18.08 slurmdbd.
>
>     Caused by b1ff43429f6426c when moving the slurmdbd agent internals.
>
>     Bug 5882.
>
> > Having said that we will probably live with this issue
> > rather than disrupt users with another upgrade so soon.
>
> An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though,
> should it? We just flip a symlink and the users see the new binaries,
> libraries, etc. immediately; we can then restart daemons as and when we
> need to (in the right order of course: slurmdbd, slurmctld and then
> the slurmd's).
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
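
For reference, the CPU Efficiency figure in the seff output at the top of this message can be reproduced by hand. The following is only a minimal Python sketch of that arithmetic, not seff's actual Perl code. It assumes efficiency is CPU time divided by core-walltime (wall-clock time multiplied by the number of cores), which matches the numbers printed above. The helper names hms_to_seconds and efficiency_percent are made up for illustration, and the parser only handles the plain HH:MM:SS form shown in this output.

```python
def hms_to_seconds(text):
    """Convert an HH:MM:SS string (as printed in the seff output above) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def efficiency_percent(used, available):
    """Return used / available as a percentage."""
    return 100.0 * used / available


# Values copied from the seff output for job 1694112.
cpu_used = hms_to_seconds("01:39:33")    # CPU Utilized
wall_clock = hms_to_seconds("00:01:10")  # Job Wall-clock time
cores = 2                                # Nodes: 1, Cores per node: 2

# Assumption: core-walltime is wall-clock time multiplied by the core count.
core_walltime = wall_clock * cores
cpu_efficiency = efficiency_percent(cpu_used, core_walltime)

print(f"core-walltime: {core_walltime} s")
print(f"CPU Efficiency: {cpu_efficiency:.2f}%")
```

With the values above this gives a core-walltime of 140 s (00:02:20) and 4266.43%, the same as both seff versions report. The same ratio applied to the rounded memory figures (2.44 GB of 3.91 GB) gives roughly 62.4% rather than the 62.57% shown, presumably because seff works from unrounded values.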