Oh, thanks Paddy for your patch, it works very well!!

Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)
Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanc...@upf.edu

On 11/09/2018 07:59 AM, Marcus Wagner wrote:
> Thanks Paddy,
>
> just something learned again ;)
>
>
> Best
> Marcus
>
> On 11/08/2018 05:07 PM, Paddy Doyle wrote:
>> Hi all,
>>
>> It looks like we can use the API to avoid having to manually parse the '2='
>> value from the stats{tres_usage_in_max} value.
>>
>> I've submitted a bug report and patch:
>>
>> https://bugs.schedmd.com/show_bug.cgi?id=6004
>>
>> The minimal changes needed would be in the attached seff.patch.
>>
>> Hope that helps,
>>
>> Paddy
>>
>> On Thu, Nov 08, 2018 at 11:54:59AM +0100, Marcus Wagner wrote:
>>
>>> Hi Miguel,
>>>
>>>
>>> this is because SchedMD changed the stats field. rss_max no longer exists
>>> (cf. line 225 of seff).
>>> You need to evaluate the field stats{tres_usage_in_max} and take the value
>>> after '2='. That value is the memory in bytes instead of kbytes, so it
>>> should additionally be divided by 1024.
>>>
>>>
>>> Best
>>> Marcus
>>>
>>> On 11/08/2018 11:06 AM, Miguel A. Sánchez wrote:
>>>> Hi, thanks for all your answers, and sorry for the delay in my answer.
>>>> Yesterday I installed Slurm-18.08.3 on the controller machine to check
>>>> whether the Seff command works fine with this latest release.
>>>> The behavior has improved, but I still receive an error message:
>>>>
>>>>
>>>> # /usr/local/slurm-18.08.3/bin/seff 1694112
>>>> *Use of uninitialized value $lmem in numeric lt (<) at
>>>> /usr/local/slurm-18.08.3/bin/seff line 130, <DATA> line 624.*
>>>> Job ID: 1694112
>>>> Cluster: XXXXX
>>>> User/Group: XXXXX
>>>> State: COMPLETED (exit code 0)
>>>> Nodes: 1
>>>> Cores per node: 2
>>>> CPU Utilized: 01:39:33
>>>> CPU Efficiency: 4266.43% of 00:02:20 core-walltime
>>>> Job Wall-clock time: 00:01:10
>>>> Memory Utilized: 0.00 MB (estimated maximum)
>>>> Memory Efficiency: 0.00% of 3.91 GB (3.91 GB/node)
>>>> [root@hydra ~]#
>>>>
>>>>
>>>> Because of this problem, every job is reported with a memory utilization
>>>> of 0.00 MB.
>>>>
>>>>
>>>> With slurm-17.11.1 it works fine:
>>>>
>>>>
>>>> # /usr/local/slurm-17.11.0/bin/seff 1694112
>>>> Job ID: 1694112
>>>> Cluster: XXXXX
>>>> User/Group: XXXXX
>>>> State: COMPLETED (exit code 0)
>>>> Nodes: 1
>>>> Cores per node: 2
>>>> CPU Utilized: 01:39:33
>>>> CPU Efficiency: 4266.43% of 00:02:20 core-walltime
>>>> Job Wall-clock time: 00:01:10
>>>> Memory Utilized: 2.44 GB
>>>> Memory Efficiency: 62.57% of 3.91 GB
>>>> [root@hydra bin]#
>>>>
>>>>
>>>> Miguel A. Sánchez Gómez
>>>> System Administrator
>>>> Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)
>>>>
>>>> Barcelona Biomedical Research Park (office 4.80)
>>>> Doctor Aiguader 88 | 08003 Barcelona (Spain)
>>>> Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
>>>> e-mail: miguelangel.sanc...@upf.edu
>>>> On 11/06/2018 06:30 PM, Mike Cammilleri wrote:
>>>>> Thanks for this. We'll try the workaround script. It is not
>>>>> mission-critical, but our users have gotten accustomed to seeing
>>>>> these metrics at the end of each run and it's nice to have. We are
>>>>> currently doing this in a test VM environment, so by the time we
>>>>> actually do the upgrade to the cluster perhaps the fix will be
>>>>> available.
>>>>>
>>>>>
>>>>> Mike Cammilleri
>>>>>
>>>>> Systems Administrator
>>>>>
>>>>> Department of Statistics | UW-Madison
>>>>>
>>>>> 1300 University Ave | Room 1280
>>>>> 608-263-6673 | mi...@stat.wisc.edu
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> *From:* slurm-users <slurm-users-boun...@lists.schedmd.com> on
>>>>> behalf of Chris Samuel <ch...@csamuel.org>
>>>>> *Sent:* Tuesday, November 6, 2018 5:03 AM
>>>>> *To:* slurm-users@lists.schedmd.com
>>>>> *Subject:* Re: [slurm-users] Seff error with Slurm-18.08.1
>>>>> On 6/11/18 7:49 pm, Baker D.J. wrote:
>>>>>
>>>>>> The good news is that I am assured by SchedMD that the bug has been
>>>>>> fixed in v18.08.3.
>>>>> Looks like it's fixed in this commit.
>>>>>
>>>>> commit 3d85c8f9240542d9e6dfb727244e75e449430aac
>>>>> Author: Danny Auble <d...@schedmd.com>
>>>>> Date: Wed Oct 24 14:10:12 2018 -0600
>>>>>
>>>>> Handle symbol resolution errors in the 18.08 slurmdbd.
>>>>>
>>>>> Caused by b1ff43429f6426c when moving the slurmdbd agent internals.
>>>>>
>>>>> Bug 5882.
>>>>>
>>>>>
>>>>>> Having said that we will probably live with this issue
>>>>>> rather than disrupt users with another upgrade so soon.
>>>>> An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though,
>>>>> should it? We just flip a symlink and the users see the new binaries,
>>>>> libraries, etc. immediately; we can then restart daemons as and when we
>>>>> need to (in the right order of course: slurmdbd, slurmctld and then
>>>>> slurmd's).
>>>>>
>>>>> All the best,
>>>>> Chris
>>>>> --
>>>>> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>>>>>
>>> --
>>> Marcus Wagner, Dipl.-Inf.
>>>
>>> IT Center
>>> Abteilung: Systeme und Betrieb
>>> RWTH Aachen University
>>> Seffenter Weg 23
>>> 52074 Aachen
>>> Tel: +49 241 80-24383
>>> Fax: +49 241 80-624383
>>> wag...@itc.rwth-aachen.de
>>> www.itc.rwth-aachen.de
>>>
>
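
For anyone reading this thread later, here is a rough sketch of the manual parsing described in the thread: take the value after '2=' in the stats{tres_usage_in_max} string and divide it by 1024 to go from bytes to kbytes. This is not Paddy's patch (which uses the API instead of parsing) and it is not code from seff, which is written in Perl. It is in Python purely for illustration. The function name max_rss_kb and the sample string are made up, and the sketch assumes the field is a comma-separated list of id=value pairs.

```python
def max_rss_kb(tres_usage_in_max: str) -> float:
    """Return the memory value (the entry with id 2) in kilobytes.

    Assumes a comma-separated "id=value" string, with the value for
    id 2 given in bytes, as described in the thread. Returns 0.0 when
    no id-2 entry is present.
    """
    for pair in tres_usage_in_max.split(","):
        tres_id, sep, value = pair.partition("=")
        # Compare the whole id so that e.g. "12=..." is not mistaken for "2=...".
        if sep and tres_id.strip() == "2":
            return int(value) / 1024  # bytes -> kbytes
    return 0.0


if __name__ == "__main__":
    # Hypothetical sample string, not taken from a real job record.
    sample = "1=5973,2=2621440000,7=3000000000"
    print(max_rss_kb(sample))
```

A malformed entry such as "2=" with no number would raise ValueError here; a real script would probably want to treat that the same as a missing value.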