[slurm-users] Jobs take more time than what they need

2017-11-14 Thread Zohar Roe MLM
Hello, Having another strange problem with slurm 17.02.6. I have a cluster with 250 cpus. I am sending a testing job that only sleeps for 60 seconds. A lot of the jobs are taking more than 7 or 8 minutes until they finish running (I can see them in RUNNING mode for more than 7 minutes). Is there a re

Re: [slurm-users] NIC gres types and lack of device files?

2017-11-14 Thread Christopher Samuel
Oh hello Linh, fancy meeting you here! :-) On 15/11/17 14:39, Linh Vu wrote: > Those mlx5_* devices are in /sys/class/infiniband and > /sys/class/infiniband_cm Ah cool! > Not sure if Slurm likes it though 😊 All it will do is set $OMPI_MCA_btl_openib_if_include to be the interface closest to th

Re: [slurm-users] NIC gres types and lack of device files?

2017-11-14 Thread Linh Vu
Those mlx5_* devices are in /sys/class/infiniband and /sys/class/infiniband_cm Not sure if Slurm likes it though 😊 From: slurm-users on behalf of Christopher Samuel Sent: Wednesday, 15 November 2017 2:03:41 PM To: slurm-users@lists.schedmd.com Subject: Re: [sl

Re: [slurm-users] NIC gres types and lack of device files?

2017-11-14 Thread Christopher Samuel
On 14/11/17 22:12, Geert Geurts wrote: > I have no experience with gres nic config, but can't you use /sys/class/net > instead of /dev? Unfortunately not, that lists the p1p1 and p1p2 devices but not the mlx5_0 and mlx5_1 names that Open-MPI needs to use. :-( -- Christopher Samuel Seni

Re: [slurm-users] Priority wait

2017-11-14 Thread Peter Kjellström
On Tue, 14 Nov 2017 14:58:00 + Zohar Roe MLM wrote: > Hello, > Trying again with the slurm.conf This time. > > I have a cluster name: Autobot > In this cluster I have servers: > Optimus[1-10] and > Megatron[1-10]. > > I sent 3000 jobs with feature Optimus and part are running while part > a

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Nicholas McCollum
All, I went to the SchedMD booth last night and talked with the guys. Tim told me that the Barcelona Supercomputing Center is working on something similar. I am going to try to meet with their Slurm person and compare notes. I'm also going to look into trying InfluxDB instead of Graphite at t

Re: [slurm-users] Priority wait

2017-11-14 Thread Andy Riebs
Hi Roy, What command are you using to start the jobs? On 11/14/2017 09:58 AM, Zohar Roe MLM wrote: Hello, Trying again with the slurm.conf This time. I have a cluster name: Autobot In this cluster I have servers: Optimus[1-10] and Megatron[1-10]. I sent 3000 jobs with feature Optimus and

[slurm-users] Priority wait

2017-11-14 Thread Zohar Roe MLM
Hello, Trying again with the slurm.conf This time. I have a cluster name: Autobot In this cluster I have servers: Optimus[1-10] and Megatron[1-10]. I sent 3000 jobs with feature Optimus and part are running while part are pending. Which is ok. But I have sent 1000 jobs to Megatron and they are a

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Rémi Palancher
Hi there, Le 13/11/2017 à 18:18, Nicholas McCollum a écrit : Now that there is a slurm-users mailing list, I thought I would share something with the community that I have been working on to see if anyone else is interested in it. I have a lot of students on my cluster and I really wanted a way

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Simon Flood
On 14/11/17 10:58, Chris Samuel wrote: Yup, certainly interest here! Ditto. -- Simon Flood HPC System Administrator University of Cambridge Information Services United Kingdom

Re: [slurm-users] NIC gres types and lack of device files?

2017-11-14 Thread Geert Geurts
Hi Chris, I have no experience with gres nic config, but can't you use /sys/class/net instead of /dev? I think there is also a link to the PCI device. Regards, Geert From: Chris Samuel Sent: Nov 13, 2017 8:40 AM To: slurm-us...@schedmd.com Subject: [slurm-user

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Rajiv Nishtala
Agree with Chris. Please do share the code with the community. On 14/11/17 11:58, Chris Samuel wrote: On Tuesday, 14 November 2017 4:18:08 AM AEDT Nicholas McCollum wrote: If there's interest I would be more than happy to polish the code a little and share it on github. Yup, certainly intere

Re: [slurm-users] Graphing job metrics

2017-11-14 Thread Chris Samuel
On Tuesday, 14 November 2017 4:18:08 AM AEDT Nicholas McCollum wrote: > If there's interest I would be more than happy to polish the code a little > and share it on github. Yup, certainly interest here! -- Christopher Samuel Senior Systems Administrator Melbourne Bioinformatics - The U