Hello,
I'm having another strange problem with Slurm 17.02.6.
I have a cluster with 250 CPUs.
I am submitting a test job that only sleeps for 60 seconds.
Many of the jobs take more than 7 or 8 minutes to finish running
(I can see them in the RUNNING state for more than 7 minutes).
Is there a re
Oh hello Linh, fancy meeting you here! :-)
On 15/11/17 14:39, Linh Vu wrote:
> Those mlx5_* devices are in /sys/class/infiniband and
> /sys/class/infiniband_cm
Ah cool!
> Not sure if Slurm likes it though 😊
All it will do is set $OMPI_MCA_btl_openib_if_include to be the
interface closest to th
Those mlx5_* devices are in /sys/class/infiniband and /sys/class/infiniband_cm
Not sure if Slurm likes it though 😊
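
For anyone wanting to check this on their own nodes, here is a minimal sketch that lists the entries under a sysfs class directory. The helper name list_sysfs_devices is hypothetical (it is not part of Slurm or Open MPI), and the sketch only reads directory names; it assumes nothing about how Slurm's gres configuration treats these paths.

```python
from pathlib import Path


def list_sysfs_devices(class_dir):
    """Return the sorted entry names under a sysfs class directory.

    Hypothetical helper for illustration only. Returns an empty list
    when the directory does not exist (e.g. no InfiniBand driver loaded).
    """
    path = Path(class_dir)
    if not path.is_dir():
        return []
    return sorted(entry.name for entry in path.iterdir())


if __name__ == "__main__":
    # /sys/class/infiniband is where the mlx5_* names were reported;
    # /sys/class/net is where the p1p1/p1p2 style names were reported.
    for class_dir in ("/sys/class/infiniband", "/sys/class/net"):
        print(class_dir, list_sysfs_devices(class_dir))
```

Running it on a compute node should show which naming scheme each directory exposes; the actual names depend on the hardware and drivers present.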
From: slurm-users on behalf of
Christopher Samuel
Sent: Wednesday, 15 November 2017 2:03:41 PM
To: slurm-users@lists.schedmd.com
Subject: Re: [sl
On 14/11/17 22:12, Geert Geurts wrote:
> I have no experience with gres nic config, but can't you use /sys/class/net
> instead of /dev?
Unfortunately not, that lists the p1p1 and p1p2 devices but not the
mlx5_0 and mlx5_1 names that Open-MPI needs to use. :-(
--
Christopher Samuel        Seni
On Tue, 14 Nov 2017 14:58:00 +
Zohar Roe MLM wrote:
> Hello,
> Trying again with the slurm.conf this time.
>
> I have a cluster name: Autobot
> In this cluster I have servers:
> Optimus[1-10] and
> Megatron[1-10].
>
> I sent 3000 jobs with feature Optimus and part are running while part
> a
All,
I went to the SchedMD booth last night and talked with the guys. Tim told me
that the Barcelona Supercomputing Center is working on something similar. I am
going to try to meet with their Slurm person and compare notes.
I'm also going to look into trying InfluxDB instead of Graphite at t
Hi Roy,
What command are you using to start the jobs?
On 11/14/2017 09:58 AM, Zohar Roe MLM wrote:
Hello,
Trying again with the slurm.conf this time.
I have a cluster name: Autobot
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus and
Hello,
Trying again with the slurm.conf this time.
I have a cluster named Autobot.
In this cluster I have servers:
Optimus[1-10] and
Megatron[1-10].
I sent 3000 jobs with feature Optimus; some are running while others are
pending, which is OK.
But I have sent 1000 jobs to Megatron and they are a
Hi there,
On 13/11/2017 at 18:18, Nicholas McCollum wrote:
Now that there is a slurm-users mailing list, I thought I would share
something with the community that I have been working on to see if anyone else
is interested in it. I have a lot of students on my cluster and I really
wanted a way
On 14/11/17 10:58, Chris Samuel wrote:
Yup, certainly interest here!
Ditto.
--
Simon Flood
HPC System Administrator
University of Cambridge Information Services
United Kingdom
Hi Chris,
I have no experience with gres nic config, but can't you use /sys/class/net
instead of /dev?
I think there is also a link to the PCI device.
Regards,
Geert
From: Chris Samuel
Sent: Nov 13, 2017 8:40 AM
To: slurm-us...@schedmd.com
Subject: [slurm-user
Agree with Chris. Please do share the code with the community.
On 14/11/17 11:58, Chris Samuel wrote:
On Tuesday, 14 November 2017 4:18:08 AM AEDT Nicholas McCollum wrote:
If there's interest I would be more than happy to polish the code a little
and share it on github.
Yup, certainly intere
On Tuesday, 14 November 2017 4:18:08 AM AEDT Nicholas McCollum wrote:
> If there's interest I would be more than happy to polish the code a little
> and share it on github.
Yup, certainly interest here!
--
Christopher Samuel        Senior Systems Administrator
Melbourne Bioinformatics - The U