Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain
in two weeks: Has anyone heard anything about hotels and
the schedule? The official page
https://slurm.schedmd.com/slurm_ug_agenda.html was last updated on May 30...
/Ole
On 9/8/18 5:11 AM, John Hearns wrote:
> Not an answer to your question - a good diagnostic for cgroups is the
> utility 'lscgroups'
Where does one find this utility?
On Monday, 10 September 2018 9:39:28 PM AEST Patrick Goetz wrote:
> On 9/8/18 5:11 AM, John Hearns wrote:
>
> > Not an answer to your question - a good diagnostic for cgroups is the
> > utility 'lscgroups'
>
> Where does one find this utility?
It's in the libcgroup-tools package on RHEL/CentOS.
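For reference, a rough sketch of getting at it on RHEL/CentOS (note the binary that package actually ships is named lscgroup; the grep below is only an illustration):

  yum install libcgroup-tools
  lscgroup | grep memory    # lists cgroups as controller:/path; the memory ones include those Slurm creates per job

The exact package and binary names may differ on other distributions.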
On Monday, 10 September 2018 4:42:00 PM AEST Janne Blomqvist wrote:
> One workaround is to reboot the node whenever this happens. Another is
> to set ConstrainKmemSpace=no in cgroup.conf (but AFAICS this option was
> added in slurm 17.02 and is not present in 16.05 that you're using).
Phew, we h
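For anyone else trying the second suggestion, a minimal cgroup.conf sketch (the surrounding Constrain* lines are only illustrative of a typical setup; ConstrainKmemSpace needs a Slurm release that knows the option):

  # cgroup.conf (sketch)
  CgroupAutomount=yes
  ConstrainCores=yes
  ConstrainRAMSpace=yes
  ConstrainKmemSpace=no    # disable kernel-memory accounting in the job cgroups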
No, this happens without the "Oversubscribe" parameter being set. I'm using
custom resources though:
GresTypes=some_resource
NodeName=compute-[1-100] CPUs=10 Gres=some_resource:10 State=CLOUD
Submission uses:
sbatch --nodes=1 --ntasks-per-node=1 --gres=some_resource:1
But I just tried it withou
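For completeness, each node also carries a matching gres.conf along these lines (a sketch; some_resource is just our placeholder name for a countable GRES with no device files):

  # gres.conf on compute-[1-100] (sketch)
  Name=some_resource Count=10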
I think you probably want CR_LLN set in your SelectTypeParameters in
slurm.conf. This makes it fill up a node before moving on to the next
instead of "striping" the jobs across the nodes.
On Mon, Sep 10, 2018 at 8:29 AM Felix Wolfheimer
wrote:
>
> No this happens without the "Oversubscribe" parame
Thanks everyone for your responses. It looks like the two suggestions were:
1. add "cgroup_enable=memory swapaccount=1" to the kernel command by adding it
to /etc/default/grub in the GRUB_CMDLIND_LINUX variable
2. Add ConstrainKmemSpace=no in cgroup.conf
From this information I think option 2
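For option 1, the change would look roughly like this on an Ubuntu/Debian-style node (a sketch; any existing contents of GRUB_CMDLINE_LINUX will differ per site and should be kept):

  # /etc/default/grub (sketch)
  GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

followed by update-grub (or grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL/CentOS) and a reboot of the node.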
I believe the default value of this would prevent jobs from sharing a node.
You may want to look at this and change it from the default.
--
Brian D. Haymore
University of Utah
Center for High Performance Computing
155 South 1452 East RM 405
Salt Lake City, Ut 84112
Phone: 801-558-1150, Fax: 801-
Ole,
You can find hotels close to CIEMAT here
https://drive.google.com/open?id=1eEKgnlBXeYNO426QS7nPuDS4nm8aUpnH&usp=sharing
Jacob
On Mon, Sep 10, 2018 at 1:23 AM, Ole Holm Nielsen <
ole.h.niel...@fysik.dtu.dk> wrote:
> Regarding the Slurm User Group Meeting 2018 coming up in Madrid, Spain in
>
Just an update: the cgroup.conf file could not be parsed when I added
ConstrainKmemSpace=no. I guess this option is not compatible with our
kernel/slurm versions on Ubuntu? Not sure. For now we took the lazy way out and
rebooted the nodes. Will try the kernel options or a full Slurm update as time permits.
Hi Jacob,
Thanks for the info. Is someone going to compile a travel and hotel
information sheet?
CIEMAT seems to have an agreement with some hotels. All hotels seem to
be located 2-3 km from CIEMAT, so perhaps there's a local bus line to
take into consideration when booking a hotel?
Thanks
Hi All,
We have installed slurm 17.11.8 on IBM AC922 nodes (POWER9) that have 4
GPUs each, and are running RHEL 7.5-ALT. Physically, these are 2-socket
nodes, with each socket having 20 cores. Depending on SMT setting (SMT1,
SMT2, SMT4) there can be 40, 80, or 160 "processors/CPUs" virtually.
Som
Hi Keith,
On Tuesday, 11 September 2018 7:46:14 AM AEST Keith Ball wrote:
> 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on
> these systems.
[...]
> Anyone know if there is a way to get Slurm to recognize the true topology
> for POWER nodes?
IIRC Slurm uses hwloc for discovering the hardware topology.
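One quick check is to compare what slurmd itself detects on an AC922 node against the node definition in slurm.conf, e.g. (the NodeName line below is only an illustration of what SMT4 would look like, not your actual config):

  slurmd -C                 # prints the Sockets/CoresPerSocket/ThreadsPerCore slurmd detects
  lstopo-no-graphics        # hwloc's view of the same topology
  # illustrative slurm.conf entry for SMT4:
  NodeName=ac922-[01-02] Sockets=2 CoresPerSocket=20 ThreadsPerCore=4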
On Tuesday, 11 September 2018 2:05:51 AM AEST Mike Cammilleri wrote:
> Just an update: the cgroup.conf file could not be parsed when I added
> ConstrainKmemSpace=no. I guess this option is not compatible with our
> kernel/slurm versions on Ubuntu? Not sure.
I think that'll just be your version of
On Tuesday, 11 September 2018 12:52:27 AM AEST Brian Haymore wrote:
> I believe the default value of this would prevent jobs from sharing a node.
But the jobs _do_ share a node once the resources become available; it's just
that the cloud part of Slurm is bringing up the wrong number of nodes c
I re-read the docs and I was wrong about the default behavior. The default is
"no", which just means don't oversubscribe the individual resources, whereas I
thought the default was 'exclusive'. So I think I've been taking us down a
dead end in terms of what I thought might help. :\
I have a system
Just a quick note to mention that the SLUG'18 agenda has been posted online:
https://slurm.schedmd.com/slurm_ug_agenda.html