I'm having a hard time figuring out the distribution of jobs between two
clusters in a Slurm multi-cluster environment. The documentation says that
each job is submitted to the cluster that provides the earliest start time,
and once the job is submitted to a cluster, it can't be re-distributed to
another.
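(For reference, that routing happens when a job is submitted with the
--clusters/-M option; a minimal sketch, assuming two clusters named cluster1
and cluster2 and a placeholder job.sh:)

  # Routed to whichever listed cluster offers the earliest expected start time
  sbatch --clusters=cluster1,cluster2 job.sh
  # Check which cluster the job landed on
  squeue --clusters=cluster1,cluster2 -u $USER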
I have no experience with this, but based on my understanding of the docs, the
shutdown command should be something like "ssh ${node} systemctl poweroff", and
the resume something like "ipmitool -I lan -H ${node}-bmc -U <username>
-f password_file.txt chassis power on".
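(To make that concrete, a rough, untested sketch of how those two commands
could be wrapped as Slurm's SuspendProgram and ResumeProgram; the script names,
the ${node}-bmc naming convention, the "admin" IPMI user and the password file
are all assumptions:)

  #!/bin/bash
  # slurm_suspend.sh (hypothetical): slurmctld passes the node list as $1
  for node in $(scontrol show hostnames "$1"); do
      ssh "${node}" systemctl poweroff
  done

  #!/bin/bash
  # slurm_resume.sh (hypothetical): power the nodes back on through their BMCs
  for node in $(scontrol show hostnames "$1"); do
      ipmitool -I lan -H "${node}-bmc" -U admin -f password_file.txt chassis power on
  done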
If you use libvirt for your virtual cluster, you c
On 7/28/22 18:49, Djamil Lakhdar-Hamina wrote:
I am helping set up a 16-node cluster computing system. I am not a
system admin, but I work for a small firm and unfortunately have to pick up
needed skills fast in things I have little experience in. I am running
Rocky Linux 8 on Intel Xeon Knights Landing nodes donated by the TAAC
center. We are
Hello Slurm Users,
I am experimenting with the new --prefer soft constraint option in 22.05.
The option behaves as described, but is somewhat inefficient if many jobs
with different --prefer options are submitted. Here is the scenario:
1. submit array of 100 tasks preferring feature A, each task
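(For reference, the submissions in that scenario would look roughly like the
following; the feature names and job.sh are placeholders:)

  # 100-task array that prefers nodes with feature A but may run anywhere
  sbatch --array=1-100 --prefer=A job.sh
  # Another batch preferring a different feature
  sbatch --array=1-100 --prefer=B job.sh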
Dear all,
I have copied the user file from Windows and did not convert it using
dos2unix. I am using a shell script to add the users and accounts to Slurm,
but I am facing a problem; the output of the sshare command is below:
[root@master01]# sshare -a
Account User RawShares
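(A common fix in this situation, assuming the file lists one user and one
account per line and that users.txt is a placeholder name, is to strip the
Windows line endings before the script reads it:)

  # Remove the CR characters left over from Windows, then add the entries
  dos2unix users.txt
  while read -r user account; do
      sacctmgr -i add account "${account}"
      sacctmgr -i add user "${user}" account="${account}"
  done < users.txt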
On Friday, 30 July 2021 11:21:19 AM PDT Soichi Hayashi wrote:
> I am running slurm-wlm 17.11.2
You are on a truly ancient version of Slurm there, I'm afraid (there have been
4 major releases and over 13,000 commits since that was tagged in January 2018);
I would strongly recommend you try and get
Hello. I need help with troubleshooting our Slurm cluster.
I am running slurm-wlm 17.11.2 on Ubuntu 20 on a public cloud
infrastructure (Jetstream) using an elastic computing mechanism (
https://slurm.schedmd.com/elastic_computing.html). Our cluster works for
the most part, but for some reason,
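(For context, the elastic computing mechanism on that page is driven by a
handful of slurm.conf settings along these lines; the paths, timeouts and node
definition below are placeholders, not taken from this cluster:)

  SuspendProgram=/usr/local/sbin/slurm_suspend.sh
  ResumeProgram=/usr/local/sbin/slurm_resume.sh
  SuspendTime=600         # idle seconds before a node is powered down
  SuspendTimeout=120      # seconds allowed for a node to power off
  ResumeTimeout=900       # seconds allowed for a resumed node to boot and register
  NodeName=cloud[01-10] State=CLOUD CPUs=16 RealMemory=60000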
Forgot the link to the Wiki: https://wiki.fysik.dtu.dk/niflheim/SLURM
On 12/8/19 9:18 PM, Ole Holm Nielsen wrote:
Hi Dean,
You may want to look at the links in my Slurm Wiki page. Both the
official Slurm documentation and other resources are listed. I think
most of your requirements and questions are described in these pages.
My Wiki gives detailed deployment information for a CentOS 7 cluster,
but mu
I'm doing my first slurm installation. The schedmd docs assume that I have
a cluster that meets certain (unstated) requirements available, but I
don't. I've found a couple of examples showing how to set up a cluster for
Slurm using real hardware (nodes) with GPUs:
https://github.com/mknoxnv/ubu
Hi,
I'm adding a bunch of memory to two of our nodes that are part of a blade
chassis. So two compute nodes will be upgraded to 1 TB RAM and the rest have
192 GB. All of the nodes belong to several partitions and can be used by our
paid members given the partition below. I'm looking for ways to figure out
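(A couple of hedged command-line ways to check and to target the upgraded
nodes; the node names and memory figures below are made up for illustration:)

  # Confirm Slurm sees the new RAM after RealMemory is updated in slurm.conf
  scontrol show node node01 | grep -i realmemory
  # A request larger than 192 GB can only be satisfied by the 1 TB nodes
  sbatch --mem=500G job.sh
  # Show the memory requested by each job in the queue
  squeue -o "%.10i %.9P %.20j %.10m %.12T"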
Hey, folks. I have a relatively simple queueing setup on Slurm 17.02 with a
1000 CPU-day AssocGrpCPURunMinutesLimit set. When the cluster is less busy
than typical, I may still have users run up against the 1000 CPU-day limit,
even though some nodes are idle.
What’s the easiest way to force a job
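(If the goal is simply to let such jobs through, one hedged option is to raise
the association's running-minutes limit; the account name and the 2000 CPU-day
figure are placeholders:)

  # 2000 CPU-days = 2000 * 24 * 60 = 2,880,000 CPU-minutes
  sacctmgr modify account where name=myaccount set GrpTRESRunMins=cpu=2880000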