Hi Jeffrey,
Jeffrey T Frey writes:
> On a cluster running Slurm 17.11.8 (cons_res) I can submit a job that
> requests e.g. 2 nodes with unique features on each:
>
> $ sbatch --nodes=2 --ntasks-per-node=1 --constraint="[256GB*1&192GB*1]" …
>
The job is submitted and runs as expected: on 1 node with feature "256GB" and 1 node with feature "192GB".
On 16/12/20 6:21 pm, Kevin Buckley wrote:
The skip is occurring, in src/lua/slurm_lua.c, because of this trap
That looks right to me; that's Doug's code, which is checking whether the
file has been updated since slurmctld last read it in. If it has, then
it'll reload it, but if it hasn't then
Probably not specific to 20.11.1, nor a Cray, but has anyone out there seen
anything like this?
As slurmctld restarts, after upping the debug level, it all looks hunky-dory:
[2020-12-17T09:23:46.204] debug3: Trying to load plugin
/opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so
[2020-
Thank you, Michael!
I've tried the following example:
NodeName=gpunode01 Gres=gpu:1 Sockets=2 CoresPerSocket=28
ThreadsPerCore=2 State=UNKNOWN RealMemory=38
PartitionName=gpu MaxCPUsPerNode=56 MaxMemPerNode=19
Nodes=gpunode01 Default=NO MaxTime=1-0 State=UP
PartitionName=cp
Hi,
I would like to do the equivalent of:
sacctmgr -i add user namef account=grpa
sacctmgr -i add user nameg account=grpa
...
sacctmgr -i add user namez account=grpa
but with an "sacctmgr -i load filename", in which filename contains the grpa
account with the list of users. The documentation mentions the "lo
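In case it helps, here is a rough sketch of the flat-file format that "sacctmgr load" accepts (the cluster name and the extra fields are assumptions on my part; the account/user names are the ones from your example):
Cluster - 'mycluster'
Parent - 'root'
Account - 'grpa':Description='grpa':Organization='grpa'
Parent - 'grpa'
User - 'namef':DefaultAccount='grpa'
User - 'nameg':DefaultAccount='grpa'
User - 'namez':DefaultAccount='grpa'
You can generate a file in exactly this format with "sacctmgr dump <clustername> file=filename" and then re-load it (or an edited copy) with "sacctmgr -i load filename".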
We have overlapping partitions for GPU work and some kinds of non-GPU work (both
large memory and regular memory jobs).
For 28-core nodes with 2 GPUs, we have:
PartitionName=gpu MaxCPUsPerNode=16 … Nodes=gpunode[001-004]
PartitionName=any-interactive MaxCPUsPerNode=12 …
Nodes=node[001-040],gpunode
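The idea, roughly, is that the per-partition core caps on the shared nodes sum to the node's core count, so non-GPU jobs cannot starve the GPU jobs of cores (12 + 16 = 28). A sketch of what the full lines might look like (the node definition and the truncated parts are assumptions of mine):
NodeName=gpunode[001-004] Sockets=2 CoresPerSocket=14 ThreadsPerCore=1 Gres=gpu:2 State=UNKNOWN
PartitionName=gpu MaxCPUsPerNode=16 Nodes=gpunode[001-004] State=UP
PartitionName=any-interactive MaxCPUsPerNode=12 Nodes=node[001-040],gpunode[001-004] State=UP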
Hi,
Say I have a Slurm node with 1 x GPU and 112 x CPU cores, and:
1) there is a job running on the node using the GPU and 20 x CPU cores
2) there is a job waiting in the queue asking for 1 x GPU and 20 x
CPU cores
Is it possible to a) let a new job asking for 0 x GPU and 20 x CPU
On a cluster running Slurm 17.11.8 (cons_res) I can submit a job that requests
e.g. 2 nodes with unique features on each:
$ sbatch --nodes=2 --ntasks-per-node=1 --constraint="[256GB*1&192GB*1]" …
The job is submitted and runs as expected: on 1 node with feature "256GB" and
1 node with feature "192GB".
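If it is useful for checking, node features can be listed with something like "sinfo -o '%N %f'", and the nodes actually assigned to the job with "squeue -j <jobid> -o '%i %N'" (the job id being whatever sbatch printed).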
You can use the -o option to select which fields you want it to print.
The last column is the FairShare score. The equation is part of the
slurm documentation: https://slurm.schedmd.com/priority_multifactor.html
If you are using the Classic Fairshare you can look at our
documentation: https:
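For example (the field names are the ones listed in the sshare man page; pick whichever subset you need):
sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare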
$ sshare -a
Account User RawShares NormShares RawUsage EffectvUsage FairShare
---------- ---------- ---------- ----------- ----------- ------------- ----------
root 0.00 158 1.00
root
We do this here using the job_submit.lua script. Here is an example:
if part == "bigmem" then
   if (job_desc.pn_min_memory ~= 0) then
      if (job_desc.pn_min_memory < 19 or
          job_desc.pn_min_memory > 2147483646) then
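For context, here is a more complete, self-contained sketch along the same lines (the partition name, memory bounds and message are placeholders of mine, not the values from the truncated snippet above):
-- job_submit.lua: reject "bigmem" jobs whose per-node memory request is
-- outside the allowed range (bounds below are placeholder values).
local MIN_MEM_MB = 192000        -- assumed lower bound, adjust as needed
local MAX_MEM_MB = 2147483646    -- effectively "unlimited" sentinel

function slurm_job_submit(job_desc, part_list, submit_uid)
   if job_desc.partition == "bigmem" then
      if job_desc.pn_min_memory ~= 0 then
         if job_desc.pn_min_memory < MIN_MEM_MB or
            job_desc.pn_min_memory > MAX_MEM_MB then
            slurm.log_user("bigmem jobs must request between %d and %d MB per node",
                           MIN_MEM_MB, MAX_MEM_MB)
            return slurm.ERROR
         end
      end
   end
   return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
   return slurm.SUCCESS
end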
I just found an error in my attempt. I ran on saga-test02 while I'd made the
change to saga-test01. Things are working better now.
Thanks,
Erik
From: Erik Bryer
Sent: Wednesday, December 16, 2020 8:51 AM
To: Slurm User Community List
Subject: Re: [slurm-users] gr
Hello,
Good afternoon. I have a query: currently in our cluster we have different
partitions:
1 partition called slims with 48 GB of RAM
1 partition called general with 192 GB of RAM
1 partition called largemem with 768 GB of RAM.
Is it possible to restrict access to the largemem partition and for tas
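One common way (the node, account and group names below are just placeholders) is to restrict the partition in slurm.conf with AllowAccounts or AllowGroups, e.g.:
PartitionName=largemem Nodes=bignode[01-02] AllowAccounts=bigmem_projects MaxTime=1-0 State=UP
or, with a Unix group instead:
PartitionName=largemem Nodes=bignode[01-02] AllowGroups=largemem MaxTime=1-0 State=UP
followed by an "scontrol reconfigure".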
Hi Loris,
That actually makes some sense. There is one thing that troubles me though. If,
on a VM with no GPUs, I define...
NodeName=saga-test01 CPUS=2 SocketsPerBoard=1 CoresPerSocket=2 ThreadsPerCore=1
RealMemory=1800 State=UNKNOWN Gres=gpu:gtx1080ti:4
...and try to run the following I get a
Hi Olaf,
Since you are testing Slurm, perhaps my Slurm Wiki page may be of interest
to you:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
There is a discussion about the setup of Munge.
Best regards,
Ole
On 12/15/20 5:48 PM, Olaf Gellert wrote:
Hi all,
we are setting up a new test
Hi Olaf,
Check the firewalls between your compute node and the Slurm controller to
make sure that they can contact each other. Slurmctld needs to contact the
SlurmdPort (default 6818), and slurmd needs to contact the SlurmctldPort
(default 6817). Also the other compute nodes need to be able to con
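On nodes that use firewalld, for example, that might look like the following (adjust for iptables/nftables if that is what you run):
firewall-cmd --permanent --add-port=6817/tcp
firewall-cmd --permanent --add-port=6818/tcp
firewall-cmd --reload
If you have set SrunPortRange in slurm.conf, that port range also needs to be reachable for srun.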
Hi,
After days of surfing the net and looking for talks/tutorials on the SchedMD website, I
couldn't really find a tutorial (that works in a systemd environment) on how to install,
configure and deploy a Slurm system on a single compute server with many cores
and a lot of memory. Explanations and tutorials on administration I hav
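For what it's worth, a minimal single-host slurm.conf sketch looks roughly like the following (the hostname, core count, memory and the choice of cons_tres are assumptions you would adapt to your machine):
ClusterName=single
SlurmctldHost=myserver
ProctrackType=proctrack/cgroup
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=myserver CPUs=64 RealMemory=256000 State=UNKNOWN
PartitionName=main Nodes=myserver Default=YES MaxTime=INFINITE State=UP
Combined with a working munge setup and the slurmctld/slurmd systemd units, that is usually enough for a one-machine test installation.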
On 15/12/2020 17:48, Olaf Gellert wrote:
So munge seems to work as far as I can tell. What else does
Slurm use munge for? Are hostnames part of the authentication?
Do I have to worry about the time "Thu Jan 01 01:00:00 1970"
I'm not an expert but I know that hostnames are part of munge
authentic
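A quick way to test munge across hosts (the node name below is a placeholder) is to encode a credential on one machine and decode it on another:
munge -n | unmunge
munge -n | ssh gpunode01 unmunge
Both should report STATUS: Success, and the ENCODE_TIME/DECODE_TIME fields in the unmunge output make any clock skew between the two hosts visible.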