Hi,
We have a 16-node cluster of DGX-A100 (80 GB) systems.
On each node, 128 cores are separated into a dedicated partition for
cpu-only jobs, while the 8 GPUs and the other 128 cores are in another
partition for cpu+gpu jobs.
We want to ensure that only the selected 128 cores are part of the cpu
partition. (NUMA / Symm
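One common way to express this in slurm.conf, as a minimal sketch (the hostnames dgx[01-16], CPU count, memory size and GPU-to-core mapping below are all assumptions): MaxCPUsPerNode caps how many CPUs each partition may allocate per node, and the Cores= mapping in gres.conf keeps GPU jobs on the GPU-local (NUMA-local) cores. It does not hard-pin specific core IDs to the cpu partition.

# slurm.conf (sketch; names and sizes are assumed)
NodeName=dgx[01-16] CPUs=256 RealMemory=1000000 Gres=gpu:8
PartitionName=cpu    Nodes=dgx[01-16] MaxCPUsPerNode=128
PartitionName=cpugpu Nodes=dgx[01-16] MaxCPUsPerNode=128 Default=YES

# gres.conf on each node (sketch; the device-to-core mapping is assumed)
Name=gpu File=/dev/nvidia[0-3] Cores=0-63
Name=gpu File=/dev/nvidia[4-7] Cores=64-127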
Hi,
We are using Slurm 21.08. We would like to know how to use the "sbank"
utility for crediting GPU hours, just like CPU minutes, and also to get the
status of GPU hours credited, used, etc.
The sbank utility from GitHub does not have functionality for adding /
querying GPU hours.
Any other means?
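For reference, plain Slurm accounting can handle the banking side without sbank; a sketch, assuming an account named myproject and that AccountingStorageEnforce=limits is set in slurm.conf:

# credit 60,000 GPU-minutes (1,000 GPU-hours) to the account
sacctmgr modify account myproject set GrpTRESMins=gres/gpu=60000

# report GPU-hours consumed by the account over a period
sreport -T gres/gpu -t hours cluster AccountUtilizationByUser \
    Accounts=myproject start=2023-06-01 end=2023-07-01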
Hi,
Do I need to run a separate slurmctld and slurmd for this? I am struggling
with this. Any pointers?
--
Purvesh
On Mon, 26 Jun 2023 at 12:15, Purvesh Parmar wrote:
> Hi,
>
> I have slurm 20.11 in a cluster of 4 nodes, with each node having 16 cpus.
> I want to create two parti
Hi,
I have Slurm 20.11 on a cluster of 4 nodes, with each node having 16 CPUs.
I want to create two partitions (ppart and cpart) such that 8 cores from
each of the 4 nodes are part of ppart and the remaining 8 cores are part of
cpart; that is, I want to distribute each node's cores between the two
partitions.
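A minimal slurm.conf sketch of the usual approach (the node names n[01-04] and memory size are assumed): MaxCPUsPerNode caps how many of a node's cores each partition may allocate, and a consumable-resource SelectType lets both partitions schedule onto the same node at the same time. One slurmctld and one slurmd per node serve both partitions; no separate daemons are needed.

SelectType=select/cons_tres
SelectTypeParameters=CR_Core
NodeName=n[01-04] CPUs=16 RealMemory=64000
PartitionName=ppart Nodes=n[01-04] MaxCPUsPerNode=8
PartitionName=cpart Nodes=n[01-04] MaxCPUsPerNode=8 Default=YES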
Hi,
We have slurm-21.04 and have 8 nodes in the job_submit.lua partition (with 2
gpus per node). I want to calculate the utilization of only 4 specific nodes
out of the 8 in the partition over the last 15 days. How can I do this?
Regards,
Purvesh P
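sreport aggregates per cluster/account rather than per node, so one way is to pull the per-job records for just those nodes with sacct and sum them; a sketch with the node names assumed:

sacct -a -X --nodelist=node[01-04] \
    --starttime=$(date -d '15 days ago' +%F) --endtime=now \
    --format=JobID,Partition,NodeList,AllocTRES%60,ElapsedRaw
# Summing ElapsedRaw multiplied by each job's allocated gres/gpu (or cpu)
# count gives GPU-seconds / CPU-seconds on those nodes; jobs that also used
# other nodes are counted in full, so treat the result as an upper bound.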
Thank you, I will try this and get back. Is any other step being missed here
for the migration?
Thank you,
Purvesh
On Mon, 24 Apr 2023 at 12:08, Ole Holm Nielsen
wrote:
> On 4/24/23 08:09, Purvesh Parmar wrote:
> > thank you, however, because this is a change in the data center, the names
at 11:25, Ole Holm Nielsen
wrote:
> On 4/24/23 06:58, Purvesh Parmar wrote:
> > thank you, but it is a change of hostnames as well, apart from the IP
> > addresses, for the slurm server, the database server, and the slurmd
> > compute nodes.
>
> I suggest that
> ... but I think that
> updates itself.
>
> The names of the servers are in slurm.conf, but again, if the names don’t
> change, that won’t matter. If you have IPs there, you will need to change
> them.
>
> Sent from my iPhone
>
> > On Apr 23, 2023, at 14:01, Purvesh Parmar
> wro
and slurmd on compute nodes
Please help and guide on the above.
Regards,
Purvesh Parmar
INHAIT
> Sent from my T-Mobile 4G LTE Device
>
>
>
> Original message
> From: Purvesh Parmar
> Date: 3/13/23 7:05 PM (GMT-08:00)
> To: Slurm User Community List
> Subject: Re: [slurm-users] changing the operational network in slurm setup
>
to use the 10 GbE interface?
>
>
> -----Original Message-----
> From: Purvesh Parmar
> Reply-To: Slurm User Community List
> To: Slurm User Community List
> Subject: [slurm-users] changing the operational network in slurm setup
> Date: 03/13/2023 06:19:13 PM
>
Hi,
We have Slurm 22.08 running over a 1 GbE Ethernet network (slurmdbd,
slurmctld, and slurmd on the compute nodes) on Ubuntu 20.04. We want to
migrate the Slurm services to the 10 GbE network, which is present on all
the nodes and on the master server as well. How should we proceed?
Thanks,
P. Parm
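A sketch of the slurm.conf pieces involved (the hostnames and 10 GbE addresses below are assumptions): the node and controller names can stay the same while their addresses point at the 10 GbE interfaces, after which slurmctld, slurmdbd, and every slurmd need to be restarted.

# slurm.conf (sketch)
SlurmctldHost=master(10.10.0.1)
AccountingStorageHost=master        # or the 10 GbE name/IP of the slurmdbd host
NodeName=node[01-16] NodeAddr=node10g[01-16]   # other node parameters unchanged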
We need to run a single job that requires more nodes than are present in
the HMEM partition. We have another partition, XEON. Can a user run a
single job across both partitions? We are using Slurm 21
Thanks & Regards,
Purvesh
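A single homogeneous allocation cannot span two partitions (--partition=HMEM,XEON only makes Slurm pick one of them), but a heterogeneous job can combine components from both; a sketch, with node counts and the application name assumed:

#!/bin/bash
#SBATCH --partition=HMEM --nodes=4
#SBATCH hetjob
#SBATCH --partition=XEON --nodes=2
# launch one MPI application across both components
srun --het-group=0,1 ./my_mpi_app

Alternatively, a third partition containing the union of the HMEM and XEON nodes avoids heterogeneous-job handling altogether.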
I have restarted slurmctld on the master and slurmd on the nodes. Then I
tested jobs, but nothing executes after the job is over.
Please help.
Regards,
Purvesh
On Sat, 16 Jul 2022 at 12:37, Purvesh Parmar wrote:
> Hi,
>
> I have written a shell script with name epilog-test. I h
Hi,
I have written a shell script named epilog-test and referenced it in
slurm.conf:
Epilog=/var/slurm/etc/epilog-test
The same slurm.conf file has been copied to all the nodes.
My epilog-test is:
#!/bin/bash
echo "epilog test" > /tmp/testfile
I made it executable with: chmod +x epilog-test
I have restarte
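The usual checks for an epilog that never seems to run, as a sketch (pdsh, the node names, and the slurmd log path are assumptions):

# 1. The script must exist and be executable on every compute node.
pdsh -w node[01-04] 'ls -l /var/slurm/etc/epilog-test'
# 2. Make slurmd pick up the slurm.conf change.
scontrol reconfigure
# 3. The epilog runs on the compute node (as root), so /tmp/testfile is
#    created on the node where the job ran, not on the login/master node.
pdsh -w node[01-04] 'cat /tmp/testfile'
# 4. Epilog failures are logged by slurmd on the node.
pdsh -w node[01-04] 'grep -i epilog /var/log/slurm/slurmd.log | tail'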
, Ole Holm Nielsen
wrote:
> Hi Purvesh,
>
> On 7/11/22 03:37, Purvesh Parmar wrote:
> > I want to limit the queued jobs per user to 5 which means, system should
> > not allow more than 5 jobs per user to remain in queue (not running and
> > waiting for resources) and only
Hi,
I want to limit the queued jobs per user to 5, which means the system should
not allow more than 5 jobs per user to remain in the queue (pending, not
running, waiting for resources), and only 4 jobs to run at any given time.
To summarize, I want to implement a policy of 4 jobs per user in the running
state a
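A sketch using QOS limits (the QOS name "normal" is assumed, and AccountingStorageEnforce must include limits,qos for them to take effect): MaxJobsPerUser caps running jobs, and MaxSubmitJobsPerUser caps running plus pending, so 4 running + 5 pending corresponds to a submit cap of 9.

sacctmgr modify qos normal set MaxJobsPerUser=4 MaxSubmitJobsPerUser=9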
allocated GPU hours, and it also
does not show them for a one-week duration.
sreport reservation utilization name=rizwan_res start=2022-03-28T10:00:00
end=2022-04-03T10:00:00
Please help.
Regards,
Purvesh
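By default sreport reports the CPU TRES only; asking explicitly for the GPU TRES and an hours time format may show what is needed, assuming gres/gpu is included in AccountingStorageTRES in slurm.conf. A sketch reusing the reservation name and dates from the post:

sreport -T gres/gpu -t hours reservation utilization \
    name=rizwan_res start=2022-03-28T10:00:00 end=2022-04-03T10:00:00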
On Sat, 30 Apr 2022 at 15:57, Purvesh Parmar wrote:
> Hello,
>
> We have a node given to a g
Hello,
We have given a node with 2 GPUs to a group in dedicated mode by setting a
reservation for 6 months. We want to find out the weekly GPU-hours
utilization of that particular reserved node. The node is not in a
separate partition.
The below command does not help in showing the allocate
Hello,
I am using Slurm 21.08 and am stuck with the following.
Q1: I have 8 nodes, each with 2 GPUs, 128 cores, and 512 GB RAM. I want
to distribute each node's resources across 2 partitions so that the "par1"
partition will have 2 GPUs with 64 cores and 256 GB RAM of the node and
the other partition
Hello,
I have been using Slurm 21.08.
Q1: I have 8 nodes, each with 2 GPUs, 128 cores, and 512 GB RAM. I want
to distribute the node resources across 2 partitions so that the "par1"
partition will have 2 GPUs with 64 cores and 256 GB RAM of the node and the
other partition "par2" will have the remain
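A slurm.conf sketch of one way to express that split (node names are assumed): per-partition MaxCPUsPerNode and MaxMemPerNode cap the cores and memory each partition can take from a node, while the GPUs are simply requested only by jobs in par1.

SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
NodeName=gpu[01-08] CPUs=128 RealMemory=512000 Gres=gpu:2
PartitionName=par1 Nodes=gpu[01-08] MaxCPUsPerNode=64 MaxMemPerNode=256000
PartitionName=par2 Nodes=gpu[01-08] MaxCPUsPerNode=64 MaxMemPerNode=256000 Default=YES

If par2 must never allocate the GPUs, that would need a separate rule, for example a partition QOS or a job_submit.lua check (both assumptions, not shown here).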