date:20200806

Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies

Hi Diego, Yes, this can be tricky; we also use this feature. Billing is configured at the partition level, so you can set different schemes per partition. We have nodes with 16 cores and 96GB of RAM, and these are the cheapest nodes: in our model they cost 1 SBU (System Billing Unit). For this node we have the following

Re: [slurm-users] Billing issue

2020-08-06 Thread Diego Zuccato

On 06/08/20 10:00, Bas van der Vlies wrote:
Thanks for the answer.
> We have nodes with 16 cores and 96GB of RAM and these are the cheapest nodes
> they cost in our model.
Theoretical 6GB/core. 5.625 net.
> We multiply everything by 1000 to avoid Slurm's behaviour of truncating the
> result

Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies

On 06/08/20 10:00, Bas van der Vlies wrote:
Thanks for the answer.
>> We also have nodes with GPUs (different types), and some cost more than others.
> The partitions always have the same type of nodes, not mixed, e.g.:
> *
> TRESBillingWeights=CPU=3801.0,Mem=502246.0T,GRES/gpu=22807.0,GRES/gpu:t

Re: [slurm-users] Billing issue

2020-08-06 Thread Diego Zuccato

On 06/08/20 12:46, Bas van der Vlies wrote:
> we have MAX(core, mem, gres). All resources can have the score: 91228
Ah, OK. So you have PriorityFlags=MAX_TRES too.
> So we take one of these maximum values, divide it again by 1000 and round
> it. Hopefully this explains it.
Yup, thanks. Now I
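
The scheme described in this exchange (weight each resource, take the maximum, then divide by 1000 and round) can be sketched as below. This is only an illustration, not Slurm's own code: the function name `billing_units`, the node size of 24 cores and 4 GPUs, and the omission of the memory term (whose weight carries a unit suffix) are assumptions. The weights and the score 91228 are the ones quoted in the thread.

```python
def billing_units(weights, allocated, scale=1000):
    """Return (raw_score, billed_units) for a MAX-style billing model.

    weights   -- per-resource billing weights, already multiplied by `scale`
    allocated -- amount of each resource the job holds
    """
    # Weighted score per resource; the most expensive resource sets the bill.
    raw = max(weights[name] * allocated.get(name, 0) for name in weights)
    # Undo the scaling and round to whole billing units.
    return raw, round(raw / scale)


# Weights quoted in the thread (memory left out of this sketch).
weights = {"cpu": 3801.0, "gres/gpu": 22807.0}

# Hypothetical full-node allocation; the node size is an assumption.
full_node = {"cpu": 24, "gres/gpu": 4}

raw, units = billing_units(weights, full_node)
print(raw, units)
```

Scaling the weights by 1000 keeps fractional costs from being lost when the weighted value is truncated to an integer; the division at the end converts back to site billing units.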

Re: [slurm-users] Billing issue

2020-08-06 Thread Paul Raines

Bas, does that mean you are setting PriorityFlags=MAX_TRES? Also, does anyone understand this from the slurm.conf docs: The weighted amount of a resource can be adjusted by adding a suffix of K,M,G,T or P after the billing weight. For example, a memory weight of "mem=.25" on a job allocat

Re: [slurm-users] Billing issue

2020-08-06 Thread Bas van der Vlies

On Thu, 2020-08-06 at 09:30 -0400, Paul Raines wrote:
> Bas
>
> Does that mean you are setting PriorityFlags=MAX_TRES ?
YES
> Also does anyone understand this from the slurm.conf docs:
>
> The weighted amount of a resource can be adjusted by adding a suffix of
> K,M,G,T or P after the b

Re: [slurm-users] Debugging communication problems

2020-08-06 Thread Gerhard Strangar

Gerhard Strangar wrote:
> I'm experiencing a connectivity problem and I'm out of ideas, why this
> is happening. I'm running a slurmctld on a multihomed host.
>
> (10.9.8.0/8) - master - (10.11.12.0/8)
> There is no routing between these two subnets.
My topology.conf contained a loop, which resu

[slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Jason Simms

Hello all, Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into a DRAIN state. I'm not sure it

Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Paul Edmon

Because we want to maximize usage we actually have opted to just cancel all running jobs the day of. We send out notification to all the users that this will happen. We haven't really seen any complaints and we've been doing this for years. At the start of the outage we set all partitions to

Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Ole Holm Nielsen

On 06-08-2020 19:13, Jason Simms wrote: Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into

Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Ing. Gonzalo E. Arroyo

When I need to do something like this I let the automatic SLURM management do the job. I just shut the node down over SSH, replace something, then power it on, and everything starts OK. The other option is to call resume in case of any failure, and restart the Slurm services on the nodes... Regards *Ing. Gonzalo

Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Thomas M. Payerle

We usually set up a reservation for maintenance. This prevents jobs from starting if they are not expected to end before the reservation (maintenance) starts. As Paul indicated, this causes nodes to become idle (and the pending job queue to grow) as maintenance time approaches, but avoids requiring
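
The rule described here, that a job only starts if its time limit lets it finish before the maintenance reservation begins, can be sketched as follows. This is a simplified model of the behaviour, not Slurm's scheduler logic; the function name and all dates are made up for illustration.

```python
from datetime import datetime, timedelta


def can_start_before_maintenance(now, time_limit, reservation_start):
    """True if a job started at `now` with the given time limit would be
    expected to finish no later than the start of the reservation."""
    return now + time_limit <= reservation_start


# Hypothetical maintenance window start.
maintenance = datetime(2020, 8, 20, 8, 0)
now = datetime(2020, 8, 19, 6, 0)

# A 24-hour job would end two hours before maintenance, so it may start.
print(can_start_before_maintenance(now, timedelta(hours=24), maintenance))

# A 48-hour job would overlap the window, so it stays pending.
print(can_start_before_maintenance(now, timedelta(hours=48), maintenance))
```

This is also why nodes drain naturally as the window approaches: fewer and fewer pending jobs have time limits short enough to fit.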

[slurm-users] Compute node OS and firmware updates

2020-08-06 Thread Ole Holm Nielsen

Regarding the question of methods for Slurm compute node OS and firmware updates, we have for a long time used rolling updates while the cluster is in full production, so that we do not waste any resources. When entire partitions are upgraded in this way, there is no risk of starting new jobs

Re: [slurm-users] Reservation vs. Draining for Maintenance?

2020-08-06 Thread Christopher Samuel

On 8/6/20 10:13 am, Jason Simms wrote: Later this month, I will have to bring down, patch, and reboot all nodes in our cluster for maintenance. The two options available to set nodes into a maintenance mode seem to be either: 1) creating a system-wide reservation, or 2) setting all nodes into

[slurm-users] Tuning MaxJobs and MaxJobsSubmit per user and for the whole cluster?

2020-08-06 Thread Hoyle, Alan P

I can't find any advice online about how to tune things like MaxJobs on a per-cluster or per-user basis. As far as I can tell, a default install sets the cluster MaxJobs to 10,000, with MaxSubmit the same. Those seem pretty low to me: are there resources that get consumed if

Re: [slurm-users] Slurmstepd errors

2020-08-06 Thread Williams, Jenny Avis

We ran into a similar error. A response from SchedMD: https://bugs.schedmd.com/show_bug.cgi?id=3890 Remediating steps until updates got us past this particular issue: Check for "xcgroup_instantiate" errors and close nodes that show this in the messages log. From the nodes listed here we close com

Re: [slurm-users] Correct way to give srun and sbatch different MaxTime values?

2020-08-06 Thread Jaekyeom Kim

Thank you for the answer. I wasn't aware of that file. I'll look into it!
Best, Jaekyeom
On Wed, Aug 5, 2020 at 3:27 AM Renfro, Michael wrote:
> Untested, but you should be able to use a job_submit.lua file to detect if
> the job was started with srun or sbatch:
>
> - Check with (job_desc.s

17 matches
