[slurm-users] Re: Unexpected node got allocation

2025-01-09 Thread Dan Healy via slurm-users
No, sadly there’s no topology.conf in use.

Thanks,
Daniel Healy

On Thu, Jan 9, 2025 at 8:28 AM Steffen Grunewald <steffen.grunew...@aei.mpg.de> wrote:
> On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote:
> > Hello there and good morning from Baltimore.
> >
> > I have a small cluster wit

[slurm-users] Unexpected node got allocation

2025-01-09 Thread Dan Healy via slurm-users
Hello there and good morning from Baltimore. I have a small cluster with 100 nodes. When the cluster is completely empty of all jobs, the first job gets allocated to node 41. In other clusters, the first job gets allocated to node 01. If I specify node 01, the allocation works perfectly. I have my

[slurm-users] Re: getting slurm going

2024-12-08 Thread Dan Healy via slurm-users
sinfo
srun hostname

Thanks,
Daniel Healy

On Sun, Dec 8, 2024 at 2:30 PM Steven Jones via slurm-users <slurm-users@lists.schedmd.com> wrote:
> What tests can I do to prove that slurm is talking to the nodes pls?
>
> regards
>
> Steven
>
> --
> slurm-users mailing list -- slurm-users@list

[slurm-users] Can SLURM queue different jobs to start concurrently?

2024-07-08 Thread Dan Healy via slurm-users
Hi there, I've received a question from an end user, to which I presume the answer is "No", but I would like to ask the community first. Scenario: The user wants to create a series of jobs that all need to start at the same time. Example: there are 10 different executable applications which have varyi

[slurm-users] Re: Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-30 Thread Dan Healy via slurm-users
Following up on this in case anyone can provide some insight, please.

On Thu, May 16, 2024 at 8:32 AM Dan Healy wrote:
> Hi there, SLURM community,
>
> I swear I've done this before, but now it's failing on a new cluster I'm
> deploying. We have 6 compute nodes with 64 cpu each (384 CPU total).

[slurm-users] Executing srun -n X where X is greater than total CPU in entire cluster

2024-05-16 Thread Dan Healy via slurm-users
Hi there, SLURM community, I swear I've done this before, but now it's failing on a new cluster I'm deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I run `srun -n 500 hostname`, the task gets queued since there aren't 500 CPUs available. Wasn't there an option that allows

[slurm-users] Convergence of Kube and Slurm?

2024-05-04 Thread Dan Healy via slurm-users
Bright Cluster Manager has some verbiage on their marketing site that they can manage a cluster running both Kubernetes and Slurm. Maybe I misunderstood it. But nevertheless, I am encountering groups more frequently that want to run a stack of containers that need private container networking. Wha

[slurm-users] Re: [ext] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-28 Thread Dan Healy via slurm-users
Are most of us using HAProxy or something else?

On Wed, Feb 28, 2024 at 3:38 PM Brian Andrus via slurm-users <slurm-users@lists.schedmd.com> wrote:
> Magnus,
>
> That is a feature of the load balancer. Most of them have that these days.
>
> Brian Andrus
>
> On 2/28/2024 12:10 AM, Hagdorn, Magnus

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Dan Healy via slurm-users
connect, even if at the scale of most on-prem work, > you might be hard-pressed in real-world conditions to notice much of a > difference. If you're running jobs that take weeks and hundreds of nodes, > the time (and other) savings may add up, but if we're talking the >

[slurm-users] Question about IB and Ethernet networks

2024-02-25 Thread Dan Healy via slurm-users
Hi Fellow Slurm Users, This question is not Slurm-specific, but it might develop into that. My question relates to understanding how *typical* HPCs are designed in terms of networking. To start, is it typical for there to be both high-speed Ethernet *and* InfiniBand networks (meaning separate switch