[slurm-users] Re: "Optimal" slurm configuration

2024-02-26 Thread Gerhard Strangar via slurm-users
Max Grönke via slurm-users wrote:
> (b) introduce a "small" partition for the <4h jobs with higher priority but
> we're unsure if this will block all the larger jobs to run...

Just limit the number of CPUs in that partition.

Gerhard
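One possible reading of this suggestion, sketched as a slurm.conf fragment. The partition names, node ranges and all numeric values are assumptions for illustration and do not come from the thread. MaxCPUsPerNode caps how many CPUs on each node jobs from the "small" partition may occupy, so the remainder stays available to larger jobs.

```
# slurm.conf fragment (hypothetical names and values)
# Short jobs: higher priority tier, 4h limit, at most 8 CPUs per node.
PartitionName=small Nodes=node[01-16] MaxTime=04:00:00 PriorityTier=10 MaxCPUsPerNode=8
# Everything else: lower priority tier, no per-node CPU cap.
PartitionName=large Nodes=node[01-16] MaxTime=7-00:00:00 PriorityTier=1 Default=YES
```

Restricting Nodes= for the small partition to a subset of the cluster would be another way to bound its CPU count.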

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Paul Edmon via slurm-users
I concur with what folks have written so far; it really depends on your use case. For instance, if you are looking at a cluster with GPUs and intend to do some serious computing there, you are going to need RDMA of some sort. But it all depends on what you end up needing for your workflows. For

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Dan Healy via slurm-users
I’m very appreciative of each person who’s provided some feedback, especially the lengthy replies. Sounds like a RoCE-capable Ethernet backbone may be the default way to go *unless* the end users have some specific requirements that might need IB. At this point, we wouldn’t be interested in anythin

[slurm-users] Re: canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-26 Thread Ward Poelmans via slurm-users
Hi,

On 26/02/2024 09:27, Josef Dvoracek via slurm-users wrote:
> Is anybody using something more advanced and still understandable by a casual HPC user?

I'm not sure it qualifies, but:

sbatch --wrap 'screen -D -m'
srun --jobid --pty screen -rd

Or:

sbatch -J screen --wrap 'screen -D -m'
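The pattern in this reply (a batch job that holds a detached screen session, which is then attached to through srun) could be wrapped roughly as follows. This is a minimal sketch: the function names are hypothetical, and it assumes screen is installed on the compute nodes.

```shell
# Hypothetical helper names; the sbatch/srun invocations follow the reply above.

# Submit a batch job whose only task is to hold a detached screen session.
# --parsable makes sbatch print just the job ID (optionally "jobid;cluster"),
# so cut keeps the part before any semicolon.
start_screen_job() {
    sbatch --parsable -J screen --wrap 'screen -D -m' | cut -d ';' -f 1
}

# Attach to the screen session inside the running job, from any login node.
# $1 is the job ID returned by start_screen_job.
attach_screen_job() {
    srun --jobid="$1" --pty screen -rd
}
```

Typical use would be `jobid=$(start_screen_job)`, then `attach_screen_job "$jobid"` once the job is running. Detaching from screen leaves the job alive, so the session no longer depends on a particular login node.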

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Cutts, Tim via slurm-users
My view is that it depends entirely on the workload, and the systems with which your compute needs to interact. A few things I’ve experienced before. 1. Modern Ethernet networks have pretty good latency these days, and so MPI codes can run over them. Whether IB is worth the money is a cos

[slurm-users] canonical way to run longer shell/bash interactive job (instead of srun inside of screen/tmux at front-end)?

2024-02-26 Thread Josef Dvoracek via slurm-users
What is the recommended way to run longer interactive jobs on your systems? Our how-to includes starting screen on a front-end node and running srun with bash/zsh inside, but that introduces a dependency between the login node (with screen) and the compute node job. On systems with multiple front-e

[slurm-users] Re: Question about IB and Ethernet networks

2024-02-26 Thread Josef Dvoracek via slurm-users
> Just looking for some feedback, please. Is this OK? Is there a better way?
> I’m tempted to spec all new HPCs with only a high speed (200Gbps) IB network,

Well, you need Ethernet for OOB management (BMC/IPMI/iLO/whatever) anyway, or?

cheers

josef

On 25. 02. 24 21:12, Dan Healy via slur