[slurm-users] RES: multiple srun commands in the same SLURM script

2023-10-31 Thread Paulo Jose Braga Estrela
Hi, I think that you have a syntax error in your bash script. The "&" means that you want to send a process to the background, not that you want to run many commands in parallel. To run commands serially you should use cmd && cmd2; cmd2 will then be executed only if the command 1 retu
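To illustrate the difference between "&&" and "&" discussed above, here is a minimal sketch. The task function is a hypothetical stand-in for a real workload (for example an srun step) and is not taken from the original script. Note that inside a batch script, background processes started with "&" normally need a final wait, otherwise the script exits while they are still running.

```shell
#!/bin/bash
# Hypothetical stand-in for a real workload such as an srun step.
task() {
    sleep 1
    echo "task $1 done"
}

# Serial: task 2 starts only if task 1 exits with status 0.
task 1 && task 2

# Concurrent: "&" sends each task to the background, so both run at
# the same time; "wait" blocks until all background jobs have finished.
task 3 &
task 4 &
wait
echo "all tasks finished"
```

The serial pair takes about two seconds, while the backgrounded pair takes about one, since both tasks sleep at the same time.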

[slurm-users] RES: RES: Change something in user's script using job_submit.lua plugin

2023-10-31 Thread Paulo Jose Braga Estrela
Yes, reading the sources I found that the _update_job function in job_mgr.c is responsible for calling the job_submit_plugin_modify function. After calling it, _update_job validates and applies the changes made by the plugin function to many job record fields but does not touch the script field. So, for now

[slurm-users] RES: How to delay the start of slurmd until Infiniband/OPA network is fully up?

2023-10-31 Thread Paulo Jose Braga Estrela
I think that you should use NetworkManager-wait-online.service in RHEL 8. Take a look at its man page. It only allows the system to reach network-online after all network interfaces are online. So, if your OPA interfaces are managed by NetworkManager, you can use it.
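As a rough sketch of how that suggestion could be wired up, the function below writes a systemd drop-in that orders slurmd after network-online.target. The unit name slurmd.service, the drop-in file name wait-online.conf, and the helper write_dropin are assumptions for illustration, not something stated in the thread; a packaged slurmd unit may already contain this ordering, in which case the drop-in is redundant.

```shell
#!/bin/bash
# write_dropin DIR: create a drop-in under DIR that makes slurmd wait for
# network-online.target. The file name wait-online.conf is hypothetical.
write_dropin() {
    local dir="$1/slurmd.service.d"
    mkdir -p "$dir"
    cat > "$dir/wait-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# On a real node (as root), assuming the unit is named slurmd.service:
#   write_dropin /etc/systemd/system
#   systemctl enable NetworkManager-wait-online.service
#   systemctl daemon-reload
```

The ordering only helps if something actually delays network-online.target, which is the job of NetworkManager-wait-online.service when the interfaces are managed by NetworkManager.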

Re: [slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

2023-10-31 Thread Jens Elkner
On Tue, Oct 31, 2023 at 10:59:56AM +0100, Ole Holm Nielsen wrote: Hi Ole, TL;DR: only systemd-networkd stuff below. > On 10/30/23 20:15, Jeffrey R. Lang wrote: > > The service is available in RHEL 8 via the EPEL package repository as > > system-networkd, i.e. systemd-networkd.x86_64

[slurm-users] multiple srun commands in the same SLURM script

2023-10-31 Thread Andrei Berceanu
Here is my SLURM script:
#!/bin/bash
#SBATCH --job-name="gpu_test"
#SBATCH --output=gpu_test_%j.log # Standard output and error log
#SBATCH --account=berceanu_a+
#SBATCH --partition=gpu
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=31200m # Reserve 32 GB of RAM per core
#SBATCH

Re: [slurm-users] How to delay the start of slurmd until Infiniband/OPA network is fully up?

2023-10-31 Thread Ole Holm Nielsen
Hi Jeffrey, On 10/30/23 20:15, Jeffrey R. Lang wrote: The service is available in RHEL 8 via the EPEL package repository as system-networkd, i.e. systemd-networkd.x86_64 253.4-1.el8epel Thanks for the info. We can install the systemd-networkd RP