[slurm-users] Re: Re: Re: Re: how to set slurmdbd.conf if using two slurmdbd nodes with an HA database?

2025-02-21 Thread Daniel Letai via slurm-users
Looking at the code, it would seem the DbdBackupHost (in slurmdbd.conf) is used to determine whether or not to run in standby mode. https://github.com/SchedMD/slurm/blob/ea17bbffc381deae54e126b227d5290bf9525326/src/slurmdbd/slurmdbd.c#L296-L314
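For illustration, a minimal slurmdbd.conf sketch of that layout (hostnames dbd1/dbd2/dbhost are placeholders, not from the thread): the same file can be deployed on both slurmdbd nodes, and the node whose hostname matches DbdBackupHost would run in standby mode.

    DbdHost=dbd1
    DbdBackupHost=dbd2
    StorageType=accounting_storage/mysql
    StorageHost=dbhost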

[slurm-users] Re: Re: Re: how to set slurmdbd.conf if using two slurmdbd nodes with an HA database?

2025-02-20 Thread Daniel Letai via slurm-users
There are two backup-host configurations. DbdBackupHost is used if the slurmdbd service is unavailable (timeout). In that case the slurmctld will try to connect to the slurmdbd on another node. StorageBackupHost, on the other hand, is what you des
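A hedged sketch of that distinction in slurmdbd.conf (all hostnames hypothetical): DbdBackupHost names the second slurmdbd node that slurmctld fails over to, while StorageBackupHost names the second database server that slurmdbd itself fails over to.

    # slurmdbd failover, used by slurmctld
    DbdHost=dbd1
    DbdBackupHost=dbd2
    # database failover, used by slurmdbd
    StorageHost=db1
    StorageBackupHost=db2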

[slurm-users] Re: Re: Re: how to set slurmdbd.conf if using two slurmdbd nodes with an HA database?

2025-02-20 Thread Daniel Letai via slurm-users
Agreed. And slurmdbd also caches if the DB is down, if I remember correctly. On 21/02/2025 7:09, Brian Andrus via slurm-users wrote: Daniel, One way to set up a true HA is to configure master-master SQL ins

[slurm-users] Re: how to set slurmdbd.conf if using two slurmdbd nodes with an HA database?

2025-02-20 Thread Daniel Letai via slurm-users
It's functionally the same with one difference - the configuration file is unmodified between nodes, allowing for simple deployment of nodes, and automation. Regarding the backuphost - that depends on your setup. If you can ensure the slurmdbd service wil

[slurm-users] Re: how to set slurmdbd.conf if using two slurmdbd nodes with an HA database?

2025-02-19 Thread Daniel Letai via slurm-users
I'm not sure it will work, didn't test it, but could you just do `dbdhost=localhost` to solve this? On 18/02/2025 11:59, hermes via slurm-users wrote: The deployment scenario is as follows:
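If that idea works, both nodes could share an identical slurmdbd.conf along these lines (an untested sketch, other settings omitted):

    DbdHost=localhost
    StorageType=accounting_storage/mysql
    StorageHost=localhost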

[slurm-users] Re: Run only one time on a node

2025-02-19 Thread Daniel Letai via slurm-users
There are a couple of options here, not exactly convenient but they will get the job done: 1. Use an array, with `-N 1 -w ` defined for each array task. You can do the same without an array, using a for loop to submit different sbatch jobs. 2. Use `scontrol reboot`. Set the reb
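A sketch of the loop variant of option 1 (node names and script name are placeholders):

    for node in node01 node02 node03; do
        sbatch -N 1 -w "$node" run_once.sh
    done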

[slurm-users] Re: REST API - get_user_environment

2024-08-29 Thread Daniel Letai via slurm-users
Actually this is not Slurm versioning strictly speaking, this is openapi versioning - the move from 0.0.38 to 0.0.39 also dropped this particular endpoint. You will notice that the same major Slurm version supports different API versions. On 28/08/2024 03:02:00, Chris Samuel via slurm-users

[slurm-users] Re: REST API - get_user_environment

2024-08-23 Thread Daniel Letai via slurm-users
https://github.com/SchedMD/slurm/blob/ffae59d9df69aa42a090044b867be660be259620/src/plugins/openapi/v0.0.38/jobs.c#L136 but no longer in https://github.com/SchedMD/slurm/blob/slurm-23.02/src/plugins/openapi/v0.0.39/jobs.c, which underwent a major revision in the next openapi version. On 22/0

[slurm-users] Re: With slurm, how to allocate a whole node for a single multi-threaded process?

2024-08-05 Thread Daniel Letai via slurm-users
I think the issue is more severe than you describe. Slurm juggles the needs of many jobs. Just because there are some resources available at the exact second a job starts doesn't mean those resources are not pre-allocated for some future job waiting for e
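For the original question, one common pattern (not quoted from the thread) is to request the node exclusively and hand all CPUs to the single task:

    #!/bin/bash
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --exclusive
    #SBATCH --cpus-per-task=64   # set to the node's core count
    srun ./my_threaded_app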

[slurm-users] Re: slurmrestd 24.05.1: crashes on GET /slurm/v0.0.41/nodes: unsorted double linked list corrupted

2024-07-24 Thread Daniel Letai via slurm-users
This is a known issue, resolved in 24.05.2 by the patches labeled "Always allocate pointers despite skipping parsing". For example: https://github.com/SchedMD/slurm/commit/5b07b6bda407431215606b93e57d0a9b7f4c9b53 The same patch also applies to 0.0.40 and 0.0
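For reference, the affected request looks roughly like this (host, port and JWT handling are assumptions, not from the thread); on 24.05.1 it could crash slurmrestd, and 24.05.2 resolves it:

    curl -s -H "X-SLURM-USER-NAME: $USER" \
         -H "X-SLURM-USER-TOKEN: $SLURM_JWT" \
         http://slurmrestd-host:6820/slurm/v0.0.41/nodes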

[slurm-users] Re: Custom Plugin Integration

2024-07-19 Thread Daniel Letai via slurm-users
input) to Slurm as a simple string of sbatch flags, and just let Slurm do its thing. It sounds simpler than forcing all other users of the cluster to adhere to your particular needs, without introducing unnecessary complexity to the cluster. Regards, Bhaskar. Regards, --Dani_L. O

[slurm-users] Re: Custom Plugin Integration

2024-07-17 Thread Daniel Letai via slurm-users
In the scenario you provide, you don't need anything special. You just have to configure a partition that is available only to you, and to no other account on the cluster. This partition will only include your hosts. All other partitions will not include any
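A minimal slurm.conf sketch of such a partition (node, partition and account names are hypothetical):

    PartitionName=private  Nodes=mynode[01-02]   AllowAccounts=myaccount State=UP
    PartitionName=general  Nodes=compute[01-10]  AllowAccounts=ALL       State=UP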

[slurm-users] Re: Custom Plugin Integration

2024-07-12 Thread Daniel Letai via slurm-users
I'm not sure I understand why your app must decide the placement, rather than telling Slurm about the requirements (this sounds suspiciously like Not Invented Here syndrome), but Slurm does have the '-w' flag to salloc, sbatch and srun. I just don't understand if you don't have an entire cluster
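For reference, the '-w' (--nodelist) flag pins a job to named nodes, e.g. (hostnames hypothetical):

    sbatch -N 2 -w node01,node02 job.sh
    srun -w node03 --pty bash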

[slurm-users] Replacing MUNGE with SACK (auth/slurm)

2024-07-11 Thread Daniel Letai via slurm-users
Does SACK replace MUNGE? As in - MUNGE is not required when building Slurm or on compute? If so, can the Requires and BuildRequires for munge be made optional on bcond_without_munge in the spec file? Or is there a reason MUNGE must remain a hard require for Slurm? Thanks, --Dani_L. -- sl
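A hedged sketch of what the suggested spec-file change could look like (this is not the actual upstream slurm.spec):

    %bcond_without munge

    %if %{with munge}
    BuildRequires: munge-devel
    Requires: munge
    %endif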

[slurm-users] Re: Convergence of Kube and Slurm?

2024-05-06 Thread Daniel Letai via slurm-users
There is a kubeflow offering that might be of interest: https://www.dkube.io/post/mlops-on-hpc-slurm-with-kubeflow I have not tried it myself, no idea how well it works. Regards, --Dani_L. On 05/05/2024 0:05, Dan Healy via slurm-us