[slurm-users] Re: TRES cpu vs tasks

2024-12-10 Thread Miriam Olmi via slurm-users
(=cpus) has to be rejected, doesn’t it? Best, Andreas On 04.12.2024 at 11:18, Miriam Olmi via slurm-users wrote: Hi all, I cannot understand the true difference and definition of "core", "task" and "cpu" within the limits associated to a partition via the

[slurm-users] TRES cpu vs tasks

2024-12-04 Thread Miriam Olmi via slurm-users
Hi all, I cannot understand the true difference and definition of "core", "task" and "cpu" within the limits associated to a partition via the TRES variable of a QOS. More precisely I have 2 partitions defined as follows: PartitionName=lprod    AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL

[slurm-users] Re: controller backup slurmctld error while takeover

2024-03-26 Thread Miriam Olmi via slurm-users
are.  And validate you can ping the backup controller from the nodes by the name it has in the slurm.conf file. Also, a quick way to do the failover check is to run (from the backup controller): scontrol takeover Brian Andrus On 3/25/2024 1:39 PM, Miriam Olmi wrote: Hi Brian, Thanks for

[slurm-users] Re: controller backup slurmctld error while takeover

2024-03-25 Thread Miriam Olmi via slurm-users
/writing the very same files. Any other ideas? Thanks again, Miriam On 25 March 2024 at 19:23:23 CET, Brian Andrus via slurm-users wrote: >Quick correction, it is SaveStateLocation not SlurmSaveState. > >Brian Andrus > >On 3/25/2024 8:11 AM, Miriam Olmi via slurm-users wro

[slurm-users] controller backup slurmctld error while takeover

2024-03-25 Thread Miriam Olmi via slurm-users
Dear all, I am having trouble finalizing the configuration of the backup controller for my slurm cluster. In principle, if no job is running everything seems fine: both the slurmctld services on the primary and the backup controller are running and if I stop the service on the primary contro

[slurm-users] Re: slurmdbd error - Symbol `slurm_conf' has different size in shared object

2024-02-29 Thread Miriam Olmi via slurm-users
ways, test it first on a not-so-critical system, use vm > snapshots to be able to travel back in time ... as once you upgrade the > DB schema (if part of the upgrade) you AFAIK cannot go back. > > josef > > On 28. 02. 24 15:51, Miriam Olmi via slurm-users wrote: >> I install

[slurm-users] Re: slurmdbd error - Symbol `slurm_conf' has different size in shared object

2024-02-28 Thread Miriam Olmi via slurm-users
ing else, the reasons why there is old lib in place may vary.. > > cheers > > josef > > > On 28. 02. 24 11:16, Miriam Olmi via slurm-users wrote: >> `slurm_conf' has different size in shared object, consider re-linking > -- > slurm-users mailing list -- s

[slurm-users] slurmdbd error - Symbol `slurm_conf' has different size in shared object

2024-02-28 Thread Miriam Olmi via slurm-users
Hi all, I am having some issues with the new version of slurm 23.11.0-1. I had already installed and configured slurm 23.02.3-1 on my cluster and all the services were active and running properly. Following the instructions of the official SLURM webpage, for the moment I upgraded only the slur

[slurm-users] slurmctld/slurmdbd (code=exited, status=217/USER)

2024-01-19 Thread Miriam Olmi
Hi all, I am having some issues with the new version of slurm 23.11.0-1. I had already installed and configured slurm 23.02.3-1 on my cluster and all the services were active and running properly. After installing the new version of slurm with the same procedure, I find that the slurmctld and slurm