[slurm-users] Re: slurmctld hourly: Unexpected missing socket error

2024-07-29 Thread Jason Ellul via slurm-users
Thanks again, Patryk, for your insights. We have implemented many of the same things, but the socket errors are still occurring regularly. If we find a solution that works, I will be sure to add it to this thread. Many thanks Jason Jason Ellul Head - Research Computing Facility Office of Cance

[slurm-users] Final Call for SLUG Standard Registration

2024-07-29 Thread Victoria Hobson via slurm-users
Slurm User Group (SLUG) 2024 is set for September 12-13 at the University of Oslo in Oslo, Norway. Registration information, abstracts, and travel recommendations can be found here: https://slug24.splashthat.com/ The last day to register with standard pricing ($900) is this Friday, August 2nd. Aft

[slurm-users] Re: Convergence of Kube and Slurm?

2024-07-29 Thread wdennis--- via slurm-users
Can I ask if this replaces the work on "SUNK" that was previously announced? (but never released as open-source on GitHub as was planned; looks like it is only available on CoreWeave Cloud?) -- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users

[slurm-users] Re: Slurm fails before nvidia-smi command

2024-07-29 Thread Cutts, Tim via slurm-users
It sounds to me perhaps as though your systemd units are starting in the wrong order, or don’t have appropriate dependencies set in them? Tim -- Tim Cutts Scientific Computing Platform Lead AstraZeneca Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our S

[slurm-users] Re: Slurm fails before nvidia-smi command

2024-07-29 Thread Aziz Ogutlu via slurm-users
After adding the nvidia-persistenced service, slurm did not fail. Thanks for your help. On 7/29/24 13:00, Sarlo, Jeffrey S wrote: nvidia-persistenced is something that gets installed by the nvidia driver. Setting it to start at boot time helps with slurmd being able to find the GPUs when it tries

[slurm-users] Re: Slurm fails before nvidia-smi command

2024-07-29 Thread Sarlo, Jeffrey S via slurm-users
nvidia-persistenced is something that gets installed by the nvidia driver. Setting it to start at boot time helps with slurmd being able to find the GPUs when it tries to start. This is just one web page that has some information about it. https://download.nvidia.com/XFree86/Linux-x86_64/396.

[slurm-users] Re: Slurm fails before nvidia-smi command

2024-07-29 Thread Steffen Grunewald via slurm-users
On Mon, 2024-07-29 at 11:23:12 +0300, Slurm users wrote: > Hi there all, > > We have Dell server with 2 x Nvidia H100 and running slurm on it. After > restart server if we do not write nvidia-smi command slurm fails. When we > run nvidia-smi && systemctl restart slurmd && systemctl restart slurmct

[slurm-users] Slurm fails before nvidia-smi command

2024-07-29 Thread Aziz Ogutlu via slurm-users
Hi there all, We have a Dell server with 2 x Nvidia H100 running slurm. After restarting the server, slurm fails unless we first run the nvidia-smi command. When we run nvidia-smi && systemctl restart slurmd && systemctl restart slurmctld , the slurm queue starts. Do you have any idea about this error

[slurm-users] Re: Slurm sacct ResvCPURAW invalid field in version 24.12.5

2024-07-29 Thread Bjørn-Helge Mevik via slurm-users
Perhaps PlannedCPURAW? -- Regards, Bjørn-Helge Mevik, dr. scient, Department for Research Computing, University of Oslo