Hi there all,
We have a Dell server with 2 x Nvidia H100 GPUs, running Slurm. After
a reboot, Slurm fails unless we first run the nvidia-smi command by hand.
Once we run nvidia-smi && systemctl restart slurmd && systemctl restart
slurmctld, the Slurm queue starts working. Do you have any idea what
causes this, and what can we do about it?
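
For reference, here is a minimal sketch of the manual workaround as a
shell function. The function name and the NVIDIA_SMI / SYSTEMCTL
override variables are illustrative names of our own (useful for a dry
run), not part of Slurm or the NVIDIA driver; by default the real
nvidia-smi and systemctl are called.

```shell
#!/bin/sh
# Sketch of the manual post-reboot workaround, wrapped in a function.
# NVIDIA_SMI and SYSTEMCTL are hypothetical override variables for dry runs;
# when unset, the real nvidia-smi and systemctl commands are used.
start_slurm_after_gpu_check() {
    smi="${NVIDIA_SMI:-nvidia-smi}"
    ctl="${SYSTEMCTL:-systemctl}"

    # Query the GPUs first; stop here if the driver does not respond.
    "$smi" > /dev/null || return 1

    # Restart the Slurm daemons in the same order as we do by hand.
    "$ctl" restart slurmd || return 1
    "$ctl" restart slurmctld || return 1
}
```

Calling start_slurm_after_gpu_check as root after a reboot performs the
same three steps and stops at the first one that fails. This only
automates the workaround; it does not explain why Slurm fails without it.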
--
Best regards,
Aziz Öğütlü
Eduline Bilişim Sanayi ve Ticaret Ltd. Şti. www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza
Kat:6 Ofis No:118 Kağıthane - İstanbul - Türkiye 34406
Tel : +90 212 324 60 61 Cep: +90 541 350 40 72