Re: Installing CUDA with FAI

2024-10-24 Thread Andrew Ruthven
On Thu, 2024-10-24 at 14:50 +0200, Stephan Frank wrote:
> Amongst other approaches I have tried the runfile installation like so:
> 
> > chroot /target apt install -y make linux-headers-$(uname -r)
> > chroot /target wget -nc https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
> > chroot /target sh cuda_12.6.2_560.35.03_linux.run --driver --toolkit

I expect that this will also be trying to use the kernel version that you're
running from the nfsroot, which is unlikely to be the kernel you'll actually
be running.
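
A minimal sketch of one way around this, assuming the kernel for the installed system is already present under /target/lib/modules when the script runs. The function name target_kernel_version is made up for illustration, sort -V needs GNU coreutils, and picking the highest version assumes that is the kernel the machine will boot:

```shell
#!/bin/sh
# Hypothetical helper: print the newest kernel version installed under
# <root>/lib/modules, instead of the nfsroot's running kernel (uname -r).
target_kernel_version() {
    root=${1:-/target}
    # Each installed kernel has one directory below lib/modules.
    for dir in "$root"/lib/modules/*/; do
        [ -d "$dir" ] && basename "$dir"
    done | sort -V | tail -n 1
}
```

The headers line from the quoted script could then become chroot /target apt install -y make "linux-headers-$(target_kernel_version /target)". The function prints nothing if no kernel is installed in the target yet.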

Cheers,
Andrew

-- 
Andrew Ruthven, Wellington, New Zealand
and...@etc.gen.nz |
Catalyst Cloud:   | This space intentionally left blank
 https://catalystcloud.nz |



Installing CUDA with FAI

2024-10-24 Thread Stephan Frank


Hello everyone,

has anybody ever successfully installed CUDA via FAI into a Debian Bookworm (or 
any other) installation? I have been trying to set this up for over a week now, 
so far without success.

Regards, Stephan



Re: Installing CUDA with FAI

2024-10-24 Thread Thomas Lange
> On Thu, 24 Oct 2024 14:50:20 +0200 (CEST), Stephan Frank said:

> Amongst other approaches I have tried the runfile installation like so:

>> chroot /target apt install -y make linux-headers-$(uname -r)
>> chroot /target wget -nc https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
>> chroot /target sh cuda_12.6.2_560.35.03_linux.run --driver --toolkit
I have never used the run files. I always use the .deb packages.


> This usually hangs because it wants to uninstall nouveau drivers and asks
> for permission via a graphical interface.
Why not remove the nouveau package via FAI before calling a
customization script?
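
A minimal sketch of a related step, assuming the aim is also to keep the nouveau kernel module from loading on the installed system. The function name and the file name blacklist-nouveau.conf are illustrative assumptions, not something FAI provides:

```shell
#!/bin/sh
# Hypothetical helper: write a modprobe blacklist for nouveau into the
# target root so the module is not loaded on the installed system.
blacklist_nouveau() {
    root=${1:-/target}
    mkdir -p "$root/etc/modprobe.d"
    cat > "$root/etc/modprobe.d/blacklist-nouveau.conf" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}
```

Calling blacklist_nouveau /target from a customization script would create the file. The initramfs in the target probably needs to be regenerated afterwards for the blacklist to apply at early boot.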

> Bonus question: Is there a good way to automatically figure out whether
> the machine can even use CUDA/nvidia drivers? So I don't have to sort
> machines by hardware in the class file.
There's the package nvidia-detect.

Here's some code I use:

NV_DEVICES=$(lspci -mn | awk '{ gsub("\"",""); if (($2 == "0300" || $2 == "0302") && ($3 == "10de" || $3 == "12d2")) { print $1 } }')
if [ -n "$NV_DEVICES" ]; then
   echo NVIDIA
fi

or

if nvidia-smi -L >/dev/null 2>&1; then
  echo nvidia GPU detected
fi
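
The lspci filter above can also be wrapped in a function that reads from stdin, which makes it possible to check it against sample lspci -mn output on a machine without an NVIDIA card. This is only a sketch: the name nvidia_slots is hypothetical, and the commented usage assumes a class script in which a class name printed on stdout defines that class.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -mn` output on stdin and print the PCI
# slot of every VGA (0300) or 3D (0302) controller with vendor 10de or 12d2.
nvidia_slots() {
    awk '{
        gsub("\"", "")   # strip the quotes lspci -mn puts around each field
        if (($2 == "0300" || $2 == "0302") && ($3 == "10de" || $3 == "12d2"))
            print $1
    }'
}

# Possible use in a class script:
# if [ -n "$(lspci -mn | nvidia_slots)" ]; then echo NVIDIA; fi
```

Unlike the nvidia-smi check, this does not need the NVIDIA driver to be installed, since it only looks at PCI class and vendor IDs.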

--
regards Thomas


Re: Installing CUDA with FAI

2024-10-24 Thread Diego Zuccato
Nope, I'm using Salt for everything that is 
"personalization-after-installation", including CUDA install.

But FAI also acts as a package cache for CUDA :)

Diego

On 24/10/2024 13:48, Stephan Frank wrote:


Hello everyone,

has anybody ever successfully installed CUDA via FAI into a Debian Bookworm (or 
any other) installation? I have been trying to set this up for over a week now, 
so far without success.

Regards, Stephan



--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786



Re: Installing CUDA with FAI

2024-10-24 Thread Stephan Frank
Hello Thomas,

thank you for your kind reply. 

> I wonder what the problems are. Do you have some excerpt from the logs?

I use this installation guide and try to make it into a script:
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/

Amongst other approaches I have tried the runfile installation like so:

> chroot /target apt install -y make linux-headers-$(uname -r)
> chroot /target wget -nc https://developer.download.nvidia.com/compute/cuda/12.6.2/local_installers/cuda_12.6.2_560.35.03_linux.run
> chroot /target sh cuda_12.6.2_560.35.03_linux.run --driver --toolkit

This usually hangs because it wants to uninstall nouveau drivers and asks for 
permission via a graphical interface.

Is this even the right way to approach this task? How do you do it? It sounded 
like your approach was a little different.

Bonus question: Is there a good way to automatically figure out whether the 
machine can even use CUDA/nvidia drivers? So I don't have to sort machines by 
hardware in the class file.

Many thanks and kind regards,

Stephan



-Original Message-
From: Thomas 
To: fully 
Date: Thursday, 24 October 2024 2:06 PM CEST
Subject: Re: Installing CUDA with FAI


Hi,

I have created several versions of the nfsroot including new nvidia
drivers and CUDA libraries, because we needed the newest drivers for
new hardware.

Often I used a mixture of packages from testing and experimental. I've
also created an nfsroot using the drivers and CUDA libs from nvidia
itself.



-- 
regards Thomas