Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(

tuxic Wed, 15 Aug 2018 08:12:23 -0700

On 08/15 02:32, Corentin “Nado” Pazdera wrote:
> August 15, 2018 2:59 PM, tu...@posteo.de wrote:
> 
> > Yes, I did reboot the system. In my initial mail I mentioned a tool
> > called CUDA-Z and Blender, which both report a missing CUDA device.
> 
> Ok, so you do not have a specific error which might have been thrown by the 
> module?
> Other ideas, check dev-util/nvidia-cuda-toolkit version and double check 
> nvidia/nvidia_uvm with modinfo to ensure they are installed and loaded 
> correctly with the right version?
> Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from 
> nvidia-cuda-toolkit) and show its output?
> 
> My installation works, so at least we know their version is not completely 
> broken...
> Driver version: 396.51
> Cuda version: 9.2.88
> 
> --
> Corentin “Nado” Pazdera
>


I compiled the new version of the driver again and rebooted the
system.

# dmesg | grep -i nvidia

[   11.375631] nvidia_drm: module license 'MIT' taints kernel.
[   12.313260] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   12.313586] nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   12.313691] nvidia 0000:02:00.0: enabling device (0000 -> 0003)
[   12.313737] nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
[   12.491106] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input9
[   12.492291] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input10
[   12.493772] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input11
[   12.494605] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input12
[   13.963644] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.236553] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018

# modprobe -a nvidia-uvm

# dmesg | grep uvm

[  209.441956] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245
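
A loaded module does not by itself guarantee that the device nodes the CUDA runtime opens are present. The following is a minimal sketch of a check for that, not a diagnosis of this particular failure. The helper name missing_nodes is hypothetical, and the node names in the usage line (nvidiactl, nvidia0, nvidia1 for the two cards seen in dmesg, and nvidia-uvm) are assumptions about the usual layout under /dev.

```shell
# List which of the given device nodes are missing under a directory.
# $1: directory to look in (normally /dev); remaining args: node names.
missing_nodes() {
    dir="$1"
    shift
    for node in "$@"; do
        # -e only tests existence; it does not verify node type or permissions.
        [ -e "$dir/$node" ] || printf '%s\n' "$node"
    done
}

# Assumed usage on a two-GPU system:
#   missing_nodes /dev nvidiactl nvidia0 nvidia1 nvidia-uvm
```

An empty result would mean all listed nodes exist; any name printed would be worth a closer look (for example with ls -l on that path).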


# /opt/cuda/extras/demo_suite/deviceQuery
/opt/cuda/extras/demo_suite/deviceQuery Starting...      

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
[1]    5086 exit 1     /opt/cuda/extras/demo_suite/deviceQuery

CUDA-Z also shows "no CUDA device".

# modinfo nvidia-uvm
filename:       /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
supported:      external
license:        MIT
depends:        nvidia
name:           nvidia_uvm
vermagic:       4.18.0-RT SMP preempt mod_unload 
parm:           uvm_perf_prefetch_enable:uint
parm:           uvm_perf_prefetch_threshold:uint
parm:           uvm_perf_prefetch_min_faults:uint
parm:           uvm_perf_thrashing_enable:uint
parm:           uvm_perf_thrashing_threshold:uint
parm:           uvm_perf_thrashing_pin_threshold:uint
parm:           uvm_perf_thrashing_lapse_usec:uint
parm:           uvm_perf_thrashing_nap_usec:uint
parm:           uvm_perf_thrashing_epoch_msec:uint
parm:           uvm_perf_thrashing_max_resets:uint
parm:           uvm_perf_thrashing_pin_msec:uint
parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
parm:           uvm_hmm:Enable (1) or disable (0) HMM mode. Default: 0. Ignored if CONFIG_HMM is not set, or if NEXT settings conflict with HMM. (int)
parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
parm:           uvm_force_prefetch_fault_support:uint
parm:           uvm_debug_enable_push_desc:Enable push description tracking (int)
parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
parm:           uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_batch_count:uint
parm:           uvm_perf_access_counter_granularity:Size of the physical memory region tracked by each counter. Valid values asof Volta: 64k, 2m, 16m, 16g (charp)
parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
parm:           uvm_perf_fault_batch_count:uint
parm:           uvm_perf_fault_replay_policy:uint
parm:           uvm_perf_fault_replay_update_put_ratio:uint
parm:           uvm_perf_fault_max_batches_per_service:uint
parm:           uvm_perf_fault_max_throttle_per_service:uint
parm:           uvm_perf_fault_coalesce:uint
parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
parm:           uvm_perf_map_remote_on_eviction:int
parm:           uvm_channel_num_gpfifo_entries:uint
parm:           uvm_channel_gpfifo_loc:charp
parm:           uvm_channel_gpput_loc:charp
parm:           uvm_channel_pushbuffer_loc:charp
parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
parm:           uvm8_ats_mode:Override the default ATS (Address Translation Services) UVM mode by disabling (0) or enabling (1) (int)
parm:           uvm_driver_mode:Set the uvm kernel driver mode. Choices include: 8 (charp)
parm:           uvm_debug_prints:Enable uvm debug prints. (int)
parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)


# ls -l /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
-rw-r--r-- 1 root root 1405808 Aug 15 16:49 /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
(just installed minutes before)

# uname -a
Linux solfire 4.18.0-RT #1 SMP PREEMPT Mon Aug 13 05:15:26 CEST 2018 x86_64 AMD Phenom(tm) II X6 1090T Processor AuthenticAMD GNU/Linux
(the kernel version matches)
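
The comparison above was done by eye between the vermagic line of modinfo and uname. A minimal sketch of the same check as a script follows; the function names vermagic_release and module_matches_kernel are hypothetical, and the sketch assumes the release string is the first word after "vermagic:", as in the output shown.

```shell
# Read modinfo output on stdin and print the kernel release from its
# vermagic line (the first word after "vermagic:").
vermagic_release() {
    awk '$1 == "vermagic:" { print $2; exit }'
}

# Succeed when the module's vermagic release equals a kernel release.
# $1: module name; $2: kernel release (defaults to the running kernel).
module_matches_kernel() {
    release="${2:-$(uname -r)}"
    [ "$(modinfo "$1" | vermagic_release)" = "$release" ]
}

# Assumed usage:
#   module_matches_kernel nvidia-uvm && echo "vermagic matches"
```

This only compares the release string; it does not compare the remaining vermagic flags (SMP, preempt, mod_unload) against the kernel configuration.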

# eix nvidia-cuda-toolkit

[I] dev-util/nvidia-cuda-toolkit
     Available versions:  [M](~)6.5.14(0/6.5.14) [M](~)6.5.19-r1(0/6.5.19) [M](~)7.5.18-r2(0/7.5.18) [M](~)8.0.44(0/8.0.44) [M](~)8.0.61(0/8.0.61) (~)9.0.176(0/9.0.176) (~)9.1.85(0/9.1.85) (~)9.2.88(0/9.2.88) {debugger doc eclipse profiler}
     Installed versions:  9.2.88(0/9.2.88)(06:31:32 PM 08/14/2018)(-debugger -doc -eclipse -profiler)
     Homepage:            https://developer.nvidia.com/cuda-zone
     Description:         NVIDIA CUDA Toolkit (compiler and friends)



It gets even weirder...
