Public bug reported:

Description:    Ubuntu 18.04 LTS
Release:        18.04

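Expected behavior: nvprof prints profile output when run as a regular user.

Actual behavior: nvprof reports "Internal profiling error 4168:999" and "CUDA profiling error" and prints no profile; it only completes under sudo.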

Reproduce as follows:

cd NVIDIA_CUDA-9.1_Samples/0_Simple/matrixMul
nvcc -I ../../common/inc matrixMul.cu -o matrixMul

# check the exe works

./matrixMul 
[Matrix Multiply Using CUDA] - Starting...
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1137.23 GFlop/s, Time= 0.115 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

# now try nvprof
nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4775== NVPROF is profiling process 4775, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
==4775== Error: Internal profiling error 4168:999.
Performance= 1130.40 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
======== Error: CUDA profiling error.

# run with sudo
sudo nvprof ./matrixMul
[Matrix Multiply Using CUDA] - Starting...
==4797== NVPROF is profiling process 4797, command: ./matrixMul
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

MatrixA(320,320), MatrixB(640,320)
Computing result using CUDA Kernel...
done
Performance= 1132.95 GFlop/s, Time= 0.116 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==4797== Profiling application: ./matrixMul
==4797== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:   99.54%  34.644ms       301  115.10us  114.15us  116.07us  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
                    0.28%  98.465us         2  49.232us  32.960us  65.505us  [CUDA memcpy HtoD]
                    0.18%  62.944us         1  62.944us  62.944us  62.944us  [CUDA memcpy DtoH]
      API calls:   74.77%  110.27ms         3  36.757ms  3.4300us  110.26ms  cudaMalloc
                   22.45%  33.105ms         1  33.105ms  33.105ms  33.105ms  cudaEventSynchronize
                    0.93%  1.3780ms         3  459.33us  427.70us  478.26us  cudaGetDeviceProperties
                    0.81%  1.1874ms       301  3.9440us  3.7260us  18.511us  cudaLaunch
                    0.36%  536.51us         3  178.84us  56.346us  363.23us  cudaMemcpy
                    0.31%  451.50us        94  4.8030us     301ns  228.31us  cuDeviceGetAttribute
                    0.11%  156.37us         1  156.37us  156.37us  156.37us  cudaDeviceSynchronize
                    0.09%  132.82us      1505      88ns      79ns     289ns  cudaSetupArgument
                    0.07%  100.43us         3  33.475us  4.3440us  83.746us  cudaFree
                    0.06%  82.848us         1  82.848us  82.848us  82.848us  cuDeviceTotalMem
                    0.02%  35.673us       301     118ns     110ns     801ns  cudaConfigureCall
                    0.02%  33.788us         1  33.788us  33.788us  33.788us  cuDeviceGetName
                    0.00%  5.3080us         2  2.6540us  2.2050us  3.1030us  cudaEventRecord
                    0.00%  3.2350us         2  1.6170us  1.0960us  2.1390us  cudaEventCreate
                    0.00%  2.8120us         1  2.8120us  2.8120us  2.8120us  cudaSetDevice
                    0.00%  2.0920us         1  2.0920us  2.0920us  2.0920us  cudaEventElapsedTime
                    0.00%  1.7410us         3     580ns     292ns  1.0710us  cuDeviceGetCount
                    0.00%  1.0230us         2     511ns     353ns     670ns  cuDeviceGet
                    0.00%     658ns         1     658ns     658ns     658ns  cudaGetDeviceCount
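
For anyone scripting the reproduction, the failing and working runs can be told apart from a captured log. This is only a rough sketch: nvprof_log_ok is a hypothetical helper name, not part of nvprof, and it assumes profiler errors always look like the two "Error:" lines in the failing run above (a line starting with "=" characters, optionally with the process ID, followed by " Error:").

```shell
# Hypothetical helper (not part of nvprof): returns success when the
# captured log in "$1" contains no profiler error lines.
# Assumption: error lines have the shape seen in the failing run, e.g.
#   ==4775== Error: Internal profiling error 4168:999.
#   ======== Error: CUDA profiling error.
nvprof_log_ok() {
    if grep -Eq '^=+([0-9]+=+)? Error:' "$1"; then
        return 1   # at least one profiler error line was found
    fi
    return 0       # no profiler error lines
}
```

One possible use is to capture both output streams of a run into a file (nvprof.log is an arbitrary name), for example with "nvprof ./matrixMul > nvprof.log 2>&1", and then run "nvprof_log_ok nvprof.log || echo 'nvprof reported a profiling error'". The application's own output is not matched, because its lines do not start with "=".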

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: nvidia-profiler 9.1.85-3
ProcVersionSignature: Ubuntu 4.15.0-20.21-generic 4.15.17
Uname: Linux 4.15.0-20-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7
Architecture: amd64
Date: Thu Apr 26 17:28:48 2018
Dependencies:
 gcc-8-base 8-20180414-1ubuntu2
 libc6 2.27-3ubuntu1
 libcuinj64-9.1 9.1.85-3
 libgcc1 1:8-20180414-1ubuntu2
InstallationDate: Installed on 2018-04-21 (5 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Alpha amd64 (20180421)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nvidia-cuda-toolkit
UpgradeStatus: No upgrade log present (probably fresh install)

** Affects: nvidia-cuda-toolkit (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: amd64 apport-bug bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1767205

Title:
  nvprof does not complete without sudo
