2015-08-27 15:08 GMT+08:00 Steven Liu <lingjiujia...@gmail.com>:
>
> 2015-08-27 14:52 GMT+08:00 Steven Liu <lingjiujia...@gmail.com>:
>
>> 2015-06-29 18:12 GMT+08:00 Klaus Schürmann <k...@mediabeam.com>:
>>
>>> Hello,
>>>
>>> I compiled ffmpeg with nvenc support. The compile process worked without
>>> any errors, but when I try to convert a file with nvenc I get the error
>>> message "[nvenc @ 0x39dc1c0] CreateInputBuffer failed".
>>>
>>> Can somebody help me fix this problem?
>>>
>>> Best Regards
>>> Klaus Schuermann
>>>
>>> OS: Ubuntu 14.04.2 LTS
>>> NVidia driver: 346
>>>
>>> Here is the complete output of the convert job:
>>>
>>> root@video-convert1:~/ffmpeg_sources/ffmpeg_libnvenc# ffmpeg -i /media/testfile.mkv -r 60 -s 1024x768 -vcodec nvenc -b:v 5750k testfile.mp4
>>> ffmpeg version N-73133-gd7e224e Copyright (c) 2000-2015 the FFmpeg developers
>>>   built with gcc 4.8 (Ubuntu 4.8.4-2ubuntu1~14.04)
>>>   configuration: --prefix=/root/ffmpeg_build --pkg-config-flags=--static --extra-cflags=-I/root/ffmpeg_build/include --extra-ldflags=-L/root/ffmpeg_build/lib --bindir=/root/bin --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nvenc --enable-nonfree
>>>   libavutil      54. 27.100 / 54. 27.100
>>>   libavcodec     56. 44.101 / 56. 44.101
>>>   libavformat    56. 38.101 / 56. 38.101
>>>   libavdevice    56.  4.100 / 56.  4.100
>>>   libavfilter     5. 18.100 /  5. 18.100
>>>   libswscale      3.  1.101 /  3.  1.101
>>>   libswresample   1.  2.100 /  1.  2.100
>>>   libpostproc    53.  3.100 / 53.  3.100
>>> Input #0, matroska,webm, from '/media/testfile.mkv':
>>>   Metadata:
>>>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>>>     creation_time   : 2014-09-29 00:31:12
>>>   Duration: 00:21:03.51, start: 0.000000, bitrate: 3015 kb/s
>>>     Stream #0:0(eng): Video: h264 (High), yuv420p(tv, bt709/unknown/unknown), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
>>>     Stream #0:1: Audio: ac3, 48000 Hz, 5.1(side), fltp, 448 kb/s (default)
>>> [nvenc @ 0x39dc1c0] CreateInputBuffer failed
>>> Output #0, mp4, to 'testfile.mp4':
>>>   Metadata:
>>>     encoder         : libebml v1.3.0 + libmatroska v1.4.1
>>>     Stream #0:0(eng): Video: h264, none, q=2-31, 128 kb/s, SAR 4:3 DAR 0:0, 60 fps (default)
>>>     Metadata:
>>>       encoder         : Lavc56.44.101 nvenc
>>>     Stream #0:1: Audio: aac, 0 channels, 128 kb/s (default)
>>>     Metadata:
>>>       encoder         : Lavc56.44.101 libfdk_aac
>>> Stream mapping:
>>>   Stream #0:0 -> #0:0 (h264 (native) -> h264 (nvenc))
>>>   Stream #0:1 -> #0:1 (ac3 (native) -> aac (libfdk_aac))
>>> Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
>>>
>>> Output of deviceQuery:
>>>
>>> root@video-convert1:~# NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery
>>> NVIDIA_CUDA-7.0_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
>>>
>>>  CUDA Device Query (Runtime API) version (CUDART static linking)
>>>
>>> Detected 4 CUDA Capable device(s)
>>>
>>> Device 0: "GRID K1"
>>>   CUDA Driver Version / Runtime Version 7.0 / 7.0
>>>   CUDA Capability Major/Minor version number: 3.0
>>>   Total amount of global memory: 4096 MBytes (4294770688 bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>>>   GPU Max Clock rate: 850 MHz (0.85 GHz)
>>>   Memory Clock rate: 891 Mhz
>>>   Memory Bus Width: 128-bit
>>>   L2 Cache Size: 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
>>>   Total amount of constant memory: 65536 bytes
>>>   Total amount of shared memory per block: 49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size: 32
>>>   Maximum number of threads per multiprocessor: 2048
>>>   Maximum number of threads per block: 1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
>>>   Maximum memory pitch: 2147483647 bytes
>>>   Texture alignment: 512 bytes
>>>   Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>>>   Run time limit on kernels: No
>>>   Integrated GPU sharing Host Memory: No
>>>   Support host page-locked memory mapping: Yes
>>>   Alignment requirement for Surfaces: Yes
>>>   Device has ECC support: Disabled
>>>   Device supports Unified Addressing (UVA): Yes
>>>   Device PCI Domain ID / Bus ID / location ID: 0 / 132 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>
>>> Device 1: "GRID K1"
>>>   CUDA Driver Version / Runtime Version 7.0 / 7.0
>>>   CUDA Capability Major/Minor version number: 3.0
>>>   Total amount of global memory: 4096 MBytes (4294770688 bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>>>   GPU Max Clock rate: 850 MHz (0.85 GHz)
>>>   Memory Clock rate: 891 Mhz
>>>   Memory Bus Width: 128-bit
>>>   L2 Cache Size: 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
>>>   Total amount of constant memory: 65536 bytes
>>>   Total amount of shared memory per block: 49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size: 32
>>>   Maximum number of threads per multiprocessor: 2048
>>>   Maximum number of threads per block: 1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
>>>   Maximum memory pitch: 2147483647 bytes
>>>   Texture alignment: 512 bytes
>>>   Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>>>   Run time limit on kernels: No
>>>   Integrated GPU sharing Host Memory: No
>>>   Support host page-locked memory mapping: Yes
>>>   Alignment requirement for Surfaces: Yes
>>>   Device has ECC support: Disabled
>>>   Device supports Unified Addressing (UVA): Yes
>>>   Device PCI Domain ID / Bus ID / location ID: 0 / 133 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>
>>> Device 2: "GRID K1"
>>>   CUDA Driver Version / Runtime Version 7.0 / 7.0
>>>   CUDA Capability Major/Minor version number: 3.0
>>>   Total amount of global memory: 4096 MBytes (4294770688 bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>>>   GPU Max Clock rate: 850 MHz (0.85 GHz)
>>>   Memory Clock rate: 891 Mhz
>>>   Memory Bus Width: 128-bit
>>>   L2 Cache Size: 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
>>>   Total amount of constant memory: 65536 bytes
>>>   Total amount of shared memory per block: 49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size: 32
>>>   Maximum number of threads per multiprocessor: 2048
>>>   Maximum number of threads per block: 1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
>>>   Maximum memory pitch: 2147483647 bytes
>>>   Texture alignment: 512 bytes
>>>   Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>>>   Run time limit on kernels: No
>>>   Integrated GPU sharing Host Memory: No
>>>   Support host page-locked memory mapping: Yes
>>>   Alignment requirement for Surfaces: Yes
>>>   Device has ECC support: Disabled
>>>   Device supports Unified Addressing (UVA): Yes
>>>   Device PCI Domain ID / Bus ID / location ID: 0 / 134 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>
>>> Device 3: "GRID K1"
>>>   CUDA Driver Version / Runtime Version 7.0 / 7.0
>>>   CUDA Capability Major/Minor version number: 3.0
>>>   Total amount of global memory: 4096 MBytes (4294770688 bytes)
>>>   ( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
>>>   GPU Max Clock rate: 850 MHz (0.85 GHz)
>>>   Memory Clock rate: 891 Mhz
>>>   Memory Bus Width: 128-bit
>>>   L2 Cache Size: 262144 bytes
>>>   Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>>>   Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>>>   Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
>>>   Total amount of constant memory: 65536 bytes
>>>   Total amount of shared memory per block: 49152 bytes
>>>   Total number of registers available per block: 65536
>>>   Warp size: 32
>>>   Maximum number of threads per multiprocessor: 2048
>>>   Maximum number of threads per block: 1024
>>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>>   Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
>>>   Maximum memory pitch: 2147483647 bytes
>>>   Texture alignment: 512 bytes
>>>   Concurrent copy and kernel execution: Yes with 1 copy engine(s)
>>>   Run time limit on kernels: No
>>>   Integrated GPU sharing Host Memory: No
>>>   Support host page-locked memory mapping: Yes
>>>   Alignment requirement for Surfaces: Yes
>>>   Device has ECC support: Disabled
>>>   Device supports Unified Addressing (UVA): Yes
>>>   Device PCI Domain ID / Bus ID / location ID: 0 / 135 / 0
>>>   Compute Mode:
>>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>>
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU0) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU3) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU1) : No
>>> > Peer access from GRID K1 (GPU1) -> GRID K1 (GPU2) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU2) -> GRID K1 (GPU2) : No
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU0) : Yes
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU1) : Yes
>>> > Peer access from GRID K1 (GPU3) -> GRID K1 (GPU2) : Yes
>>>
>>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 4, Device0 = GRID K1, Device1 = GRID K1, Device2 = GRID K1, Device3 = GRID K1
>>> Result = PASS
>>
>> I have the same error message. What should we pay attention to? Here is
>> the deviceQuery output for my GPU:
>>
>> [root@localhost release]# ./deviceQuery
>> ./deviceQuery Starting...
>>
>>  CUDA Device Query (Runtime API) version (CUDART static linking)
>>
>> Detected 1 CUDA Capable device(s)
>>
>> Device 0: "Tesla K20c"
>>   CUDA Driver Version / Runtime Version 7.0 / 7.0
>>   CUDA Capability Major/Minor version number: 3.5
>>   Total amount of global memory: 4800 MBytes (5032706048 bytes)
>>   (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores
>>   GPU Max Clock rate: 706 MHz (0.71 GHz)
>>   Memory Clock rate: 2600 Mhz
>>   Memory Bus Width: 320-bit
>>   L2 Cache Size: 1310720 bytes
>>   Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
>>   Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
>>   Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
>>   Total amount of constant memory: 65536 bytes
>>   Total amount of shared memory per block: 49152 bytes
>>   Total number of registers available per block: 65536
>>   Warp size: 32
>>   Maximum number of threads per multiprocessor: 2048
>>   Maximum number of threads per block: 1024
>>   Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
>>   Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
>>   Maximum memory pitch: 2147483647 bytes
>>   Texture alignment: 512 bytes
>>   Concurrent copy and kernel execution: Yes with 2 copy engine(s)
>>   Run time limit on kernels: No
>>   Integrated GPU sharing Host Memory: No
>>   Support host page-locked memory mapping: Yes
>>   Alignment requirement for Surfaces: Yes
>>   Device has ECC support: Enabled
>>   Device supports Unified Addressing (UVA): Yes
>>   Device PCI Domain ID / Bus ID / location ID: 0 / 4 / 0
>>   Compute Mode:
>>      < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
>>
>> deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.0, CUDA Runtime Version = 7.0, NumDevs = 1, Device0 = Tesla K20c
>> Result = PASS
>
> Hi Timo,
>
> I saw status 0x0A and looked it up in the header file
> /usr/include/nvEncodeAPI.h. Perhaps the memory allocation is too large?
>
>     /* 1MB is large enough to hold most output frames. NVENC increases this
>        automatically if it's not enough. */
>     allocOut.size = 1024 * 1024;
>
>     allocOut.memoryHeap = NV_ENC_MEMORY_HEAP_SYSMEM_CACHED;
>
>     /**
>      * This indicates that the API call failed because it was unable to allocate
>      * enough memory to perform the requested operation.
>      */
>     NV_ENC_ERR_OUT_OF_MEMORY,

I made a mistake: that was not the actual error code.
Here is what I got from gdb:

Missing separate debuginfo for /lib64/libcuda.so
[nvenc @ 0x1a8f700] 1 CUDA capable devices found
[nvenc @ 0x1a8f700] [ GPU #0 - < Tesla K20c > has Compute SM 3.5, smver 53 target_smver 48 NVENC Available ]
[nvenc @ 0x1a8f700] Nvenc initialized successfully
[New Thread 0x7ffff1906700 (LWP 8614)]
[nvenc @ 0x1a8f700] in for surfaceCount = 0 ctx->max_surface_count = 48

Breakpoint 2, nvenc_encode_init (avctx=0x1a8f700) at /home/liuqi/ffmpeg/libavcodec/nvenc.c:981
981         nv_status = p_nvenc->nvEncCreateInputBuffer(ctx->nvencoder, &allocSurf);
(gdb) p allocSurf
$1 = {version = 1342243592, width = 512, height = 288,
  memoryHeap = NV_ENC_MEMORY_HEAP_SYSMEM_CACHED,
  bufferFmt = NV_ENC_BUFFER_FORMAT_YV12_PL, reserved = 0, inputBuffer = 0x0,
  pSysMemBuffer = 0x0, reserved1 = {0 <repeats 57 times>},
  reserved2 = {0x0 <repeats 63 times>}}

_______________________________________________
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-user