[FFmpeg-devel] [vf_tonemap_cuda] VERY alpha ground work- implemented only reinhard

Felix LeClair Thu, 14 Jan 2021 12:02:10 -0800

Hey everyone!

Trying to wrap my mind around how to deal with CUDA HW frames and how to implement them.

The goal of this filter, once completed, is to take in a CUDA frame and tonemap its values to a given specification using a user-requested algorithm (mobius, hable, reinhard, clip, etc.).

This is useful because it should outperform CPU-based tonemapping by 1-3 orders of magnitude, depending on the GPU used for the filter.
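
For reference, three of the operators named above can be sketched on the CPU as plain functions of a linear signal value. This is only an illustrative sketch: the function names are made up here, the Hable constants are the commonly quoted ones for that curve, mobius is left out, and none of it is taken from the attached filter.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper names. Inputs are linear-light values, 0.0 = black.

// clip: hard-limit the signal to [0, 1].
inline float tonemap_clip(float x) {
    return std::min(std::max(x, 0.0f), 1.0f);
}

// reinhard: x / (x + 1), which stays in [0, 1) for any x >= 0.
inline float tonemap_reinhard(float x) {
    return x / (x + 1.0f);
}

// Hable's filmic curve before normalization.
inline float hable_curve(float x) {
    const float a = 0.15f, b = 0.50f, c = 0.10f, d = 0.20f, e = 0.02f, f = 0.30f;
    return (x * (x * a + b * c) + d * e) / (x * (x * a + b) + d * f) - e / f;
}

// hable: the curve divided by its value at `peak`, so that `peak` maps to 1.0.
inline float tonemap_hable(float x, float peak) {
    assert(peak > 0.0f);
    return hable_curve(x) / hable_curve(peak);
}
```

With these definitions, tonemap_reinhard(1.0f) is 0.5f and tonemap_clip(2.0f) is 1.0f, which shows the difference between compressing highlights and simply cutting them off.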


I've based the attached filter off of the vf_scale_cuda.cu filter.

For ease of development, I've kept everything the same, including the name of the filter, only changing the function within the file. This is very much a bodge to facilitate development. As such, for testing, this file should replace ffmpeg/libavfilter/vf_scale_cuda.cu.

FFmpeg should then be compiled as standard for CUDA filters and should be called as you would call the standard vf_scale_cuda filter.

The command would be similar to:

ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=Source_width:Source_Height -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

The above should decode in hardware, tonemap the frame on the GPU and re-encode in hardware at a given bitrate.

I'll be on freenode soon after sending this email (going to put on another cup of coffee).


Thanks,

FelixCLC (felix__)

Caveat: Like all HW filters, how effective this is will depend on how much overhead is incurred by doing a memcpy over the PCIe bus to the GPU itself, then passing the data back once processed.
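
To put a rough number on that overhead, here is a back-of-the-envelope sketch. The helper names are hypothetical, and the bandwidth is something you would have to measure on your own system; nothing here comes from the filter itself.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one 4:2:0 frame: a full-size luma plane plus chroma at a quarter of
// the resolution, i.e. 1.5 samples per pixel. bytes_per_sample is 1 for NV12
// and 2 for P010/P016.
inline std::uint64_t frame_bytes_420(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bytes_per_sample) {
    assert(width % 2 == 0 && height % 2 == 0);
    return width * height * 3 / 2 * bytes_per_sample;
}

// Milliseconds needed to move `bytes` one way at `bytes_per_second`.
inline double transfer_ms(std::uint64_t bytes, double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return static_cast<double>(bytes) / bytes_per_second * 1000.0;
}
```

For example, a 3840x2160 frame with 2 bytes per sample is 24,883,200 bytes. At an assumed effective rate of 12 GB/s that is about 2.07 ms in each direction, before any kernel has run.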

/*
 * original source Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
 *
 * Change to tonemap style filter copyright Felix LeClair 
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
 * DEALINGS IN THE SOFTWARE.
 */

/*

Warning: this is a VERY early alpha of a CUDA-accelerated tonemap filter. Please see the ffmpeg-devel mailing list for the message titled [vf_tonemap_cuda] VERY alpha ground work- implemented as cuda frame,
sent on the 14th of January 2021.
It's poorly written and documented. This should not be merged under any circumstance in its present form.


*/


#include "cuda/vector_helpers.cuh"

template<typename T>
__device__ inline void Subsample_Nearest(cudaTextureObject_t tex,
                                         T *dst,
                                         int dst_width, int dst_height, int dst_pitch,
                                         int src_width, int src_height,
                                         int bit_depth)
/*
tex        is the CUDA texture object the source frame is read from
dst        is a pointer to the destination plane (elements of type T)
dst_width  is the width of the output frame
dst_height is the height of the output frame
dst_pitch  is the distance between the starts of consecutive destination rows,
           counted in elements of T; rows can be padded, so it may exceed dst_width
src_width and src_height are left over from the scaling code and are unused here
bit_depth  is the number of bits per colour channel (currently unused)
*/
{
    int xo = blockIdx.x * blockDim.x + threadIdx.x;
    int yo = blockIdx.y * blockDim.y + threadIdx.y;

    if (yo < dst_height && xo < dst_width)
    {
        // No scaling is done: each output pixel samples the centre of the
        // source texel at the same coordinates.
        float xi = (xo + 0.5f);
        float yi = (yo + 0.5f);
        // To start I'm doing reinhard because it's idiot proof.
        float val_in = tex2D<T>(tex, xi, yi);
        // Reinhard maps x to x/(x+1), which guarantees a value in [0, 1) for x >= 0.
        // Far from the best algorithm, but fit for purpose.
        // NOTE: this only handles scalar T (uchar, ushort), and the result still
        // needs rescaling to the integer range of T.
        float out = val_in / (val_in + 1.0f);
        // Store the tonemapped value.
        dst[yo*dst_pitch+xo] = out;
    }
}


extern "C" {

#define NEAREST_KERNEL(T) \
    __global__ void Subsample_Nearest_ ## T(cudaTextureObject_t src_tex,                  \
                                            T *dst,                                       \
                                            int dst_width, int dst_height, int dst_pitch, \
                                            int src_width, int src_height,                \
                                            int bit_depth)                                \
    {                                                                                     \
        /* call the device-side Subsample_Nearest<T> defined above */                     \
        Subsample_Nearest<T>(src_tex, dst,                                                \
                              dst_width, dst_height, dst_pitch,                           \
                              src_width, src_height,                                      \
                              bit_depth);                                                 \
    }

NEAREST_KERNEL(uchar)
NEAREST_KERNEL(uchar2)
NEAREST_KERNEL(uchar4)

NEAREST_KERNEL(ushort)
NEAREST_KERNEL(ushort2)
NEAREST_KERNEL(ushort4)
}
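
The indexing dst[yo*dst_pitch+xo] in the kernel treats dst_pitch as the distance between the starts of consecutive rows, counted in elements of T. A minimal CPU sketch of the same addressing may make that easier to check by hand. The function name is hypothetical, and the normalization by 255 is an assumption for 8-bit samples that the kernel above does not make.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// CPU reference (hypothetical helper): apply Reinhard to one 8-bit plane.
// src_pitch and dst_pitch are row strides in elements and may exceed `width`;
// elements between `width` and the pitch are padding and are never touched.
inline void reinhard_plane_u8(const std::uint8_t *src, std::uint8_t *dst,
                              int width, int height,
                              std::size_t src_pitch, std::size_t dst_pitch) {
    assert(src_pitch >= static_cast<std::size_t>(width));
    assert(dst_pitch >= static_cast<std::size_t>(width));
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Assumption: normalize the 8-bit sample to [0, 1] first.
            float v = src[y * src_pitch + x] / 255.0f;
            // Reinhard; for inputs in [0, 1] the result lies in [0, 0.5].
            float out = v / (v + 1.0f);
            // Scale back to 8 bits, rounding to nearest.
            dst[y * dst_pitch + x] = static_cast<std::uint8_t>(out * 255.0f + 0.5f);
        }
    }
}
```

With a 2x2 plane stored at a pitch of 4, only elements 0, 1, 4 and 5 of the destination are written. Setting the pitch to 1, as the posted kernel did, would make every row start one element after the previous one, so rows overwrite each other.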

