Re: [FFmpeg-devel] [PATCH V4] Add a filter implementing HDR image generation from a single exposure using deep CNNs

Liu Steven Sun, 04 Nov 2018 23:57:29 -0800


> 在 2018年11月5日，下午3:42，Guo, Yejun <yejun....@intel.com> 写道：
> 
> ask for comment or merge, thanks.
Will push after 24 hours if there have no objections.
> 
>> -----Original Message-----
>> From: ffmpeg-devel [mailto:ffmpeg-devel-boun...@ffmpeg.org] On Behalf
>> Of Guo, Yejun
>> Sent: Monday, October 29, 2018 11:19 AM
>> To: ffmpeg-devel@ffmpeg.org
>> Subject: Re: [FFmpeg-devel] [PATCH V4] Add a filter implementing HDR
>> image generation from a single exposure using deep CNNs
>> 
>> any more comment? thanks.
>> 
>>> -----Original Message-----
>>> From: Guo, Yejun
>>> Sent: Tuesday, October 23, 2018 6:46 AM
>>> To: ffmpeg-devel@ffmpeg.org
>>> Cc: Guo, Yejun <yejun....@intel.com>; Guo
>>> Subject: [PATCH V4] Add a filter implementing HDR image generation
>>> from a single exposure using deep CNNs
>>> 
>>> see the algorithm's paper and code below.
>>> 
>>> the filter's parameter looks like:
>>> 
>> sdr2hdr=model_filename=/path_to_tensorflow_graph.pb:out_fmt=gbrp10l
>>> e
>>> 
>>> The input of the deep CNN model is RGB24 while the output is float for
>>> each color channel. This is the filter's default behavior to output
>>> format with gbrpf32le. And gbrp10le is also supported as the output,
>>> so we can see the rendering result in a player, as a reference.
>>> 
>>> To generate the model file, we need modify the original script a little.
>>> - set name='y' for y_final within script at
>>> https://github.com/gabrieleilertsen/hdrcnn/blob/master/network.py
>>> - add the following code to the script at
>>> https://github.com/gabrieleilertsen/hdrcnn/blob/master/hdrcnn_predict.
>>> py
>>> 
>>> graph = tf.graph_util.convert_variables_to_constants(sess,
>>> sess.graph_def,
>>> ["y"]) tf.train.write_graph(graph, '.', 'graph.pb', as_text=False)
>>> 
>>> The filter only works when tensorflow C api is supported in the
>>> system, native backend is not supported since there are some different
>>> types of layers in the deep CNN model, besides CONV and
>> DEPTH_TO_SPACE.
>>> 
>>> https://arxiv.org/pdf/1710.07480.pdf:
>>>  author       = "Eilertsen, Gabriel and Kronander, Joel, and Denes, Gyorgy
>> and
>>> Mantiuk, Rafał and Unger, Jonas",
>>>  title        = "HDR image reconstruction from a single exposure using deep
>>> CNNs",
>>>  journal      = "ACM Transactions on Graphics (TOG)",
>>>  number       = "6",
>>>  volume       = "36",
>>>  articleno    = "178",
>>>  year         = "2017"
>>> 
>>> https://github.com/gabrieleilertsen/hdrcnn
>>> 
>>> btw, as a whole solution, metadata should also be generated from the
>>> sdr video, so to be encoded as a HDR video. Not supported yet.
>>> This patch just focuses on this paper.
>>> 
>>> Signed-off-by: Guo, Yejun <yejun....@intel.com>
>>> ---
>>> configure                |   1 +
>>> doc/filters.texi         |  35 +++++++
>>> libavfilter/Makefile     |   1 +
>>> libavfilter/allfilters.c |   1 +
>>> libavfilter/vf_sdr2hdr.c | 268
>>> +++++++++++++++++++++++++++++++++++++++++++++++
>>> 5 files changed, 306 insertions(+)
>>> create mode 100644 libavfilter/vf_sdr2hdr.c
>>> 
>>> diff --git a/configure b/configure
>>> index 85d5dd5..5e2efba 100755
>>> --- a/configure
>>> +++ b/configure
>>> @@ -3438,6 +3438,7 @@ scale2ref_filter_deps="swscale"
>>> scale_filter_deps="swscale"
>>> scale_qsv_filter_deps="libmfx"
>>> select_filter_select="pixelutils"
>>> +sdr2hdr_filter_deps="libtensorflow"
>>> sharpness_vaapi_filter_deps="vaapi"
>>> showcqt_filter_deps="avcodec avformat swscale"
>>> showcqt_filter_suggest="libfontconfig libfreetype"
>>> diff --git a/doc/filters.texi b/doc/filters.texi index
>>> 17e2549..bba9f87 100644
>>> --- a/doc/filters.texi
>>> +++ b/doc/filters.texi
>>> @@ -14672,6 +14672,41 @@ Scale a subtitle stream (b) to match the main
>>> video (a) in size before overlayin  @end example  @end itemize
>>> 
>>> +@section sdr2hdr
>>> +
>>> +HDR image generation from a single exposure using deep CNNs with
>>> TensorFlow C library.
>>> +
>>> +@itemize
>>> +@item
>>> +paper:  see @url{https://arxiv.org/pdf/1710.07480.pdf}
>>> +
>>> +@item
>>> +code with model and trained parameters: see
>>> +@url{https://github.com/gabrieleilertsen/hdrcnn}
>>> +@end itemize
>>> +
>>> +The filter accepts the following options:
>>> +
>>> +@table @option
>>> +
>>> +@item model_filename
>>> +Set path to model file specifying network architecture and its parameters.
>>> +
>>> +@item out_fmt
>>> +the data format of the filter's output.
>>> +
>>> +It accepts the following values:
>>> +@table @samp
>>> +@item gbrpf32le
>>> +force gbrpf32le output
>>> +
>>> +@item gbrp10le
>>> +force gbrp10le output
>>> +@end table
>>> +
>>> +Default value is @samp{gbrpf32le}.
>>> +
>>> +@end table
>>> +
>>> @anchor{selectivecolor}
>>> @section selectivecolor
>>> 
>>> diff --git a/libavfilter/Makefile b/libavfilter/Makefile index
>>> 62cc2f5..88e7da6
>>> 100644
>>> --- a/libavfilter/Makefile
>>> +++ b/libavfilter/Makefile
>>> @@ -360,6 +360,7 @@ OBJS-$(CONFIG_SOBEL_OPENCL_FILTER)           +=
>>> vf_convolution_opencl.o opencl.o
>>> OBJS-$(CONFIG_SPLIT_FILTER)                  += split.o
>>> OBJS-$(CONFIG_SPP_FILTER)                    += vf_spp.o
>>> OBJS-$(CONFIG_SR_FILTER)                     += vf_sr.o
>>> +OBJS-$(CONFIG_SDR2HDR_FILTER)                += vf_sdr2hdr.o
>>> OBJS-$(CONFIG_SSIM_FILTER)                   += vf_ssim.o framesync.o
>>> OBJS-$(CONFIG_STEREO3D_FILTER)               += vf_stereo3d.o
>>> OBJS-$(CONFIG_STREAMSELECT_FILTER)           += f_streamselect.o
>>> framesync.o
>>> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c index
>>> 5e72803..1645c0f
>>> 100644
>>> --- a/libavfilter/allfilters.c
>>> +++ b/libavfilter/allfilters.c
>>> @@ -319,6 +319,7 @@ extern AVFilter ff_vf_scale_npp;  extern AVFilter
>>> ff_vf_scale_qsv;  extern AVFilter ff_vf_scale_vaapi;  extern AVFilter
>>> ff_vf_scale2ref;
>>> +extern AVFilter ff_vf_sdr2hdr;
>>> extern AVFilter ff_vf_select;
>>> extern AVFilter ff_vf_selectivecolor;  extern AVFilter ff_vf_sendcmd;
>>> diff --git a/libavfilter/vf_sdr2hdr.c b/libavfilter/vf_sdr2hdr.c new
>>> file mode
>>> 100644 index 0000000..109b907
>>> --- /dev/null
>>> +++ b/libavfilter/vf_sdr2hdr.c
>>> @@ -0,0 +1,268 @@
>>> +/*
>>> + * Copyright (c) 2018 Guo Yejun
>>> + *
>>> + * This file is part of FFmpeg.
>>> + *
>>> + * FFmpeg is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2.1 of the License, or (at your option) any later version.
>>> + *
>>> + * FFmpeg is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with FFmpeg; if not, write to the Free Software
>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
>>> +02110-1301 USA  */
>>> +
>>> +/**
>>> + * @file
>>> + * Filter implementing HDR image generation from a single exposure
>>> +using
>>> deep CNNs.
>>> + * https://arxiv.org/pdf/1710.07480.pdf
>>> + */
>>> +
>>> +#include "avfilter.h"
>>> +#include "formats.h"
>>> +#include "internal.h"
>>> +#include "libavutil/opt.h"
>>> +#include "libavutil/qsort.h"
>>> +#include "libavformat/avio.h"
>>> +#include "libswscale/swscale.h"
>>> +#include "dnn_interface.h"
>>> +#include <math.h>
>>> +
>>> +typedef struct SDR2HDRContext {
>>> +    const AVClass *class;
>>> +
>>> +    char* model_filename;
>>> +    enum AVPixelFormat out_fmt;
>>> +    DNNModule* dnn_module;
>>> +    DNNModel* model;
>>> +    DNNData input, output;
>>> +} SDR2HDRContext;
>>> +
>>> +#define OFFSET(x) offsetof(SDR2HDRContext, x) #define FLAGS
>>> +AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
>> static
>>> const
>>> +AVOption sdr2hdr_options[] = {
>>> +    { "model_filename", "path to model file specifying network
>>> +architecture
>>> and its parameters", OFFSET(model_filename), AV_OPT_TYPE_STRING,
>>> {.str=NULL}, 0, 0, FLAGS },
>>> +    { "out_fmt", "the data format of the filter's output, it could be
>>> + gbrpf32le
>>> [default] or gbrp10le", OFFSET(out_fmt), AV_OPT_TYPE_PIXEL_FMT,
>>> {.i64=AV_PIX_FMT_GBRPF32LE}, AV_PIX_FMT_NONE, AV_PIX_FMT_NB,
>> FLAGS },
>>> +    { NULL }
>>> +};
>>> +
>>> +AVFILTER_DEFINE_CLASS(sdr2hdr);
>>> +
>>> +static av_cold int init(AVFilterContext* context) {
>>> +    SDR2HDRContext* ctx = context->priv;
>>> +
>>> +    if (ctx->out_fmt != AV_PIX_FMT_GBRPF32LE && ctx->out_fmt !=
>>> AV_PIX_FMT_GBRP10LE) {
>>> +        av_log(context, AV_LOG_ERROR, "could not support the output
>>> format\n");
>>> +        return AVERROR(ENOSYS);
>>> +    }
>>> +
>>> +    ctx->dnn_module = ff_get_dnn_module(DNN_TF);
>>> +    if (!ctx->dnn_module){
>>> +        av_log(context, AV_LOG_ERROR, "could not create DNN module
>>> + for
>>> tensorflow backend\n");
>>> +        return AVERROR(ENOMEM);
>>> +    }
>>> +    if (!ctx->model_filename){
>>> +        av_log(context, AV_LOG_ERROR, "model file for network was not
>>> specified\n");
>>> +        return AVERROR(EIO);
>>> +    }
>>> +    if (!ctx->dnn_module->load_model) {
>>> +        av_log(context, AV_LOG_ERROR, "load_model for network was not
>>> specified\n");
>>> +        return AVERROR(EIO);
>>> +    }
>>> +    ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
>>> +    if (!ctx->model){
>>> +        av_log(context, AV_LOG_ERROR, "could not load DNN model\n");
>>> +        return AVERROR(EIO);
>>> +    }
>>> +    return 0;
>>> +}
>>> +
>>> +static int query_formats(AVFilterContext* context) {
>>> +    const enum AVPixelFormat in_formats[] = {AV_PIX_FMT_RGB24,
>>> +                                             AV_PIX_FMT_NONE};
>>> +    enum AVPixelFormat out_formats[2];
>>> +    SDR2HDRContext* ctx = context->priv;
>>> +    AVFilterFormats* formats_list;
>>> +    int ret = 0;
>>> +
>>> +    formats_list = ff_make_format_list(in_formats);
>>> +    if ((ret = ff_formats_ref(formats_list,
>>> + &context->inputs[0]->out_formats))
>>> < 0)
>>> +        return ret;
>>> +
>>> +    out_formats[0] = ctx->out_fmt;
>>> +    out_formats[1] = AV_PIX_FMT_NONE;
>>> +    formats_list = ff_make_format_list(out_formats);
>>> +    if ((ret = ff_formats_ref(formats_list,
>>> + &context->outputs[0]->in_formats))
>>> < 0)
>>> +        return ret;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static int config_props(AVFilterLink* inlink) {
>>> +    AVFilterContext* context = inlink->dst;
>>> +    SDR2HDRContext* ctx = context->priv;
>>> +    AVFilterLink* outlink = context->outputs[0];
>>> +    DNNReturnType result;
>>> +
>>> +    // the dnn model is tied with resolution due to deconv layer of
>> tensorflow
>>> +    // now just support 1920*1080 and so the magic numbers within this file
>>> +    if (inlink->w != 1920 || inlink->h != 1080) {
>>> +        av_log(context, AV_LOG_ERROR, "only support frame size with
>>> 1920*1080\n");
>>> +        return AVERROR(ENOSYS);
>>> +     }
>>> +
>>> +    ctx->input.width = 1920;
>>> +    ctx->input.height = 1088;  //the model requires height is a multiple 
>>> of 32,
>>> +    ctx->input.channels = 3;
>>> +
>>> +    result = (ctx->model->set_input_output)(ctx->model->model, &ctx-
>>>> input, &ctx->output);
>>> +    if (result != DNN_SUCCESS){
>>> +        av_log(context, AV_LOG_ERROR, "could not set input and output
>>> + for
>>> the model\n");
>>> +        return AVERROR(EIO);
>>> +    }
>>> +
>>> +    memset(ctx->input.data, 0, ctx->input.channels * ctx->input.width
>>> + * ctx-
>>>> input.height * sizeof(float));
>>> +    outlink->h = 1080;
>>> +    outlink->w = 1920;
>>> +    return 0;
>>> +}
>>> +
>>> +static float qsort_comparison_function_float(const void *a, const
>>> +void
>>> +*b) {
>>> +    return *(const float *)a - *(const float *)b; }
>>> +
>>> +static int filter_frame(AVFilterLink* inlink, AVFrame* in) {
>>> +    DNNReturnType dnn_result = DNN_SUCCESS;
>>> +    AVFilterContext* context = inlink->dst;
>>> +    SDR2HDRContext* ctx = context->priv;
>>> +    AVFilterLink* outlink = context->outputs[0];
>>> +    AVFrame* out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
>>> +    int total_pixels = in->height * in->width;
>>> +
>>> +    if (!out){
>>> +        av_log(context, AV_LOG_ERROR, "could not allocate memory for
>>> output frame\n");
>>> +        av_frame_free(&in);
>>> +        return AVERROR(ENOMEM);
>>> +    }
>>> +
>>> +    av_frame_copy_props(out, in);
>>> +
>>> +    for (int i = 0; i < in->linesize[0] * in->height; ++i) {
>>> +        ctx->input.data[i] = in->data[0][i] / 255.0f;
>>> +    }
>>> +
>>> +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model);
>>> +    if (dnn_result != DNN_SUCCESS){
>>> +        av_log(context, AV_LOG_ERROR, "failed to execute loaded
>> model\n");
>>> +        return AVERROR(EIO);
>>> +    }
>>> +
>>> +    if (ctx->out_fmt == AV_PIX_FMT_GBRPF32LE) {
>>> +        float* outg = (float*)out->data[0];
>>> +        float* outb = (float*)out->data[1];
>>> +        float* outr = (float*)out->data[2];
>>> +        for (int i = 0; i < total_pixels; ++i) {
>>> +            float r = ctx->output.data[i*3];
>>> +            float g = ctx->output.data[i*3+1];
>>> +            float b = ctx->output.data[i*3+2];
>>> +            outr[i] = r;
>>> +            outg[i] = g;
>>> +            outb[i] = b;
>>> +        }
>>> +    } else {
>>> +        // here, we just use a rough mapping to the 10bit contents
>>> +        // meta data generation for HDR video encoding is not supported yet
>>> +        float* converted_data = (float*)av_malloc(total_pixels * 3 *
>>> sizeof(float));
>>> +        int16_t* outg = (int16_t*)out->data[0];
>>> +        int16_t* outb = (int16_t*)out->data[1];
>>> +        int16_t* outr = (int16_t*)out->data[2];
>>> +
>>> +        float max = 1.0f;
>>> +        for (int i = 0; i < total_pixels * 3; ++i) {
>>> +            float d = ctx->output.data[i];
>>> +            d = sqrt(d);
>>> +            converted_data[i] = d;
>>> +            max = FFMAX(d, max);
>>> +        }
>>> +
>>> +        if (max > 1.0f) {
>>> +            AV_QSORT(converted_data, total_pixels * 3, float,
>>> qsort_comparison_function_float);
>>> +            // 0.5% pixels are clipped
>>> +            max = converted_data[(int)(total_pixels * 3 * 0.995)];
>>> +            max = FFMAX(max, 1.0f);
>>> +
>>> +            for (int i = 0; i < total_pixels * 3; ++i) {
>>> +                float d = ctx->output.data[i];
>>> +                d = sqrt(d);
>>> +                d = FFMIN(d, max);
>>> +                converted_data[i] = d;
>>> +            }
>>> +        }
>>> +
>>> +        for (int i = 0; i < total_pixels; ++i) {
>>> +            float r = converted_data[i*3];
>>> +            float g = converted_data[i*3+1];
>>> +            float b = converted_data[i*3+2];
>>> +            outr[i] = r / max * 1023;
>>> +            outg[i] = g / max * 1023;
>>> +            outb[i] = b / max * 1023;
>>> +        }
>>> +
>>> +        av_free(converted_data);
>>> +    }
>>> +
>>> +    av_frame_free(&in);
>>> +    return ff_filter_frame(outlink, out); }
>>> +
>>> +static av_cold void uninit(AVFilterContext* context) {
>>> +    SDR2HDRContext* ctx = context->priv;
>>> +
>>> +    if (ctx->dnn_module){
>>> +        (ctx->dnn_module->free_model)(&ctx->model);
>>> +        av_freep(&ctx->dnn_module);
>>> +    }
>>> +}
>>> +
>>> +static const AVFilterPad sdr2hdr_inputs[] = {
>>> +    {
>>> +        .name         = "default",
>>> +        .type         = AVMEDIA_TYPE_VIDEO,
>>> +        .config_props = config_props,
>>> +        .filter_frame = filter_frame,
>>> +    },
>>> +    { NULL }
>>> +};
>>> +
>>> +static const AVFilterPad sdr2hdr_outputs[] = {
>>> +    {
>>> +        .name = "default",
>>> +        .type = AVMEDIA_TYPE_VIDEO,
>>> +    },
>>> +    { NULL }
>>> +};
>>> +
>>> +AVFilter ff_vf_sdr2hdr = {
>>> +    .name          = "sdr2hdr",
>>> +    .description   = NULL_IF_CONFIG_SMALL("HDR image generation from a
>>> single exposure using deep CNNs."),
>>> +    .priv_size     = sizeof(SDR2HDRContext),
>>> +    .init          = init,
>>> +    .uninit        = uninit,
>>> +    .query_formats = query_formats,
>>> +    .inputs        = sdr2hdr_inputs,
>>> +    .outputs       = sdr2hdr_outputs,
>>> +    .priv_class    = &sdr2hdr_class,
>>> +    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
>>> +};
>>> --
>>> 2.7.4
>> 
>> _______________________________________________
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH V4] Add a filter implementing HDR image generation from a single exposure using deep CNNs

Reply via email to