> -----Original Message-----
> From: Paul B Mahol [mailto:one...@gmail.com]
> Sent: Monday, October 28, 2019 4:00 PM
> To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Cc: Guo, Yejun <yejun....@intel.com>
> Subject: Re: [FFmpeg-devel] [PATCH V2 4/4] avfilter/vf_dnn_processing: add a
> generic filter for image processing with dnn networks
>
> On 10/21/19, Guo, Yejun <yejun....@intel.com> wrote:
> > This filter accepts all dnn networks that do image processing.
> > Currently, frames with the rgb24 and bgr24 formats are supported. Other
> > formats such as gray and YUV will be supported next. The dnn network
> > can accept data in float32 or uint8 format. And the dnn network can
> > change the frame size.
> >
> > Let's take an example with the following python script. This script
> > halves the value of the first channel of each pixel.
> > import tensorflow as tf
> > import numpy as np
> > import scipy.misc
> > in_img = scipy.misc.imread('in.bmp')
> > in_img = in_img.astype(np.float32)/255.0
> > in_data = in_img[np.newaxis, :]
> > filter_data = np.array([0.5, 0, 0, 0, 1., 0, 0, 0, 1.]).reshape(1,1,3,3).astype(np.float32)
> > filter = tf.Variable(filter_data)
> > x = tf.placeholder(tf.float32, shape=[1, None, None, 3], name='dnn_in')
> > y = tf.nn.conv2d(x, filter, strides=[1, 1, 1, 1], padding='VALID', name='dnn_out')
> > sess=tf.Session()
> > sess.run(tf.global_variables_initializer())
> > output = sess.run(y, feed_dict={x: in_data})
> > graph_def = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['dnn_out'])
> > tf.train.write_graph(graph_def, '.', 'halve_first_channel.pb', as_text=False)
> > output = output * 255.0
> > output = output.astype(np.uint8)
> > scipy.misc.imsave("out.bmp", np.squeeze(output))
>
> So this one executes python code without ever returning or using AVFrame*?
> This is extremely limited usage.
The purpose of this script is to demonstrate how to set up and execute DNN
models with Python + TensorFlow. Its only relationship with ffmpeg is that it
prepares the model file halve_first_channel.pb. The next part of the commit
message shows how ffmpeg executes the model in a filter. I'll try to update
the commit log to avoid misleading words, thanks.

> >
> > - generate halve_first_channel.pb with the above script
> > - generate halve_first_channel.model with tools/python/convert.py
> > - try with the following commands
> > ./ffmpeg -i input.jpg -vf
> > dnn_processing=model=halve_first_channel.model:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=native
> > -y out.native.png
> > ./ffmpeg -i input.jpg -vf
> > dnn_processing=model=halve_first_channel.pb:input=dnn_in:output=dnn_out:fmt=rgb24:dnn_backend=tensorflow
> > -y out.tf.png
> >
> > Signed-off-by: Guo, Yejun <yejun....@intel.com>
> > ---
> >  configure                       |   1 +
> >  doc/filters.texi                |  44 ++++++
> >  libavfilter/Makefile            |   1 +
> >  libavfilter/allfilters.c        |   1 +
> >  libavfilter/vf_dnn_processing.c | 333 ++++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 380 insertions(+)
> >  create mode 100644 libavfilter/vf_dnn_processing.c
> >
> > diff --git a/configure b/configure
> > index 8413826..bf2bac9 100755
> > --- a/configure
> > +++ b/configure
> > @@ -3460,6 +3460,7 @@ derain_filter_select="dnn"
> >  deshake_filter_select="pixelutils"
> >  deshake_opencl_filter_deps="opencl"
> >  dilation_opencl_filter_deps="opencl"
> > +dnn_processing_filter_select="dnn"
> >  drawtext_filter_deps="libfreetype"
> >  drawtext_filter_suggest="libfontconfig libfribidi"
> >  elbg_filter_deps="avcodec"
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index bdc4136..c11a616 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -8928,6 +8928,50 @@ ffmpeg -i INPUT -f lavfi -i nullsrc=hd720,geq='r=128+80*(sin(sqrt((X-W/2)*(X-W/2
> >  @end example
> >  @end itemize
> >
> > +@section dnn_processing
> > +
> > +Do image processing with deep neural networks. Currently only AVFrames with
> > +the RGB24 and BGR24 formats are supported; more formats will be added later.
> > +
> > +The filter accepts the following options:
> > +
> > +@table @option
> > +@item dnn_backend
> > +Specify which DNN backend to use for model loading and execution. This option accepts
> > +the following values:
> > +
> > +@table @samp
> > +@item native
> > +Native implementation of DNN loading and execution.
> > +
> > +@item tensorflow
> > +TensorFlow backend. To enable this backend you
> > +need to install the TensorFlow for C library (see
> > +@url{https://www.tensorflow.org/install/install_c}) and configure FFmpeg with
> > +@code{--enable-libtensorflow}
> > +@end table
> > +
> > +Default value is @samp{native}.
> > +
> > +@item model
> > +Set path to model file specifying network architecture and its parameters.
> > +Note that different backends use different file formats: each backend can
> > +only load files in its own format.
> > +
> > +A native model file (.model) can be generated from a TensorFlow model file
> > +(.pb) by using tools/python/convert.py.
> > +
> > +@item input
> > +Set the input name of the dnn network.
> > +
> > +@item output
> > +Set the output name of the dnn network.
> > +
> > +@item fmt
> > +Set the pixel format for the frame. Allowed values are @code{AV_PIX_FMT_RGB24}
> > +and @code{AV_PIX_FMT_BGR24}. Default value is @code{AV_PIX_FMT_RGB24}.
> > +
> > +@end table
> > +
> >  @section drawbox
> >
> >  Draw a colored box on the input image.
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index 63d2fba..47a485a 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -224,6 +224,7 @@ OBJS-$(CONFIG_DILATION_OPENCL_FILTER)        += vf_neighbor_opencl.o opencl.o \
> >                                                                  opencl/neighbor.o
> >  OBJS-$(CONFIG_DISPLACE_FILTER)               += vf_displace.o framesync.o
> >  OBJS-$(CONFIG_DOUBLEWEAVE_FILTER)            += vf_weave.o
> > +OBJS-$(CONFIG_DNN_PROCESSING_FILTER)         += vf_dnn_processing.o
> >  OBJS-$(CONFIG_DRAWBOX_FILTER)                += vf_drawbox.o
> >  OBJS-$(CONFIG_DRAWGRAPH_FILTER)              += f_drawgraph.o
> >  OBJS-$(CONFIG_DRAWGRID_FILTER)               += vf_drawbox.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index e4186f9..485409f 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -209,6 +209,7 @@ extern AVFilter ff_vf_detelecine;
> >  extern AVFilter ff_vf_dilation;
> >  extern AVFilter ff_vf_dilation_opencl;
> >  extern AVFilter ff_vf_displace;
> > +extern AVFilter ff_vf_dnn_processing;
> >  extern AVFilter ff_vf_doubleweave;
> >  extern AVFilter ff_vf_drawbox;
> >  extern AVFilter ff_vf_drawgraph;
> > diff --git a/libavfilter/vf_dnn_processing.c b/libavfilter/vf_dnn_processing.c
> > new file mode 100644
> > index 0000000..de89af4
> > --- /dev/null
> > +++ b/libavfilter/vf_dnn_processing.c
> > @@ -0,0 +1,333 @@
> > +/*
> > + * Copyright (c) 2019 Guo Yejun
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * implementing a generic image processing filter using deep learning networks.
> > + */
> > +
> > +#include "libavformat/avio.h"
> > +#include "libavutil/opt.h"
> > +#include "libavutil/pixdesc.h"
> > +#include "libavutil/avassert.h"
> > +#include "avfilter.h"
> > +#include "dnn_interface.h"
> > +#include "formats.h"
> > +#include "internal.h"
> > +
> > +typedef struct DnnProcessingContext {
> > +    const AVClass *class;
> > +
> > +    char *model_filename;
> > +    DNNBackendType backend_type;
> > +    enum AVPixelFormat fmt;
>
> This should be int.

Could you please explain a bit more why 'enum AVPixelFormat' should be int? I
searched for 'AV_OPT_TYPE_PIXEL_FMT' in the vf_* files and found that
'enum AVPixelFormat' is used in vf_mergeplanes.c, vf_program_opencl.c and
vf_tonemap_opencl.c.
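To make sure I understand the suggestion, here is a minimal sketch of what I
assume is meant. The int-typed field and the comment about enum storage are my
assumptions, not something stated above:

    #include "libavutil/log.h"      /* AVClass */
    #include "libavutil/pixfmt.h"   /* enum AVPixelFormat, AV_PIX_FMT_* */

    /* Sketch, assuming the concern is storage size: the underlying type of
     * a C enum is implementation-defined, while the generic AVOption code
     * writes the numeric option value into the field at OFFSET(fmt). A
     * plain int makes the field's size unambiguous; the value stored in it
     * is still an AV_PIX_FMT_* constant. */
    typedef struct DnnProcessingContext {
        const AVClass *class;
        int fmt;   /* holds an enum AVPixelFormat value */
        /* ... other members unchanged ... */
    } DnnProcessingContext;

Either way, the AV_OPT_TYPE_PIXEL_FMT entry in the options table would stay
the same; only the declared type of the field differs.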
> > +    char *model_inputname;
> > +    char *model_outputname;
> > +
> > +    DNNModule *dnn_module;
> > +    DNNModel *model;
> > +
> > +    // input & output of the model at execution time
> > +    DNNData input;
> > +    DNNData output;
> > +} DnnProcessingContext;
> > +
> > +#define OFFSET(x) offsetof(DnnProcessingContext, x)
> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM | AV_OPT_FLAG_VIDEO_PARAM
> > +static const AVOption dnn_processing_options[] = {
> > +    { "dnn_backend", "DNN backend",                OFFSET(backend_type),     AV_OPT_TYPE_INT,       { .i64 = 0 },    0, 1, FLAGS, "backend" },
> > +    { "native",      "native backend flag",        0,                        AV_OPT_TYPE_CONST,     { .i64 = 0 },    0, 0, FLAGS, "backend" },
> > +#if (CONFIG_LIBTENSORFLOW == 1)
> > +    { "tensorflow",  "tensorflow backend flag",    0,                        AV_OPT_TYPE_CONST,     { .i64 = 1 },    0, 0, FLAGS, "backend" },
> > +#endif
> > +    { "model",       "path to model file",         OFFSET(model_filename),   AV_OPT_TYPE_STRING,    { .str = NULL }, 0, 0, FLAGS },
> > +    { "input",       "input name of the model",    OFFSET(model_inputname),  AV_OPT_TYPE_STRING,    { .str = NULL }, 0, 0, FLAGS },
> > +    { "output",      "output name of the model",   OFFSET(model_outputname), AV_OPT_TYPE_STRING,    { .str = NULL }, 0, 0, FLAGS },
> > +    { "fmt",         "AVPixelFormat of the frame", OFFSET(fmt),              AV_OPT_TYPE_PIXEL_FMT, { .i64 = AV_PIX_FMT_RGB24 }, AV_PIX_FMT_NONE, AV_PIX_FMT_NB - 1, FLAGS },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(dnn_processing);
> > +
> > +static av_cold int init(AVFilterContext *context)
> > +{
> > +    DnnProcessingContext *ctx = context->priv;
> > +    int supported = 0;
> > +    // as the first step, only rgb24 and bgr24 are supported
> > +    const enum AVPixelFormat supported_pixel_fmts[] = {
> > +        AV_PIX_FMT_RGB24,
> > +        AV_PIX_FMT_BGR24,
> > +    };
> > +    for (int i = 0; i < sizeof(supported_pixel_fmts) / sizeof(enum AVPixelFormat); ++i) {
> > +        if (supported_pixel_fmts[i] == ctx->fmt) {
> > +            supported = 1;
> > +            break;
> > +        }
> > +    }
> > +    if (!supported) {
> > +        av_log(context, AV_LOG_ERROR, "pixel fmt %s not supported yet\n",
> > +               av_get_pix_fmt_name(ctx->fmt));
> > +        return AVERROR_INVALIDDATA;
> > +    }
> > +
> > +    if (!ctx->model_filename) {
> > +        av_log(ctx, AV_LOG_ERROR, "model file for network is not specified\n");
> > +        return AVERROR(EINVAL);
> > +    }
> > +    if (!ctx->model_inputname) {
> > +        av_log(ctx, AV_LOG_ERROR, "intput name of the model network is not specified\n");
>
> Typo

thanks, will fix
> > +        return AVERROR(EINVAL);
> > +    }
> > +    if (!ctx->model_outputname) {
> > +        av_log(ctx, AV_LOG_ERROR, "output name of the model network is not specified\n");
> > +        return AVERROR(EINVAL);
> > +    }
> > +
> > +    ctx->dnn_module = ff_get_dnn_module(ctx->backend_type);
> > +    if (!ctx->dnn_module) {
> > +        av_log(ctx, AV_LOG_ERROR, "could not create DNN module for requested backend\n");
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    if (!ctx->dnn_module->load_model) {
> > +        av_log(ctx, AV_LOG_ERROR, "load_model for network is not specified\n");
> > +        return AVERROR(EINVAL);
> > +    }
> > +
> > +    ctx->model = (ctx->dnn_module->load_model)(ctx->model_filename);
> > +    if (!ctx->model) {
> > +        av_log(ctx, AV_LOG_ERROR, "could not load DNN model\n");
> > +        return AVERROR(EINVAL);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int query_formats(AVFilterContext *context)
> > +{
> > +    AVFilterFormats *formats;
> > +    DnnProcessingContext *ctx = context->priv;
> > +    enum AVPixelFormat pixel_fmts[2];
> > +    pixel_fmts[0] = ctx->fmt;
> > +    pixel_fmts[1] = AV_PIX_FMT_NONE;
> > +
> > +    formats = ff_make_format_list(pixel_fmts);
> > +    return ff_set_common_formats(context, formats);
> > +}
> > +
> > +static int config_input(AVFilterLink *inlink)
> > +{
> > +    AVFilterContext *context = inlink->dst;
> > +    DnnProcessingContext *ctx = context->priv;
> > +    DNNReturnType result;
> > +    DNNData dnn_data;
> > +
> > +    result = ctx->model->get_input(ctx->model->model, &dnn_data, ctx->model_inputname);
> > +    if (result != DNN_SUCCESS) {
> > +        av_log(ctx, AV_LOG_ERROR, "could not get input from the model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    // the design is to add an explicit scale filter before this filter
> > +    if (dnn_data.height != -1 && dnn_data.height != inlink->h) {
> > +        av_log(ctx, AV_LOG_ERROR, "the model requires frame height %d but got %d\n",
> > +               dnn_data.height, inlink->h);
> > +        return AVERROR(EIO);
> > +    }
> > +    if (dnn_data.width != -1 && dnn_data.width != inlink->w) {
> > +        av_log(ctx, AV_LOG_ERROR, "the model requires frame width %d but got %d\n",
> > +               dnn_data.width, inlink->w);
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    if (dnn_data.channels != 3) {
> > +        av_log(ctx, AV_LOG_ERROR, "the model requires input channels %d\n",
> > +               dnn_data.channels);
> > +        return AVERROR(EIO);
> > +    }
> > +    if (dnn_data.dt != DNN_FLOAT && dnn_data.dt != DNN_UINT8) {
> > +        av_log(ctx, AV_LOG_ERROR, "only support dnn models with input data type as float32 and uint8.\n");
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    ctx->input.width    = inlink->w;
> > +    ctx->input.height   = inlink->h;
> > +    ctx->input.channels = dnn_data.channels;
> > +    ctx->input.dt       = dnn_data.dt;
> > +
> > +    result = (ctx->model->set_input_output)(ctx->model->model,
> > +                                            &ctx->input, ctx->model_inputname,
> > +                                            (const char **)&ctx->model_outputname, 1);
> > +    if (result != DNN_SUCCESS) {
> > +        av_log(ctx, AV_LOG_ERROR, "could not set input and output for the model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int config_output(AVFilterLink *outlink)
> > +{
> > +    AVFilterContext *context = outlink->src;
> > +    DnnProcessingContext *ctx = context->priv;
> > +    DNNReturnType result;
> > +
> > +    // have a try run in case the dnn model resizes the frame
> > +    result = (ctx->dnn_module->execute_model)(ctx->model, &ctx->output, 1);
> > +    if (result != DNN_SUCCESS) {
> > +        av_log(ctx, AV_LOG_ERROR, "failed to execute model\n");
> > +        return AVERROR(EIO);
> > +    }
> > +
> > +    outlink->w = ctx->output.width;
> > +    outlink->h = ctx->output.height;
> > +
> > +    return 0;
> > +}
> > +
> > +static int copy_from_frame_to_dnn(DNNData *dnn_data, const AVFrame *in)
> > +{
> > +    // extend this function to support more formats
> > +    av_assert0(in->format == AV_PIX_FMT_RGB24 || in->format == AV_PIX_FMT_BGR24);
> > +
> > +    if (dnn_data->dt == DNN_FLOAT) {
> > +        float *dnn_input = dnn_data->data;
> > +        for (int i = 0; i < in->height; i++) {
> > +            for (int j = 0; j < in->width * 3; j++) {
> > +                int k = i * in->linesize[0] + j;
> > +                int t = i * in->width * 3 + j;
> > +                dnn_input[t] = in->data[0][k] / 255.0f;
> > +            }
> > +        }
> > +    } else {
> > +        uint8_t *dnn_input = dnn_data->data;
> > +        av_assert0(dnn_data->dt == DNN_UINT8);
> > +        for (int i = 0; i < in->height; i++) {
> > +            for (int j = 0; j < in->width * 3; j++) {
> > +                int k = i * in->linesize[0] + j;
> > +                int t = i * in->width * 3 + j;
> > +                dnn_input[t] = in->data[0][k];
> > +            }
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int copy_from_dnn_to_frame(AVFrame *out, const DNNData *dnn_data)
> > +{
> > +    // extend this function to support more formats
> > +    av_assert0(out->format == AV_PIX_FMT_RGB24 || out->format == AV_PIX_FMT_BGR24);
> > +
> > +    if (dnn_data->dt == DNN_FLOAT) {
> > +        float *dnn_output = dnn_data->data;
> > +        for (int i = 0; i < out->height; i++) {
> > +            for (int j = 0; j < out->width * 3; j++) {
> > +                int k = i * out->linesize[0] + j;
> > +                int t = i * out->width * 3 + j;
> > +                out->data[0][k] = av_clip((int)(dnn_output[t] * 255.0f), 0, 255);
> > +            }
> > +        }
> > +    } else {
> > +        uint8_t *dnn_output = dnn_data->data;
> > +        av_assert0(dnn_data->dt == DNN_UINT8);
> > +        for (int i = 0; i < out->height; i++) {
> > +            for (int j = 0; j < out->width * 3; j++) {
> > +                int k = i * out->linesize[0] + j;
> > +                int t = i * out->width * 3 + j;
> > +                out->data[0][k] = dnn_output[t];
> > +            }
> > +        }
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> > +{
> > +    AVFilterContext *context = inlink->dst;
> > +    AVFilterLink *outlink = context->outputs[0];
> > +    DnnProcessingContext *ctx = context->priv;
> > +    DNNReturnType dnn_result;
> > +    AVFrame *out;
> > +
> > +    copy_from_frame_to_dnn(&ctx->input, in);
> > +
> > +    dnn_result = (ctx->dnn_module->execute_model)(ctx->model, &ctx->output, 1);
> > +    if (dnn_result != DNN_SUCCESS) {
> > +        av_log(ctx, AV_LOG_ERROR, "failed to execute model\n");
> > +        av_frame_free(&in);
> > +        return AVERROR(EIO);
> > +    }
> > +    av_assert0(ctx->output.channels == 3);
> > +
> > +    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > +    if (!out) {
> > +        av_log(ctx, AV_LOG_ERROR, "could not allocate memory for output frame\n");
>
> This log message should be removed, as it is not useful at all.
thanks, will remove

> > +        av_frame_free(&in);
> > +        return AVERROR(ENOMEM);
> > +    }
> > +
> > +    av_frame_copy_props(out, in);
> > +    copy_from_dnn_to_frame(out, &ctx->output);
> > +    av_frame_free(&in);
> > +    return ff_filter_frame(outlink, out);
> > +}
> > +
> > +static av_cold void uninit(AVFilterContext *ctx)
> > +{
> > +    DnnProcessingContext *context = ctx->priv;
> > +
> > +    if (context->dnn_module)
> > +        (context->dnn_module->free_model)(&context->model);
> > +
> > +    av_freep(&context->dnn_module);
> > +}
> > +
> > +static const AVFilterPad dnn_processing_inputs[] = {
> > +    {
> > +        .name         = "default",
> > +        .type         = AVMEDIA_TYPE_VIDEO,
> > +        .config_props = config_input,
> > +        .filter_frame = filter_frame,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +static const AVFilterPad dnn_processing_outputs[] = {
> > +    {
> > +        .name         = "default",
> > +        .type         = AVMEDIA_TYPE_VIDEO,
> > +        .config_props = config_output,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +AVFilter ff_vf_dnn_processing = {
> > +    .name          = "dnn_processing",
> > +    .description   = NULL_IF_CONFIG_SMALL("Apply DNN processing filter to the input."),
> > +    .priv_size     = sizeof(DnnProcessingContext),
> > +    .init          = init,
> > +    .uninit        = uninit,
> > +    .query_formats = query_formats,
> > +    .inputs        = dnn_processing_inputs,
> > +    .outputs       = dnn_processing_outputs,
> > +    .priv_class    = &dnn_processing_class,
> > +    .flags         = AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC,
>
> If the filter changes w/h, this cannot be supported.

got it, will remove this line, thank you.

> > +};
> > --
> > 2.7.4
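For clarity on the timeline flag remark above, a sketch of the filter
definition with that flag dropped, as planned for the next version. The
explanatory comment is my reading of the concern, assuming the framework
passes frames through unchanged when a timeline 'enable' expression disables
a filter:

    AVFilter ff_vf_dnn_processing = {
        .name          = "dnn_processing",
        .description   = NULL_IF_CONFIG_SMALL("Apply DNN processing filter to the input."),
        .priv_size     = sizeof(DnnProcessingContext),
        .init          = init,
        .uninit        = uninit,
        .query_formats = query_formats,
        .inputs        = dnn_processing_inputs,
        .outputs       = dnn_processing_outputs,
        .priv_class    = &dnn_processing_class,
        /* no AVFILTER_FLAG_SUPPORT_TIMELINE_GENERIC: when a timeline 'enable'
         * expression disables the filter, the framework forwards the input
         * frame untouched, and a frame with the input dimensions would then
         * reach consumers that negotiated the (possibly different) output
         * dimensions. */
    };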