What is this? Missing documentation. NAK
On 6/27/20, hanishkvc <hanish...@gmail.com> wrote:
> v02-20200627IST2331
>
> Unrolled Intel Legacy Tile-Y detiling logic.
>
> Also a consolidated patch file, instead of the previous development
> flow based multiple patch files.
>
> v01-20200627IST1308
>
> Implemented Intel Legacy Tile-X and Tile-Y detiling logic
>
> NOTES:
>
> This video filter allows framebuffers which are tiled to be detiled
> using logic running on the cpu, into a linear layout.
>
> Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
> This should help one to work with frames captured (say using kmsgrab)
> on laptops having Intel GPU.
>
> Tile-X conversion logic has been explicitly cross checked, with Tile-X
> based frames. However Tile-Y conversion logic hasn't been tested with Tile-Y
> based frames, but it should potentially do the job, based on my current
> understanding of the Tile-Y layout format.
>
> TODO1: At a later time have to generate Tile-Y based frames, and then
> cross check the corresponding logic explicitly.
>
> TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
> layout conversion. But some online discussions from some time back seem
> to indicate that this path is not fully bug free currently.
> ---
>  Changelog                 |   1 +
>  doc/filters.texi          |  62 ++++++++
>  libavfilter/Makefile      |   1 +
>  libavfilter/allfilters.c  |   1 +
>  libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
>  5 files changed, 374 insertions(+)
>  create mode 100644 libavfilter/vf_fbdetile.c
>
> diff --git a/Changelog b/Changelog
> index a60e7d2eb8..0e03491f6a 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
>  releases are sorted from youngest to oldest.
>
>  version <next>:
> +- fbdetile cpu based framebuffer layout detiling video filter
>  - AudioToolbox output device
>  - MacCaption demuxer
>
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 3c2dd2eb90..73ba21af89 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -12210,6 +12210,68 @@ It accepts the following optional parameters:
>  The number of the CUDA device to use
>  @end table
>
> +@anchor{fbdetile}
> +@section fbdetile
> +
> +Detiles the Framebuffer tile layout into a linear layout using CPU.
> +
> +It currently supports conversion from Intel legacy tile-x and tile-y layouts
> +into a linear layout. This is useful if one is using kmsgrab and hwdownload
> +to capture a screen which is using one of these non-linear layouts.
> +
> +Currently it expects the data to be a 32bit RGB based pixel format. However
> +the logic doesn't do any pixel format conversion or so. Later will be enabling
> +16bit RGB data also, as the logic is transparent to it at one level.
> +
> +One could either insert this into the filter chain while capturing itself,
> +or else, if it is slowing things down or so, then one could instead insert
> +it into the filter chain during playback or transcoding or so.
> +
> +It supports the following optional parameters
> +
> +@table @option
> +@item type
> +Specify which detiling conversion to apply. The supported values are
> +@table @var
> +@item 0
> +intel tile-x to linear conversion (the default)
> +@item 1
> +intel tile-y to linear conversion.
> +@end table
> +@end table
> +
> +If one wants to convert during capture itself, one could do
> +@example
> +ffmpeg -f kmsgrab -i - -vf "hwdownload, fbdetile" OUTPUT
> +@end example
> +
> +However if one wants to convert after the tiled data has been already captured
> +@example
> +ffmpeg -i INPUT -vf "fbdetile" OUTPUT
> +@end example
> +@example
> +ffplay -i INPUT -vf "fbdetile"
> +@end example
> +
> +NOTE: While transcoding a test 1080p h264 stream, with 276 frames, with two
> +runs of each situation, the performance was as given below. However this
> +was for the older | initial version of the logic, as well as it was run on
> +the default linux chromebook->vm->container, so the perf values need not be
> +proper. But in a relative sense the overhead would be similar.
> +@example
> +rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4
> +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=0 out.mp4
> +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=1 out.mp4
> +@end example
> +@table @option
> +@item with no fbdetile filter
> +it took ~7.28 secs,
> +@item with fbdetile=0 filter
> +it took ~8.69 secs,
> +@item with fbdetile=1 filter
> +it took ~9.20 secs.
> +@end table
> +
>  @section hqx
>
>  Apply a high-quality magnification filter designed for pixel art. This filter
> diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> index 5123540653..bdb0c379ae 100644
> --- a/libavfilter/Makefile
> +++ b/libavfilter/Makefile
> @@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER)             += vf_hwdownload.o
>  OBJS-$(CONFIG_HWMAP_FILTER)                  += vf_hwmap.o
>  OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER)          += vf_hwupload_cuda.o
>  OBJS-$(CONFIG_HWUPLOAD_FILTER)               += vf_hwupload.o
> +OBJS-$(CONFIG_FBDETILE_FILTER)               += vf_fbdetile.o
>  OBJS-$(CONFIG_HYSTERESIS_FILTER)             += vf_hysteresis.o framesync.o
>  OBJS-$(CONFIG_IDET_FILTER)                   += vf_idet.o
>  OBJS-$(CONFIG_IL_FILTER)                     += vf_il.o
> diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> index 1183e40267..f8dceb2a88 100644
> --- a/libavfilter/allfilters.c
> +++ b/libavfilter/allfilters.c
> @@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload;
>  extern AVFilter ff_vf_hwmap;
>  extern AVFilter ff_vf_hwupload;
>  extern AVFilter ff_vf_hwupload_cuda;
> +extern AVFilter ff_vf_fbdetile;
>  extern AVFilter ff_vf_hysteresis;
>  extern AVFilter ff_vf_idet;
>  extern AVFilter ff_vf_il;
> diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c
> new file mode 100644
> index 0000000000..8b20c96d2c
> --- /dev/null
> +++ b/libavfilter/vf_fbdetile.c
> @@ -0,0 +1,309 @@
> +/*
> + * Copyright (c) 2020 HanishKVC
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +/**
> + * @file
> + * Detile the Frame buffer's tile layout using the cpu
> + * Currently it supports the legacy Intel Tile X layout detiling.
> + *
> + */
> +
> +/*
> + * ToThink|Check: Optimisations
> + *
> + * Does gcc setting used by ffmpeg allows memcpy | stringops inlining,
> + * loop unrolling, better native matching instructions, additional
> + * optimisations, ...
> + *
> + * Does gcc map to optimal memcpy logic, based on the situation it is
> + * used in.
> + *
> + * If not, may be look at vector_size or intrinsics or appropriate arch
> + * and cpu specific inline asm or ...
> + *
> + */
> +
> +#include "libavutil/avassert.h"
> +#include "libavutil/imgutils.h"
> +#include "libavutil/opt.h"
> +#include "avfilter.h"
> +#include "formats.h"
> +#include "internal.h"
> +#include "video.h"
> +
> +enum FilterMode {
> +    TYPE_INTELX,
> +    TYPE_INTELY,
> +    NB_TYPE
> +};
> +
> +typedef struct FBDetileContext {
> +    const AVClass *class;
> +    int width, height;
> +    int type;
> +} FBDetileContext;
> +
> +#define OFFSET(x) offsetof(FBDetileContext, x)
> +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> +static const AVOption fbdetile_options[] = {
> +    { "type", "set framebuffer format_modifier type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TYPE_INTELX}, 0, NB_TYPE-1, FLAGS, "type" },
> +        { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" },
> +        { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" },
> +    { NULL }
> +};
> +
> +AVFILTER_DEFINE_CLASS(fbdetile);
> +
> +static av_cold int init(AVFilterContext *ctx)
> +{
> +    FBDetileContext *fbdetile = ctx->priv;
> +
> +    if (fbdetile->type == TYPE_INTELX) {
> +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-x to linear\n");
> +    } else if (fbdetile->type == TYPE_INTELY) {
> +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-y to linear\n");
> +    } else {
> +        fprintf(stderr,"DBUG:fbdetile:init: Unknown Tile format specified, shouldnt reach here\n");
> +    }
> +    fbdetile->width = 1920;
> +    fbdetile->height = 1080;
> +    return 0;
> +}
> +
> +static int query_formats(AVFilterContext *ctx)
> +{
> +    // Currently only RGB based 32bit formats are specified
> +    // TODO: Technically the logic is transparent to 16bit RGB formats also
> +    static const enum AVPixelFormat pix_fmts[] = {AV_PIX_FMT_RGB0, AV_PIX_FMT_0RGB, AV_PIX_FMT_BGR0, AV_PIX_FMT_0BGR,
> +                                                  AV_PIX_FMT_RGBA, AV_PIX_FMT_ARGB, AV_PIX_FMT_BGRA, AV_PIX_FMT_ABGR,
> +                                                  AV_PIX_FMT_NONE};
> +    AVFilterFormats *fmts_list;
> +
> +    fmts_list = ff_make_format_list(pix_fmts);
> +    if (!fmts_list)
> +        return AVERROR(ENOMEM);
> +    return ff_set_common_formats(ctx, fmts_list);
> +}
> +
> +static int config_props(AVFilterLink *inlink)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    FBDetileContext *fbdetile = ctx->priv;
> +
> +    fbdetile->width = inlink->w;
> +    fbdetile->height = inlink->h;
> +    fprintf(stderr,"DBUG:fbdetile:config_props: %d x %d\n", fbdetile->width, fbdetile->height);
> +
> +    return 0;
> +}
> +
> +static void detile_intelx(AVFilterContext *ctx, int w, int h,
> +                          uint8_t *dst, int dstLineSize,
> +                          const uint8_t *src, int srcLineSize)
> +{
> +    // Offsets and LineSize are in bytes
> +    int tileW = 128; // For a 32Bit / Pixel framebuffer, 512/4
> +    int tileH = 8;
> +
> +    if (w*4 != srcLineSize) {
> +        fprintf(stderr,"DBUG:fbdetile:intelx: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> +        fprintf(stderr,"ERRR:fbdetile:intelx: dont support LineSize | Pitch going beyond width\n");
> +    }
> +    int sO = 0;
> +    int dX = 0;
> +    int dY = 0;
> +    int nTRows = (w*h)/tileW;
> +    int cTR = 0;
> +    while (cTR < nTRows) {
> +        int dO = dY*dstLineSize + dX*4;
> +#ifdef DEBUG_FBTILE
> +        fprintf(stderr,"DBUG:fbdetile:intelx: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> +#endif
> +        memcpy(dst+dO+0*dstLineSize, src+sO+0*512, 512);
> +        memcpy(dst+dO+1*dstLineSize, src+sO+1*512, 512);
> +        memcpy(dst+dO+2*dstLineSize, src+sO+2*512, 512);
> +        memcpy(dst+dO+3*dstLineSize, src+sO+3*512, 512);
> +        memcpy(dst+dO+4*dstLineSize, src+sO+4*512, 512);
> +        memcpy(dst+dO+5*dstLineSize, src+sO+5*512, 512);
> +        memcpy(dst+dO+6*dstLineSize, src+sO+6*512, 512);
> +        memcpy(dst+dO+7*dstLineSize, src+sO+7*512, 512);
> +        dX += tileW;
> +        if (dX >= w) {
> +            dX = 0;
> +            dY += 8;
> +        }
> +        sO = sO + 8*512;
> +        cTR += 8;
> +    }
> +}
> +
> +/*
> + * Intel Legacy Tile-Y layout conversion support
> + *
> + * currently done in a simple dumb way. Two low hanging optimisations
> + * that could be readily applied are
> + *
> + * a) unrolling the inner for loop
> + *    --- Given small size memcpy, should help, DONE
> + *
> + * b) using simd based 128bit loading and storing along with prefetch
> + *    hinting.
> + *
> + *    TOTHINK|CHECK: Does memcpy already does this and more if situation
> + *    is right?!
> + *
> + *    As code (or even intrinsics) would be specific to each architecture,
> + *    avoiding for now. Later have to check if vector_size attribute and
> + *    corresponding implementation by gcc can handle different architectures
> + *    properly, such that it wont become worse than memcpy provided for that
> + *    architecture.
> + *
> + * Or maybe I could even merge the two intel detiling logics into one, as
> + * the semantic and flow is almost same for both logics.
> + *
> + */
> +static void detile_intely(AVFilterContext *ctx, int w, int h,
> +                          uint8_t *dst, int dstLineSize,
> +                          const uint8_t *src, int srcLineSize)
> +{
> +    // Offsets and LineSize are in bytes
> +    int tileW = 4; // For a 32Bit / Pixel framebuffer, 16/4
> +    int tileH = 32;
> +
> +    if (w*4 != srcLineSize) {
> +        fprintf(stderr,"DBUG:fbdetile:intely: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> +        fprintf(stderr,"ERRR:fbdetile:intely: dont support LineSize | Pitch going beyond width\n");
> +    }
> +    int sO = 0;
> +    int dX = 0;
> +    int dY = 0;
> +    int nTRows = (w*h)/tileW;
> +    int cTR = 0;
> +    while (cTR < nTRows) {
> +        int dO = dY*dstLineSize + dX*4;
> +#ifdef DEBUG_FBTILE
> +        fprintf(stderr,"DBUG:fbdetile:intely: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> +#endif
> +
> +        memcpy(dst+dO+0*dstLineSize, src+sO+0*16, 16);
> +        memcpy(dst+dO+1*dstLineSize, src+sO+1*16, 16);
> +        memcpy(dst+dO+2*dstLineSize, src+sO+2*16, 16);
> +        memcpy(dst+dO+3*dstLineSize, src+sO+3*16, 16);
> +        memcpy(dst+dO+4*dstLineSize, src+sO+4*16, 16);
> +        memcpy(dst+dO+5*dstLineSize, src+sO+5*16, 16);
> +        memcpy(dst+dO+6*dstLineSize, src+sO+6*16, 16);
> +        memcpy(dst+dO+7*dstLineSize, src+sO+7*16, 16);
> +        memcpy(dst+dO+8*dstLineSize, src+sO+8*16, 16);
> +        memcpy(dst+dO+9*dstLineSize, src+sO+9*16, 16);
> +        memcpy(dst+dO+10*dstLineSize, src+sO+10*16, 16);
> +        memcpy(dst+dO+11*dstLineSize, src+sO+11*16, 16);
> +        memcpy(dst+dO+12*dstLineSize, src+sO+12*16, 16);
> +        memcpy(dst+dO+13*dstLineSize, src+sO+13*16, 16);
> +        memcpy(dst+dO+14*dstLineSize, src+sO+14*16, 16);
> +        memcpy(dst+dO+15*dstLineSize, src+sO+15*16, 16);
> +        memcpy(dst+dO+16*dstLineSize, src+sO+16*16, 16);
> +        memcpy(dst+dO+17*dstLineSize, src+sO+17*16, 16);
> +        memcpy(dst+dO+18*dstLineSize, src+sO+18*16, 16);
> +        memcpy(dst+dO+19*dstLineSize, src+sO+19*16, 16);
> +        memcpy(dst+dO+20*dstLineSize, src+sO+20*16, 16);
> +        memcpy(dst+dO+21*dstLineSize, src+sO+21*16, 16);
> +        memcpy(dst+dO+22*dstLineSize, src+sO+22*16, 16);
> +        memcpy(dst+dO+23*dstLineSize, src+sO+23*16, 16);
> +        memcpy(dst+dO+24*dstLineSize, src+sO+24*16, 16);
> +        memcpy(dst+dO+25*dstLineSize, src+sO+25*16, 16);
> +        memcpy(dst+dO+26*dstLineSize, src+sO+26*16, 16);
> +        memcpy(dst+dO+27*dstLineSize, src+sO+27*16, 16);
> +        memcpy(dst+dO+28*dstLineSize, src+sO+28*16, 16);
> +        memcpy(dst+dO+29*dstLineSize, src+sO+29*16, 16);
> +        memcpy(dst+dO+30*dstLineSize, src+sO+30*16, 16);
> +        memcpy(dst+dO+31*dstLineSize, src+sO+31*16, 16);
> +
> +        dX += tileW;
> +        if (dX >= w) {
> +            dX = 0;
> +            dY += 32;
> +        }
> +        sO = sO + 32*16;
> +        cTR += 32;
> +    }
> +}
> +
> +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> +{
> +    AVFilterContext *ctx = inlink->dst;
> +    FBDetileContext *fbdetile = ctx->priv;
> +    AVFilterLink *outlink = ctx->outputs[0];
> +    AVFrame *out;
> +
> +    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!out) {
> +        av_frame_free(&in);
> +        return AVERROR(ENOMEM);
> +    }
> +    av_frame_copy_props(out, in);
> +
> +    if (fbdetile->type == TYPE_INTELX) {
> +        detile_intelx(ctx, fbdetile->width, fbdetile->height,
> +                      out->data[0], out->linesize[0],
> +                      in->data[0], in->linesize[0]);
> +    } else if (fbdetile->type == TYPE_INTELY) {
> +        detile_intely(ctx, fbdetile->width, fbdetile->height,
> +                      out->data[0], out->linesize[0],
> +                      in->data[0], in->linesize[0]);
> +    }
> +
> +    av_frame_free(&in);
> +    return ff_filter_frame(outlink, out);
> +}
> +
> +static av_cold void uninit(AVFilterContext *ctx)
> +{
> +
> +}
> +
> +static const AVFilterPad fbdetile_inputs[] = {
> +    {
> +        .name         = "default",
> +        .type         = AVMEDIA_TYPE_VIDEO,
> +        .config_props = config_props,
> +        .filter_frame = filter_frame,
> +    },
> +    { NULL }
> +};
> +
> +static const AVFilterPad fbdetile_outputs[] = {
> +    {
> +        .name = "default",
> +        .type = AVMEDIA_TYPE_VIDEO,
> +    },
> +    { NULL }
> +};
> +
> +AVFilter ff_vf_fbdetile = {
> +    .name          = "fbdetile",
> +    .description   = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"),
> +    .priv_size     = sizeof(FBDetileContext),
> +    .init          = init,
> +    .uninit        = uninit,
> +    .query_formats = query_formats,
> +    .inputs        = fbdetile_inputs,
> +    .outputs       = fbdetile_outputs,
> +    .priv_class    = &fbdetile_class,
> +};
> --
> 2.20.1
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
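A minimal standalone sketch, not part of the posted patch, of how both legacy layouts can be written as a per-byte address mapping. Expressed this way, the detiler honours the source pitch instead of assuming it equals w*4, and it never writes past a frame whose height is not a tile multiple. For a 1080-line frame, the unrolled Tile-Y loop above writes its last partial tile row into lines 1056 to 1087, i.e. 8 lines past the end of the output. The inverse mapping can also synthesise Tile-Y frames from a known linear picture, which is the cross-check TODO1 asks for. All names here (TileLayout, tiled_offset, tile_generic, detile_generic) are hypothetical. The sketch assumes that the tiled pitch is a multiple of the tile width in bytes (512 for Tile-X, 128 for Tile-Y) and that the tiled buffer is padded to whole tile rows. It also assumes that a Tile-Y tile is made of eight 16-byte columns of 32 rows each, which is the same understanding the patch's Tile-Y loop relies on and has likewise not been checked against real Tile-Y frames here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch; not FFmpeg API and not from the patch. */
enum TileLayout { LAYOUT_INTEL_X, LAYOUT_INTEL_Y };

/*
 * Byte offset, inside a legacy Intel tiled buffer, of the byte at column
 * x_bytes (in bytes) of row y. Both layouts use 4096-byte tiles stored left
 * to right, then top to bottom. Assumption: pitch (the tiled stride in
 * bytes) is a multiple of the tile width in bytes.
 */
static size_t tiled_offset(enum TileLayout layout, size_t pitch,
                           size_t x_bytes, size_t y)
{
    if (layout == LAYOUT_INTEL_X) {
        /* Tile-X: 512 bytes x 8 rows, each 512-byte row stored linearly. */
        size_t tile = (y / 8) * (pitch / 512) + x_bytes / 512;
        return tile * 4096 + (y % 8) * 512 + x_bytes % 512;
    }
    /* Tile-Y (assumed): 128 bytes x 32 rows, split into eight 16-byte
     * columns; each column stores its 32 rows back to back (512 bytes). */
    size_t tile   = (y / 32) * (pitch / 128) + x_bytes / 128;
    size_t column = (x_bytes % 128) / 16;
    return tile * 4096 + column * 512 + (y % 32) * 16 + x_bytes % 16;
}

/*
 * Tiled -> linear. Only the visible w x h pixels are touched, so dst is
 * never written past its last line even when h is not a tile multiple.
 */
static void detile_generic(enum TileLayout layout, int w, int h, int bpp,
                           uint8_t *dst, ptrdiff_t dst_linesize,
                           const uint8_t *src, ptrdiff_t src_pitch)
{
    /* 2- and 4-byte pixels never straddle a 16-byte span, so every pixel
     * is contiguous in both tiled layouts. */
    assert(bpp == 2 || bpp == 4);
    for (int y = 0; y < h; y++) {
        uint8_t *drow = dst + y * dst_linesize;
        for (int x = 0; x < w; x++) {
            size_t off = tiled_offset(layout, (size_t)src_pitch,
                                      (size_t)x * bpp, (size_t)y);
            memcpy(drow + (size_t)x * bpp, src + off, (size_t)bpp);
        }
    }
}

/* Linear -> tiled, e.g. to build Tile-Y test frames from a known picture. */
static void tile_generic(enum TileLayout layout, int w, int h, int bpp,
                         uint8_t *dst, ptrdiff_t dst_pitch,
                         const uint8_t *src, ptrdiff_t src_linesize)
{
    assert(bpp == 2 || bpp == 4);
    for (int y = 0; y < h; y++) {
        const uint8_t *srow = src + y * src_linesize;
        for (int x = 0; x < w; x++) {
            size_t off = tiled_offset(layout, (size_t)dst_pitch,
                                      (size_t)x * bpp, (size_t)y);
            memcpy(dst + off, srow + (size_t)x * bpp, (size_t)bpp);
        }
    }
}
```

The per-pixel copy keeps the mapping easy to verify but is much slower than the patch's 512-byte (Tile-X) and 16-byte (Tile-Y) memcpy runs. Once a mapping like this has been validated against real frames, the inner loop could copy whole aligned 16-byte spans at a time, since both layouts keep every aligned 16-byte span contiguous.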