Hi,

This is a new video filter I created to detile the Intel Tile-X and Tile-Y framebuffer layouts into a linear layout, using logic that runs on the CPU. It is useful when capturing the screen with kmsgrab and hwdownload on an Intel GPU based system, because it makes the captured frames come out correctly.
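Both layouts can also be written as a per-byte address mapping. That form makes it straightforward to honour a tiled pitch wider than the visible width, where the patch only prints a warning. It also handles a height that is not a multiple of the tile height: 1080 is not a multiple of the Tile-Y height of 32, and the patch always copies whole 32-row columns. Below is a minimal standalone C sketch, not the patch's code. It assumes the legacy 4096-byte tiles (Tile-X: 512 bytes by 8 rows, stored row by row; Tile-Y: 128 bytes by 32 rows, stored as eight 16-byte-wide columns) and ignores any address swizzling the hardware may apply. The names tilex_offset, tiley_offset, detile and roundtrip_ok are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers (not from the patch). Assumed legacy Intel geometry:
 * every tile is 4096 bytes; Tile-X is 512 bytes x 8 rows stored row by row,
 * Tile-Y is 128 bytes x 32 rows stored as eight 16-byte-wide columns.
 * Tiles follow each other left to right, then top to bottom.
 * Any hardware address swizzling is ignored. */

/* Offset in a Tile-X buffer (stride `pitch` bytes) of byte column xb, row y. */
static size_t tilex_offset(size_t xb, size_t y, size_t pitch)
{
    size_t tiles_per_row = pitch / 512;
    size_t tile = (y / 8) * tiles_per_row + xb / 512;
    return tile * 4096 + (y % 8) * 512 + xb % 512;
}

/* Offset in a Tile-Y buffer (stride `pitch` bytes) of byte column xb, row y. */
static size_t tiley_offset(size_t xb, size_t y, size_t pitch)
{
    size_t tiles_per_row = pitch / 128;
    size_t tile = (y / 32) * tiles_per_row + xb / 128;
    return tile * 4096 + ((xb % 128) / 16) * 512 + (y % 32) * 16 + xb % 16;
}

/* Detile w x h pixels of bpp bytes into a linear buffer. `pitch` is the
 * tiled stride in bytes; src must hold whole tile rows (pitch times h
 * rounded up to the tile height). Returns 0, or -1 for geometry this
 * sketch does not handle, instead of only warning. */
static int detile(int tiley, uint8_t *dst, size_t dst_linesize,
                  const uint8_t *src, size_t pitch,
                  size_t w, size_t h, size_t bpp)
{
    size_t run = tiley ? 16 : 512;      /* bytes contiguous in both layouts */
    size_t tile_w = tiley ? 128 : 512;  /* tile width in bytes */
    size_t row_bytes = w * bpp;

    if (pitch % tile_w || row_bytes > pitch || row_bytes > dst_linesize)
        return -1;
    for (size_t y = 0; y < h; y++) {
        for (size_t xb = 0; xb < row_bytes; xb += run) {
            size_t n = row_bytes - xb < run ? row_bytes - xb : run;
            size_t so = tiley ? tiley_offset(xb, y, pitch)
                              : tilex_offset(xb, y, pitch);
            memcpy(dst + y * dst_linesize + xb, src + so, n);
        }
    }
    return 0;
}

/* Round-trip check: scatter a pattern into a tiled buffer with the same
 * offset functions, detile it, and compare with the pattern (bpp = 4). */
static int roundtrip_ok(int tiley, size_t w, size_t h, size_t pitch)
{
    const size_t bpp = 4, tile_h = tiley ? 32 : 8;
    size_t rows = (h + tile_h - 1) / tile_h * tile_h;  /* whole tile rows */
    size_t row_bytes = w * bpp;
    uint8_t *src = calloc(pitch * rows, 1);
    uint8_t *dst = calloc(row_bytes * h, 1);
    int ok = src && dst;

    for (size_t y = 0; ok && y < h; y++)
        for (size_t xb = 0; xb < row_bytes; xb++) {
            size_t so = tiley ? tiley_offset(xb, y, pitch)
                              : tilex_offset(xb, y, pitch);
            src[so] = (uint8_t)(y * 31 + xb * 7);
        }
    if (ok)
        ok = detile(tiley, dst, row_bytes, src, pitch, w, h, bpp) == 0;
    for (size_t y = 0; ok && y < h; y++)
        for (size_t xb = 0; xb < row_bytes; xb++)
            if (dst[y * row_bytes + xb] != (uint8_t)(y * 31 + xb * 7))
                ok = 0;
    free(src);
    free(dst);
    return ok;
}
```

The round-trip helper scatters its pattern with the same offset functions. It therefore checks only the copy loop against the assumed layout, not the layout itself, which still has to be cross-checked against real Tile-Y captures (as TODO1 in the patch notes says). Copying one contiguous run at a time (512 bytes for Tile-X, 16 bytes for Tile-Y) keeps the same memcpy granularity the patch uses.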
Without this filter, kmsgrab produces an unusable, scrambled capture, because the frame contents are tiled. I ran into this a few days back when trying to capture the screen under Wayland, so I created the filter. The submitted patch also adds documentation for it in doc/filters.texi.

On Sun, Jun 28, 2020 at 1:30 AM Paul B Mahol <one...@gmail.com> wrote:
> What is this?
>
> Missing documentation.
> NAK
>
> On 6/27/20, hanishkvc <hanish...@gmail.com> wrote:
> > v02-20200627IST2331
> >
> > Unrolled Intel Legacy Tile-Y detiling logic.
> >
> > Also a consolidated patch file, instead of the previous development
> > flow based multiple patch files.
> >
> > v01-20200627IST1308
> >
> > Implemented Intel Legacy Tile-X and Tile-Y detiling logic
> >
> > NOTES:
> >
> > This video filter allows framebuffers which are tiled to be detiled
> > into a linear layout, using logic running on the CPU.
> >
> > Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
> > This should help one to work with frames captured (say using kmsgrab)
> > on laptops having an Intel GPU.
> >
> > The Tile-X conversion logic has been explicitly cross-checked with Tile-X
> > based frames. However the Tile-Y conversion logic hasn't been tested with
> > Tile-Y based frames, but it should do the job, based on my current
> > understanding of the Tile-Y layout format.
> >
> > TODO1: At a later time, generate Tile-Y based frames, and then
> > cross-check the corresponding logic explicitly.
> >
> > TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
> > layout conversion. But some online discussions from some time back seem
> > to indicate that this path is not fully bug free currently.
> > ---
> >  Changelog | 1 +
> >  doc/filters.texi | 62 ++++++++
> >  libavfilter/Makefile | 1 +
> >  libavfilter/allfilters.c | 1 +
> >  libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
> >  5 files changed, 374 insertions(+)
> >  create mode 100644 libavfilter/vf_fbdetile.c
> >
> > diff --git a/Changelog b/Changelog
> > index a60e7d2eb8..0e03491f6a 100644
> > --- a/Changelog
> > +++ b/Changelog
> > @@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
> >  releases are sorted from youngest to oldest.
> >
> >  version <next>:
> > +- fbdetile cpu based framebuffer layout detiling video filter
> >  - AudioToolbox output device
> >  - MacCaption demuxer
> >
> > diff --git a/doc/filters.texi b/doc/filters.texi
> > index 3c2dd2eb90..73ba21af89 100644
> > --- a/doc/filters.texi
> > +++ b/doc/filters.texi
> > @@ -12210,6 +12210,68 @@ It accepts the following optional parameters:
> >  The number of the CUDA device to use
> >  @end table
> >
> > +@anchor{fbdetile}
> > +@section fbdetile
> > +
> > +Detiles the Framebuffer tile layout into a linear layout using CPU.
> > +
> > +It currently supports conversion from Intel legacy tile-x and tile-y layouts
> > +into a linear layout. This is useful if one is using kmsgrab and hwdownload
> > +to capture a screen which is using one of these non-linear layouts.
> > +
> > +Currently it expects the data to be a 32bit RGB based pixel format. However
> > +the logic doesn't do any pixel format conversion. Later it will also enable
> > +16bit RGB data, as the logic is transparent to it at one level.
> > +
> > +One could either insert this into the filter chain while capturing itself,
> > +or else, if it is slowing things down, one could instead insert
> > +it into the filter chain during playback or transcoding.
> > +
> > +It supports the following optional parameters
> > +
> > +@table @option
> > +@item type
> > +Specify which detiling conversion to apply. The supported values are
> > +@table @var
> > +@item 0
> > +intel tile-x to linear conversion (the default)
> > +@item 1
> > +intel tile-y to linear conversion.
> > +@end table
> > +@end table
> > +
> > +If one wants to convert during capture itself, one could do
> > +@example
> > +ffmpeg -f kmsgrab -i - -vf "hwdownload, fbdetile" OUTPUT
> > +@end example
> > +
> > +However if one wants to convert after the tiled data has already been captured
> > +@example
> > +ffmpeg -i INPUT -vf "fbdetile" OUTPUT
> > +@end example
> > +@example
> > +ffplay -i INPUT -vf "fbdetile"
> > +@end example
> > +
> > +NOTE: While transcoding a test 1080p h264 stream with 276 frames, with two
> > +runs of each situation, the performance was as given below. However this
> > +was for the older | initial version of the logic, and it was run on
> > +the default linux chromebook->vm->container, so the perf values need not be
> > +accurate. But in a relative sense the overhead would be similar.
> > +@example
> > +rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4
> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=0 out.mp4
> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=1 out.mp4
> > +@end example
> > +@table @option
> > +@item with no fbdetile filter
> > +it took ~7.28 secs,
> > +@item with fbdetile=0 filter
> > +it took ~8.69 secs,
> > +@item with fbdetile=1 filter
> > +it took ~9.20 secs.
> > +@end table
> > +
> >  @section hqx
> >
> >  Apply a high-quality magnification filter designed for pixel art. This filter
> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
> > index 5123540653..bdb0c379ae 100644
> > --- a/libavfilter/Makefile
> > +++ b/libavfilter/Makefile
> > @@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER) += vf_hwdownload.o
> >  OBJS-$(CONFIG_HWMAP_FILTER) += vf_hwmap.o
> >  OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER) += vf_hwupload_cuda.o
> >  OBJS-$(CONFIG_HWUPLOAD_FILTER) += vf_hwupload.o
> > +OBJS-$(CONFIG_FBDETILE_FILTER) += vf_fbdetile.o
> >  OBJS-$(CONFIG_HYSTERESIS_FILTER) += vf_hysteresis.o framesync.o
> >  OBJS-$(CONFIG_IDET_FILTER) += vf_idet.o
> >  OBJS-$(CONFIG_IL_FILTER) += vf_il.o
> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
> > index 1183e40267..f8dceb2a88 100644
> > --- a/libavfilter/allfilters.c
> > +++ b/libavfilter/allfilters.c
> > @@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload;
> >  extern AVFilter ff_vf_hwmap;
> >  extern AVFilter ff_vf_hwupload;
> >  extern AVFilter ff_vf_hwupload_cuda;
> > +extern AVFilter ff_vf_fbdetile;
> >  extern AVFilter ff_vf_hysteresis;
> >  extern AVFilter ff_vf_idet;
> >  extern AVFilter ff_vf_il;
> > diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c
> > new file mode 100644
> > index 0000000000..8b20c96d2c
> > --- /dev/null
> > +++ b/libavfilter/vf_fbdetile.c
> > @@ -0,0 +1,309 @@
> > +/*
> > + * Copyright (c) 2020 HanishKVC
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * FFmpeg is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +/**
> > + * @file
> > + * Detile the Frame buffer's tile layout using the cpu
> > + * Currently it supports the legacy Intel Tile-X and Tile-Y layout detiling.
> > + *
> > + */
> > +
> > +/*
> > + * ToThink|Check: Optimisations
> > + *
> > + * Does the gcc setting used by ffmpeg allow memcpy | stringops inlining,
> > + * loop unrolling, better native matching instructions, additional
> > + * optimisations, ...
> > + *
> > + * Does gcc map to optimal memcpy logic, based on the situation it is
> > + * used in.
> > + *
> > + * If not, maybe look at vector_size or intrinsics or appropriate arch
> > + * and cpu specific inline asm or ...
> > + *
> > + */
> > +
> > +#include "libavutil/avassert.h"
> > +#include "libavutil/imgutils.h"
> > +#include "libavutil/opt.h"
> > +#include "avfilter.h"
> > +#include "formats.h"
> > +#include "internal.h"
> > +#include "video.h"
> > +
> > +enum FilterMode {
> > +    TYPE_INTELX,
> > +    TYPE_INTELY,
> > +    NB_TYPE
> > +};
> > +
> > +typedef struct FBDetileContext {
> > +    const AVClass *class;
> > +    int width, height;
> > +    int type;
> > +} FBDetileContext;
> > +
> > +#define OFFSET(x) offsetof(FBDetileContext, x)
> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
> > +static const AVOption fbdetile_options[] = {
> > +    { "type", "set framebuffer format_modifier type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TYPE_INTELX}, 0, NB_TYPE-1, FLAGS, "type" },
> > +    { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" },
> > +    { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" },
> > +    { NULL }
> > +};
> > +
> > +AVFILTER_DEFINE_CLASS(fbdetile);
> > +
> > +static av_cold int init(AVFilterContext *ctx)
> > +{
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +
> > +    if (fbdetile->type == TYPE_INTELX) {
> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-x to linear\n");
> > +    } else if (fbdetile->type == TYPE_INTELY) {
> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-y to linear\n");
> > +    } else {
> > +        fprintf(stderr,"DBUG:fbdetile:init: Unknown Tile format specified, shouldnt reach here\n");
> > +    }
> > +    fbdetile->width = 1920;
> > +    fbdetile->height = 1080;
> > +    return 0;
> > +}
> > +
> > +static int query_formats(AVFilterContext *ctx)
> > +{
> > +    // Currently only RGB based 32bit formats are specified
> > +    // TODO: Technically the logic is transparent to 16bit RGB formats also
> > +    static const enum AVPixelFormat pix_fmts[] = {AV_PIX_FMT_RGB0, AV_PIX_FMT_0RGB, AV_PIX_FMT_BGR0, AV_PIX_FMT_0BGR,
> > +                                                  AV_PIX_FMT_RGBA, AV_PIX_FMT_ARGB, AV_PIX_FMT_BGRA, AV_PIX_FMT_ABGR,
> > +                                                  AV_PIX_FMT_NONE};
> > +    AVFilterFormats *fmts_list;
> > +
> > +    fmts_list = ff_make_format_list(pix_fmts);
> > +    if (!fmts_list)
> > +        return AVERROR(ENOMEM);
> > +    return ff_set_common_formats(ctx, fmts_list);
> > +}
> > +
> > +static int config_props(AVFilterLink *inlink)
> > +{
> > +    AVFilterContext *ctx = inlink->dst;
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +
> > +    fbdetile->width = inlink->w;
> > +    fbdetile->height = inlink->h;
> > +    fprintf(stderr,"DBUG:fbdetile:config_props: %d x %d\n", fbdetile->width, fbdetile->height);
> > +
> > +    return 0;
> > +}
> > +
> > +static void detile_intelx(AVFilterContext *ctx, int w, int h,
> > +                          uint8_t *dst, int dstLineSize,
> > +                          const uint8_t *src, int srcLineSize)
> > +{
> > +    // Offsets and LineSize are in bytes
> > +    int tileW = 128; // For a 32Bit / Pixel framebuffer, 512/4
> > +    int tileH = 8;
> > +
> > +    if (w*4 != srcLineSize) {
> > +        fprintf(stderr,"DBUG:fbdetile:intelx: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> > +        fprintf(stderr,"ERRR:fbdetile:intelx: dont support LineSize | Pitch going beyond width\n");
> > +    }
> > +    int sO = 0;
> > +    int dX = 0;
> > +    int dY = 0;
> > +    int nTRows = (w*h)/tileW;
> > +    int cTR = 0;
> > +    while (cTR < nTRows) {
> > +        int dO = dY*dstLineSize + dX*4;
> > +#ifdef DEBUG_FBTILE
> > +        fprintf(stderr,"DBUG:fbdetile:intelx: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> > +#endif
> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*512, 512);
> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*512, 512);
> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*512, 512);
> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*512, 512);
> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*512, 512);
> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*512, 512);
> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*512, 512);
> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*512, 512);
> > +        dX += tileW;
> > +        if (dX >= w) {
> > +            dX = 0;
> > +            dY += 8;
> > +        }
> > +        sO = sO + 8*512;
> > +        cTR += 8;
> > +    }
> > +}
> > +
> > +/*
> > + * Intel Legacy Tile-Y layout conversion support
> > + *
> > + * currently done in a simple dumb way. Two low hanging optimisations
> > + * that could be readily applied are
> > + *
> > + * a) unrolling the inner for loop
> > + *    --- Given small size memcpy, should help, DONE
> > + *
> > + * b) using simd based 128bit loading and storing along with prefetch
> > + *    hinting.
> > + *
> > + *    TOTHINK|CHECK: Does memcpy already do this and more if the situation
> > + *    is right?!
> > + *
> > + *    As code (or even intrinsics) would be specific to each architecture,
> > + *    avoiding for now. Later have to check if the vector_size attribute and
> > + *    corresponding implementation by gcc can handle different architectures
> > + *    properly, such that it won't become worse than the memcpy provided for
> > + *    that architecture.
> > + *
> > + * Or maybe I could even merge the two intel detiling logics into one, as
> > + * the semantics and flow are almost the same for both.
> > + *
> > + */
> > +static void detile_intely(AVFilterContext *ctx, int w, int h,
> > +                          uint8_t *dst, int dstLineSize,
> > +                          const uint8_t *src, int srcLineSize)
> > +{
> > +    // Offsets and LineSize are in bytes
> > +    int tileW = 4; // For a 32Bit / Pixel framebuffer, 16/4
> > +    int tileH = 32;
> > +
> > +    if (w*4 != srcLineSize) {
> > +        fprintf(stderr,"DBUG:fbdetile:intely: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
> > +        fprintf(stderr,"ERRR:fbdetile:intely: dont support LineSize | Pitch going beyond width\n");
> > +    }
> > +    int sO = 0;
> > +    int dX = 0;
> > +    int dY = 0;
> > +    int nTRows = (w*h)/tileW;
> > +    int cTR = 0;
> > +    while (cTR < nTRows) {
> > +        int dO = dY*dstLineSize + dX*4;
> > +#ifdef DEBUG_FBTILE
> > +        fprintf(stderr,"DBUG:fbdetile:intely: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
> > +#endif
> > +
> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*16, 16);
> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*16, 16);
> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*16, 16);
> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*16, 16);
> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*16, 16);
> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*16, 16);
> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*16, 16);
> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*16, 16);
> > +        memcpy(dst+dO+8*dstLineSize, src+sO+8*16, 16);
> > +        memcpy(dst+dO+9*dstLineSize, src+sO+9*16, 16);
> > +        memcpy(dst+dO+10*dstLineSize, src+sO+10*16, 16);
> > +        memcpy(dst+dO+11*dstLineSize, src+sO+11*16, 16);
> > +        memcpy(dst+dO+12*dstLineSize, src+sO+12*16, 16);
> > +        memcpy(dst+dO+13*dstLineSize, src+sO+13*16, 16);
> > +        memcpy(dst+dO+14*dstLineSize, src+sO+14*16, 16);
> > +        memcpy(dst+dO+15*dstLineSize, src+sO+15*16, 16);
> > +        memcpy(dst+dO+16*dstLineSize, src+sO+16*16, 16);
> > +        memcpy(dst+dO+17*dstLineSize, src+sO+17*16, 16);
> > +        memcpy(dst+dO+18*dstLineSize, src+sO+18*16, 16);
> > +        memcpy(dst+dO+19*dstLineSize, src+sO+19*16, 16);
> > +        memcpy(dst+dO+20*dstLineSize, src+sO+20*16, 16);
> > +        memcpy(dst+dO+21*dstLineSize, src+sO+21*16, 16);
> > +        memcpy(dst+dO+22*dstLineSize, src+sO+22*16, 16);
> > +        memcpy(dst+dO+23*dstLineSize, src+sO+23*16, 16);
> > +        memcpy(dst+dO+24*dstLineSize, src+sO+24*16, 16);
> > +        memcpy(dst+dO+25*dstLineSize, src+sO+25*16, 16);
> > +        memcpy(dst+dO+26*dstLineSize, src+sO+26*16, 16);
> > +        memcpy(dst+dO+27*dstLineSize, src+sO+27*16, 16);
> > +        memcpy(dst+dO+28*dstLineSize, src+sO+28*16, 16);
> > +        memcpy(dst+dO+29*dstLineSize, src+sO+29*16, 16);
> > +        memcpy(dst+dO+30*dstLineSize, src+sO+30*16, 16);
> > +        memcpy(dst+dO+31*dstLineSize, src+sO+31*16, 16);
> > +
> > +        dX += tileW;
> > +        if (dX >= w) {
> > +            dX = 0;
> > +            dY += 32;
> > +        }
> > +        sO = sO + 32*16;
> > +        cTR += 32;
> > +    }
> > +}
> > +
> > +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
> > +{
> > +    AVFilterContext *ctx = inlink->dst;
> > +    FBDetileContext *fbdetile = ctx->priv;
> > +    AVFilterLink *outlink = ctx->outputs[0];
> > +    AVFrame *out;
> > +
> > +    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> > +    if (!out) {
> > +        av_frame_free(&in);
> > +        return AVERROR(ENOMEM);
> > +    }
> > +    av_frame_copy_props(out, in);
> > +
> > +    if (fbdetile->type == TYPE_INTELX) {
> > +        detile_intelx(ctx, fbdetile->width, fbdetile->height,
> > +                      out->data[0], out->linesize[0],
> > +                      in->data[0], in->linesize[0]);
> > +    } else if (fbdetile->type == TYPE_INTELY) {
> > +        detile_intely(ctx, fbdetile->width, fbdetile->height,
> > +                      out->data[0], out->linesize[0],
> > +                      in->data[0], in->linesize[0]);
> > +    }
> > +
> > +    av_frame_free(&in);
> > +    return ff_filter_frame(outlink, out);
> > +}
> > +
> > +static av_cold void uninit(AVFilterContext *ctx)
> > +{
> > +
> > +}
> > +
> > +static const AVFilterPad fbdetile_inputs[] = {
> > +    {
> > +        .name = "default",
> > +        .type = AVMEDIA_TYPE_VIDEO,
> > +        .config_props = config_props,
> > +        .filter_frame = filter_frame,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +static const AVFilterPad fbdetile_outputs[] = {
> > +    {
> > +        .name = "default",
> > +        .type = AVMEDIA_TYPE_VIDEO,
> > +    },
> > +    { NULL }
> > +};
> > +
> > +AVFilter ff_vf_fbdetile = {
> > +    .name = "fbdetile",
> > +    .description = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"),
> > +    .priv_size = sizeof(FBDetileContext),
> > +    .init = init,
> > +    .uninit = uninit,
> > +    .query_formats = query_formats,
> > +    .inputs = fbdetile_inputs,
> > +    .outputs = fbdetile_outputs,
> > +    .priv_class = &fbdetile_class,
> > +};
> > --
> > 2.20.1
> >
> > _______________________________________________
> > ffmpeg-devel mailing list
> > ffmpeg-devel@ffmpeg.org
> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >
> > To unsubscribe, visit link above, or email
> > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>

-- 
Keep ;-)
HanishKVC