On 6/27/20, C Hanish Menon <hanish...@gmail.com> wrote:
> Hi,
>
> It is a new video filter which I created to do detiling of the Intel
> Tile-X and Tile-Y framebuffer layouts into a linear layout, using logic
> that runs on the CPU. It can be used if one uses kmsgrab and hwdownload to
> capture the screen on an Intel GPU based system, so that one gets a proper
> screen capture.
>
> Without this, kmsgrab generates an unusable/scrambled capture, because
> the contents are tiled. I had this issue a few days back when trying to
> capture the screen with Wayland, so I created this.
>
> In the submitted patch, I have added doc/filters.texi, which mentions
> the same.
The filter is marginally useful: it is done on the CPU, which completely invalidates any possible gain from using the hw path.

>
> On Sun, Jun 28, 2020 at 1:30 AM Paul B Mahol <one...@gmail.com> wrote:
>
>> What is this?
>>
>> Missing documentation.
>> NAK
>>
>> On 6/27/20, hanishkvc <hanish...@gmail.com> wrote:
>> > v02-20200627IST2331
>> >
>> > Unrolled Intel Legacy Tile-Y detiling logic.
>> >
>> > Also a consolidated patch file, instead of the previous development
>> > flow based multiple patch files.
>> >
>> > v01-20200627IST1308
>> >
>> > Implemented Intel Legacy Tile-X and Tile-Y detiling logic
>> >
>> > NOTES:
>> >
>> > This video filter allows framebuffers which are tiled to be detiled
>> > into a linear layout, using logic running on the CPU.
>> >
>> > Currently it supports Intel Legacy Tile-X and Tile-Y layout detiling.
>> > This should help one to work with frames captured (say using kmsgrab)
>> > on laptops having an Intel GPU.
>> >
>> > The Tile-X conversion logic has been explicitly cross-checked with Tile-X
>> > based frames. However, the Tile-Y conversion logic hasn't been tested with
>> > Tile-Y based frames, but it should do the job, based on my current
>> > understanding of the Tile-Y layout format.
>> >
>> > TODO1: At a later time, generate Tile-Y based frames and then
>> > cross-check the corresponding logic explicitly.
>> >
>> > TODO2: Maybe use OpenGL or Vulkan buffer helper routines to do the
>> > layout conversion. But some online discussions from some time back seem
>> > to indicate that this path is not fully bug free currently.
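The notes above describe the two layouts only through the patch's copy loops. As a hedged sketch, the same geometry the patch assumes can be written as an address function for a 4-byte-per-pixel surface: Tile-X tiles are 512 bytes wide by 8 rows, stored row after row; Tile-Y tiles are 128 bytes wide by 32 rows, built from 16-byte-wide columns that each hold their 32 rows back to back. For widths and heights made of whole tiles, these offsets should agree with the patch's sequential copy loops, and running them in the tiling direction is one way to synthesize Tile-Y frames for the cross-check TODO1 asks for. The helper names `tilex_offset`, `tiley_offset` and `convert_frame` are hypothetical (not FFmpeg API); the sketch assumes the pitch is exactly `w * 4` and ignores the bit-6 address swizzling that some older Intel platforms apply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers (not FFmpeg API). All assume 4 bytes per pixel
 * and ignore any bit-6 address swizzling. */

/* Byte offset of pixel (x, y) in a legacy Tile-X surface:
 * 4 KiB tiles of 512 bytes x 8 rows, each tile stored row by row. */
static size_t tilex_offset(int x, int y, int pitch)
{
    int bx = x * 4;
    size_t tile = (size_t)(y / 8) * (pitch / 512) + bx / 512;
    return tile * 4096 + (size_t)(y % 8) * 512 + bx % 512;
}

/* Byte offset of pixel (x, y) in a legacy Tile-Y surface:
 * 4 KiB tiles of 128 bytes x 32 rows, split into eight 16-byte-wide
 * columns; each column stores its 32 rows back to back (512 bytes). */
static size_t tiley_offset(int x, int y, int pitch)
{
    int bx = x * 4;
    int ix = bx % 128;                      /* byte position inside the tile */
    size_t tile = (size_t)(y / 32) * (pitch / 128) + bx / 128;
    return tile * 4096 + (size_t)(ix / 16) * 512 + (y % 32) * 16 + ix % 16;
}

/* Copy pixel by pixel between a linear image (pitch = w * 4) and a tiled
 * one: to_tiled != 0 tiles, to_tiled == 0 detiles. */
static void convert_frame(uint8_t *tiled, uint8_t *linear, int w, int h,
                          int y_tiling, int to_tiled)
{
    int pitch = w * 4;

    /* Whole tiles only: Tile-X needs w % 128 == 0 and h % 8 == 0,
     * Tile-Y needs w % 32 == 0 and h % 32 == 0. */
    assert(w % (y_tiling ? 32 : 128) == 0 && h % (y_tiling ? 32 : 8) == 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            size_t t = y_tiling ? tiley_offset(x, y, pitch)
                                : tilex_offset(x, y, pitch);
            uint8_t *l = linear + (size_t)y * pitch + (size_t)x * 4;
            if (to_tiled)
                memcpy(tiled + t, l, 4);
            else
                memcpy(l, tiled + t, 4);
        }
    }
}
```

A per-pixel loop like this is far slower than the patch's row-segment `memcpy` calls; its value is as an independent reference that a faster detiler can be compared against.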
>> > ---
>> >  Changelog                 |   1 +
>> >  doc/filters.texi          |  62 ++++++++
>> >  libavfilter/Makefile      |   1 +
>> >  libavfilter/allfilters.c  |   1 +
>> >  libavfilter/vf_fbdetile.c | 309 ++++++++++++++++++++++++++++++++++++++
>> >  5 files changed, 374 insertions(+)
>> >  create mode 100644 libavfilter/vf_fbdetile.c
>> >
>> > diff --git a/Changelog b/Changelog
>> > index a60e7d2eb8..0e03491f6a 100644
>> > --- a/Changelog
>> > +++ b/Changelog
>> > @@ -2,6 +2,7 @@ Entries are sorted chronologically from oldest to youngest within each release,
>> >  releases are sorted from youngest to oldest.
>> >
>> >  version <next>:
>> > +- fbdetile cpu based framebuffer layout detiling video filter
>> >  - AudioToolbox output device
>> >  - MacCaption demuxer
>> >
>> > diff --git a/doc/filters.texi b/doc/filters.texi
>> > index 3c2dd2eb90..73ba21af89 100644
>> > --- a/doc/filters.texi
>> > +++ b/doc/filters.texi
>> > @@ -12210,6 +12210,68 @@ It accepts the following optional parameters:
>> >  The number of the CUDA device to use
>> >  @end table
>> >
>> > +@anchor{fbdetile}
>> > +@section fbdetile
>> > +
>> > +Detiles the framebuffer tile layout into a linear layout using the CPU.
>> > +
>> > +It currently supports conversion from Intel legacy tile-x and tile-y layouts
>> > +into a linear layout. This is useful if one is using kmsgrab and hwdownload
>> > +to capture a screen which is using one of these non-linear layouts.
>> > +
>> > +Currently it expects the data to be in a 32bit RGB based pixel format. However
>> > +the logic doesn't do any pixel format conversion. Later, 16bit RGB data will
>> > +also be enabled, as the logic is transparent to it at one level.
>> > +
>> > +One could either insert this into the filter chain while capturing itself,
>> > +or, if that slows things down, instead insert it into the filter chain
>> > +during playback or transcoding.
>> > +
>> > +It supports the following optional parameter:
>> > +
>> > +@table @option
>> > +@item type
>> > +Specify which detiling conversion to apply. The supported values are:
>> > +@table @var
>> > +@item 0
>> > +intel tile-x to linear conversion (the default)
>> > +@item 1
>> > +intel tile-y to linear conversion.
>> > +@end table
>> > +@end table
>> > +
>> > +If one wants to convert during capture itself, one could do:
>> > +@example
>> > +ffmpeg -f kmsgrab -i - -vf "hwdownload, fbdetile" OUTPUT
>> > +@end example
>> > +
>> > +If instead one wants to convert after the tiled data has already been captured:
>> > +@example
>> > +ffmpeg -i INPUT -vf "fbdetile" OUTPUT
>> > +@end example
>> > +@example
>> > +ffplay -i INPUT -vf "fbdetile"
>> > +@end example
>> > +
>> > +NOTE: While transcoding a test 1080p h264 stream with 276 frames, with two
>> > +runs of each situation, the performance was as given below. However, this
>> > +was for the older (initial) version of the logic, and it was run on
>> > +the default Linux Chromebook->VM->container setup, so the absolute values may
>> > +not be representative. In a relative sense the overhead should be similar.
>> > +@example
>> > +rm out.mp4; time ./ffmpeg -i input.mp4 out.mp4
>> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=0 out.mp4
>> > +rm out.mp4; time ./ffmpeg -i input.mp4 -vf fbdetile=1 out.mp4
>> > +@end example
>> > +@table @option
>> > +@item with no fbdetile filter
>> > +it took ~7.28 secs,
>> > +@item with fbdetile=0 filter
>> > +it took ~8.69 secs,
>> > +@item with fbdetile=1 filter
>> > +it took ~9.20 secs.
>> > +@end table
>> > +
>> >  @section hqx
>> >
>> >  Apply a high-quality magnification filter designed for pixel art. This filter
>> > diff --git a/libavfilter/Makefile b/libavfilter/Makefile
>> > index 5123540653..bdb0c379ae 100644
>> > --- a/libavfilter/Makefile
>> > +++ b/libavfilter/Makefile
>> > @@ -280,6 +280,7 @@ OBJS-$(CONFIG_HWDOWNLOAD_FILTER) += vf_hwdownload.o
>> >  OBJS-$(CONFIG_HWMAP_FILTER) += vf_hwmap.o
>> >  OBJS-$(CONFIG_HWUPLOAD_CUDA_FILTER) += vf_hwupload_cuda.o
>> >  OBJS-$(CONFIG_HWUPLOAD_FILTER) += vf_hwupload.o
>> > +OBJS-$(CONFIG_FBDETILE_FILTER) += vf_fbdetile.o
>> >  OBJS-$(CONFIG_HYSTERESIS_FILTER) += vf_hysteresis.o framesync.o
>> >  OBJS-$(CONFIG_IDET_FILTER) += vf_idet.o
>> >  OBJS-$(CONFIG_IL_FILTER) += vf_il.o
>> > diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
>> > index 1183e40267..f8dceb2a88 100644
>> > --- a/libavfilter/allfilters.c
>> > +++ b/libavfilter/allfilters.c
>> > @@ -265,6 +265,7 @@ extern AVFilter ff_vf_hwdownload;
>> >  extern AVFilter ff_vf_hwmap;
>> >  extern AVFilter ff_vf_hwupload;
>> >  extern AVFilter ff_vf_hwupload_cuda;
>> > +extern AVFilter ff_vf_fbdetile;
>> >  extern AVFilter ff_vf_hysteresis;
>> >  extern AVFilter ff_vf_idet;
>> >  extern AVFilter ff_vf_il;
>> > diff --git a/libavfilter/vf_fbdetile.c b/libavfilter/vf_fbdetile.c
>> > new file mode 100644
>> > index 0000000000..8b20c96d2c
>> > --- /dev/null
>> > +++ b/libavfilter/vf_fbdetile.c
>> > @@ -0,0 +1,309 @@
>> > +/*
>> > + * Copyright (c) 2020 HanishKVC
>> > + *
>> > + * This file is part of FFmpeg.
>> > + *
>> > + * FFmpeg is free software; you can redistribute it and/or
>> > + * modify it under the terms of the GNU Lesser General Public
>> > + * License as published by the Free Software Foundation; either
>> > + * version 2.1 of the License, or (at your option) any later version.
>> > + *
>> > + * FFmpeg is distributed in the hope that it will be useful,
>> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > + * Lesser General Public License for more details.
>> > + *
>> > + * You should have received a copy of the GNU Lesser General Public
>> > + * License along with FFmpeg; if not, write to the Free Software
>> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>> > + */
>> > +
>> > +/**
>> > + * @file
>> > + * Detile the framebuffer's tile layout using the cpu
>> > + * Currently it supports legacy Intel Tile-X and Tile-Y layout detiling.
>> > + *
>> > + */
>> > +
>> > +/*
>> > + * ToThink|Check: Optimisations
>> > + *
>> > + * Do the gcc settings used by ffmpeg allow memcpy | stringops inlining,
>> > + * loop unrolling, better native matching instructions, additional
>> > + * optimisations, ...
>> > + *
>> > + * Does gcc map to optimal memcpy logic, based on the situation it is
>> > + * used in?
>> > + *
>> > + * If not, maybe look at vector_size or intrinsics or appropriate arch
>> > + * and cpu specific inline asm or ...
>> > + *
>> > + */
>> > +
>> > +#include "libavutil/avassert.h"
>> > +#include "libavutil/imgutils.h"
>> > +#include "libavutil/opt.h"
>> > +#include "avfilter.h"
>> > +#include "formats.h"
>> > +#include "internal.h"
>> > +#include "video.h"
>> > +
>> > +enum FilterMode {
>> > +    TYPE_INTELX,
>> > +    TYPE_INTELY,
>> > +    NB_TYPE
>> > +};
>> > +
>> > +typedef struct FBDetileContext {
>> > +    const AVClass *class;
>> > +    int width, height;
>> > +    int type;
>> > +} FBDetileContext;
>> > +
>> > +#define OFFSET(x) offsetof(FBDetileContext, x)
>> > +#define FLAGS AV_OPT_FLAG_FILTERING_PARAM|AV_OPT_FLAG_VIDEO_PARAM
>> > +static const AVOption fbdetile_options[] = {
>> > +    { "type", "set framebuffer format_modifier type", OFFSET(type), AV_OPT_TYPE_INT, {.i64=TYPE_INTELX}, 0, NB_TYPE-1, FLAGS, "type" },
>> > +        { "intelx", "Intel Tile-X layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELX}, INT_MIN, INT_MAX, FLAGS, "type" },
>> > +        { "intely", "Intel Tile-Y layout", 0, AV_OPT_TYPE_CONST, {.i64=TYPE_INTELY}, INT_MIN, INT_MAX, FLAGS, "type" },
>> > +    { NULL }
>> > +};
>> > +
>> > +AVFILTER_DEFINE_CLASS(fbdetile);
>> > +
>> > +static av_cold int init(AVFilterContext *ctx)
>> > +{
>> > +    FBDetileContext *fbdetile = ctx->priv;
>> > +
>> > +    if (fbdetile->type == TYPE_INTELX) {
>> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-x to linear\n");
>> > +    } else if (fbdetile->type == TYPE_INTELY) {
>> > +        fprintf(stderr,"INFO:fbdetile:init: Intel tile-y to linear\n");
>> > +    } else {
>> > +        fprintf(stderr,"DBUG:fbdetile:init: Unknown Tile format specified, shouldnt reach here\n");
>> > +    }
>> > +    fbdetile->width = 1920;
>> > +    fbdetile->height = 1080;
>> > +    return 0;
>> > +}
>> > +
>> > +static int query_formats(AVFilterContext *ctx)
>> > +{
>> > +    // Currently only RGB based 32bit formats are specified
>> > +    // TODO: Technically the logic is transparent to 16bit RGB formats also
>> > +    static const enum AVPixelFormat pix_fmts[] = {AV_PIX_FMT_RGB0, AV_PIX_FMT_0RGB, AV_PIX_FMT_BGR0, AV_PIX_FMT_0BGR,
>> > +                                                  AV_PIX_FMT_RGBA, AV_PIX_FMT_ARGB, AV_PIX_FMT_BGRA, AV_PIX_FMT_ABGR,
>> > +                                                  AV_PIX_FMT_NONE};
>> > +    AVFilterFormats *fmts_list;
>> > +
>> > +    fmts_list = ff_make_format_list(pix_fmts);
>> > +    if (!fmts_list)
>> > +        return AVERROR(ENOMEM);
>> > +    return ff_set_common_formats(ctx, fmts_list);
>> > +}
>> > +
>> > +static int config_props(AVFilterLink *inlink)
>> > +{
>> > +    AVFilterContext *ctx = inlink->dst;
>> > +    FBDetileContext *fbdetile = ctx->priv;
>> > +
>> > +    fbdetile->width = inlink->w;
>> > +    fbdetile->height = inlink->h;
>> > +    fprintf(stderr,"DBUG:fbdetile:config_props: %d x %d\n", fbdetile->width, fbdetile->height);
>> > +
>> > +    return 0;
>> > +}
>> > +
>> > +static void detile_intelx(AVFilterContext *ctx, int w, int h,
>> > +                          uint8_t *dst, int dstLineSize,
>> > +                          const uint8_t *src, int srcLineSize)
>> > +{
>> > +    // Offsets and LineSize are in bytes
>> > +    int tileW = 128; // For a 32Bit / Pixel framebuffer, 512/4
>> > +    int tileH = 8;
>> > +
>> > +    if (w*4 != srcLineSize) {
>> > +        fprintf(stderr,"DBUG:fbdetile:intelx: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
>> > +        fprintf(stderr,"ERRR:fbdetile:intelx: dont support LineSize | Pitch going beyond width\n");
>> > +    }
>> > +    int sO = 0;
>> > +    int dX = 0;
>> > +    int dY = 0;
>> > +    int nTRows = (w*h)/tileW;
>> > +    int cTR = 0;
>> > +    while (cTR < nTRows) {
>> > +        int dO = dY*dstLineSize + dX*4;
>> > +#ifdef DEBUG_FBTILE
>> > +        fprintf(stderr,"DBUG:fbdetile:intelx: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
>> > +#endif
>> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*512, 512);
>> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*512, 512);
>> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*512, 512);
>> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*512, 512);
>> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*512, 512);
>> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*512, 512);
>> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*512, 512);
>> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*512, 512);
>> > +        dX += tileW;
>> > +        if (dX >= w) {
>> > +            dX = 0;
>> > +            dY += 8;
>> > +        }
>> > +        sO = sO + 8*512;
>> > +        cTR += 8;
>> > +    }
>> > +}
>> > +
>> > +/*
>> > + * Intel Legacy Tile-Y layout conversion support
>> > + *
>> > + * Currently done in a simple dumb way. Two low hanging optimisations
>> > + * that could be readily applied are
>> > + *
>> > + * a) unrolling the inner for loop
>> > + *    --- Given small size memcpy, should help, DONE
>> > + *
>> > + * b) using simd based 128bit loading and storing along with prefetch
>> > + *    hinting.
>> > + *
>> > + *    TOTHINK|CHECK: Does memcpy already do this and more if the situation
>> > + *    is right?!
>> > + *
>> > + *    As code (or even intrinsics) would be specific to each architecture,
>> > + *    avoiding for now. Later have to check if the vector_size attribute and
>> > + *    the corresponding implementation by gcc can handle different
>> > + *    architectures properly, such that it won't become worse than the
>> > + *    memcpy provided for that architecture.
>> > + *
>> > + * Or maybe I could even merge the two intel detiling logics into one, as
>> > + * the semantics and flow are almost the same for both logics.
>> > + *
>> > + */
>> > +static void detile_intely(AVFilterContext *ctx, int w, int h,
>> > +                          uint8_t *dst, int dstLineSize,
>> > +                          const uint8_t *src, int srcLineSize)
>> > +{
>> > +    // Offsets and LineSize are in bytes
>> > +    int tileW = 4; // For a 32Bit / Pixel framebuffer, 16/4
>> > +    int tileH = 32;
>> > +
>> > +    if (w*4 != srcLineSize) {
>> > +        fprintf(stderr,"DBUG:fbdetile:intely: w%dxh%d, dL%d, sL%d\n", w, h, dstLineSize, srcLineSize);
>> > +        fprintf(stderr,"ERRR:fbdetile:intely: dont support LineSize | Pitch going beyond width\n");
>> > +    }
>> > +    int sO = 0;
>> > +    int dX = 0;
>> > +    int dY = 0;
>> > +    int nTRows = (w*h)/tileW;
>> > +    int cTR = 0;
>> > +    while (cTR < nTRows) {
>> > +        int dO = dY*dstLineSize + dX*4;
>> > +#ifdef DEBUG_FBTILE
>> > +        fprintf(stderr,"DBUG:fbdetile:intely: dX%d dY%d, sO%d, dO%d\n", dX, dY, sO, dO);
>> > +#endif
>> > +
>> > +        memcpy(dst+dO+0*dstLineSize, src+sO+0*16, 16);
>> > +        memcpy(dst+dO+1*dstLineSize, src+sO+1*16, 16);
>> > +        memcpy(dst+dO+2*dstLineSize, src+sO+2*16, 16);
>> > +        memcpy(dst+dO+3*dstLineSize, src+sO+3*16, 16);
>> > +        memcpy(dst+dO+4*dstLineSize, src+sO+4*16, 16);
>> > +        memcpy(dst+dO+5*dstLineSize, src+sO+5*16, 16);
>> > +        memcpy(dst+dO+6*dstLineSize, src+sO+6*16, 16);
>> > +        memcpy(dst+dO+7*dstLineSize, src+sO+7*16, 16);
>> > +        memcpy(dst+dO+8*dstLineSize, src+sO+8*16, 16);
>> > +        memcpy(dst+dO+9*dstLineSize, src+sO+9*16, 16);
>> > +        memcpy(dst+dO+10*dstLineSize, src+sO+10*16, 16);
>> > +        memcpy(dst+dO+11*dstLineSize, src+sO+11*16, 16);
>> > +        memcpy(dst+dO+12*dstLineSize, src+sO+12*16, 16);
>> > +        memcpy(dst+dO+13*dstLineSize, src+sO+13*16, 16);
>> > +        memcpy(dst+dO+14*dstLineSize, src+sO+14*16, 16);
>> > +        memcpy(dst+dO+15*dstLineSize, src+sO+15*16, 16);
>> > +        memcpy(dst+dO+16*dstLineSize, src+sO+16*16, 16);
>> > +        memcpy(dst+dO+17*dstLineSize, src+sO+17*16, 16);
>> > +        memcpy(dst+dO+18*dstLineSize, src+sO+18*16, 16);
>> > +        memcpy(dst+dO+19*dstLineSize, src+sO+19*16, 16);
>> > +        memcpy(dst+dO+20*dstLineSize, src+sO+20*16, 16);
>> > +        memcpy(dst+dO+21*dstLineSize, src+sO+21*16, 16);
>> > +        memcpy(dst+dO+22*dstLineSize, src+sO+22*16, 16);
>> > +        memcpy(dst+dO+23*dstLineSize, src+sO+23*16, 16);
>> > +        memcpy(dst+dO+24*dstLineSize, src+sO+24*16, 16);
>> > +        memcpy(dst+dO+25*dstLineSize, src+sO+25*16, 16);
>> > +        memcpy(dst+dO+26*dstLineSize, src+sO+26*16, 16);
>> > +        memcpy(dst+dO+27*dstLineSize, src+sO+27*16, 16);
>> > +        memcpy(dst+dO+28*dstLineSize, src+sO+28*16, 16);
>> > +        memcpy(dst+dO+29*dstLineSize, src+sO+29*16, 16);
>> > +        memcpy(dst+dO+30*dstLineSize, src+sO+30*16, 16);
>> > +        memcpy(dst+dO+31*dstLineSize, src+sO+31*16, 16);
>> > +
>> > +        dX += tileW;
>> > +        if (dX >= w) {
>> > +            dX = 0;
>> > +            dY += 32;
>> > +        }
>> > +        sO = sO + 32*16;
>> > +        cTR += 32;
>> > +    }
>> > +}
>> > +
>> > +static int filter_frame(AVFilterLink *inlink, AVFrame *in)
>> > +{
>> > +    AVFilterContext *ctx = inlink->dst;
>> > +    FBDetileContext *fbdetile = ctx->priv;
>> > +    AVFilterLink *outlink = ctx->outputs[0];
>> > +    AVFrame *out;
>> > +
>> > +    out = ff_get_video_buffer(outlink, outlink->w, outlink->h);
>> > +    if (!out) {
>> > +        av_frame_free(&in);
>> > +        return AVERROR(ENOMEM);
>> > +    }
>> > +    av_frame_copy_props(out, in);
>> > +
>> > +    if (fbdetile->type == TYPE_INTELX) {
>> > +        detile_intelx(ctx, fbdetile->width, fbdetile->height,
>> > +                      out->data[0], out->linesize[0],
>> > +                      in->data[0], in->linesize[0]);
>> > +    } else if (fbdetile->type == TYPE_INTELY) {
>> > +        detile_intely(ctx, fbdetile->width, fbdetile->height,
>> > +                      out->data[0], out->linesize[0],
>> > +                      in->data[0], in->linesize[0]);
>> > +    }
>> > +
>> > +    av_frame_free(&in);
>> > +    return ff_filter_frame(outlink, out);
>> > +}
>> > +
>> > +static av_cold void uninit(AVFilterContext *ctx)
>> > +{
>> > +
>> > +}
>> > +
>> > +static const AVFilterPad fbdetile_inputs[] = {
>> > +    {
>> > +        .name         = "default",
>> > +        .type         = AVMEDIA_TYPE_VIDEO,
>> > +        .config_props = config_props,
>> > +        .filter_frame = filter_frame,
>> > +    },
>> > +    { NULL }
>> > +};
>> > +
>> > +static const AVFilterPad fbdetile_outputs[] = {
>> > +    {
>> > +        .name = "default",
>> > +        .type = AVMEDIA_TYPE_VIDEO,
>> > +    },
>> > +    { NULL }
>> > +};
>> > +
>> > +AVFilter ff_vf_fbdetile = {
>> > +    .name          = "fbdetile",
>> > +    .description   = NULL_IF_CONFIG_SMALL("Detile Framebuffer using CPU"),
>> > +    .priv_size     = sizeof(FBDetileContext),
>> > +    .init          = init,
>> > +    .uninit        = uninit,
>> > +    .query_formats = query_formats,
>> > +    .inputs        = fbdetile_inputs,
>> > +    .outputs       = fbdetile_outputs,
>> > +    .priv_class    = &fbdetile_class,
>> > +};
>> > --
>> > 2.20.1
>> >
>> > _______________________________________________
>> > ffmpeg-devel mailing list
>> > ffmpeg-devel@ffmpeg.org
>> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> >
>> > To unsubscribe, visit link above, or email
>> > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>>
>
> --
> Keep ;-)
> HanishKVC
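The Tile-Y comment in the patch suggests merging the two detilers because their flow is the same, and both currently only print a message when the source pitch is wider than `w * 4` and then carry on regardless. A hedged sketch of a merged, pitch-aware loop follows; `detile_generic` and its parameters are hypothetical names, not part of the patch or of FFmpeg. It views each layout as bands of contiguous "segments" (Tile-X: one 512-byte x 8-row tile; Tile-Y: one 16-byte x 32-row column), skips the padding at the end of each band by stepping `src_pitch * seg_h` bytes per band, and clips the last segment and the last band, so widths such as 1366 pixels do not write past the end of a destination row the way the patch's fixed 512-byte copies would. It assumes the caller knows the real tiled pitch and that it covers whole segments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical merged detiler (not from the patch, not FFmpeg API).
 * A tiled surface is viewed as bands of seg_h rows; each band is a run of
 * contiguous segments, each seg_w bytes wide and seg_h rows tall:
 *   Intel legacy Tile-X: seg_w = 512, seg_h = 8  (one whole tile)
 *   Intel legacy Tile-Y: seg_w = 16,  seg_h = 32 (one 16-byte column)
 * src_pitch must cover whole segments and be at least w * bpp bytes.
 */
static void detile_generic(uint8_t *dst, ptrdiff_t dst_pitch,
                           const uint8_t *src, ptrdiff_t src_pitch,
                           int w, int h, int bpp, int seg_w, int seg_h)
{
    ptrdiff_t row_bytes = (ptrdiff_t)w * bpp;

    assert(src_pitch % seg_w == 0 && src_pitch >= row_bytes);
    for (int y0 = 0; y0 < h; y0 += seg_h) {
        /* Every band occupies src_pitch * seg_h bytes, padding included. */
        const uint8_t *band = src + (ptrdiff_t)(y0 / seg_h) * src_pitch * seg_h;
        int rows = h - y0 < seg_h ? h - y0 : seg_h;        /* clip last band */

        for (ptrdiff_t bx = 0; bx < row_bytes; bx += seg_w) {
            const uint8_t *seg = band + (bx / seg_w) * seg_w * seg_h;
            /* Clip the last segment so nothing lands past the row end. */
            size_t n = (size_t)(row_bytes - bx < seg_w ? row_bytes - bx : seg_w);

            for (int r = 0; r < rows; r++)
                memcpy(dst + (y0 + r) * dst_pitch + bx, seg + r * seg_w, n);
        }
    }
}
```

With the patch's 32bpp formats, `bpp = 4` with `(seg_w, seg_h) = (512, 8)` plays the role of `detile_intelx` and `(16, 32)` that of `detile_intely`; where a real capture pipeline would obtain the true tiled pitch is left open here.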