On Sat, 22 Jun 2024 15:13:34 +0200 Niklas Haas <ffm...@haasn.xyz> wrote:
> Hey,
>
> As some of you know, I got contracted (by STF 2024) to work on improving
> swscale, over the course of the next couple of months. I want to share my
> current plans and gather feedback + measure sentiment.
>
> ## Problem statement
>
> The two issues I'd like to focus on for now are:
>
> 1. Lack of support for a lot of modern formats and conversions (HDR, ICtCp,
>    IPTc2, BT.2020-CL, XYZ, YCgCo, Dolby Vision, ...)
> 2. Complicated context management, with cascaded contexts, threading, stateful
>    configuration, multi-step init procedures, etc; and related bugs
>
> In order to make these feasible, some amount of internal re-organization of
> duties inside swscale is prudent.
>
> ## Proposed approach
>
> The first step is to create a new API, which will (tentatively) live in
> <libswscale/avscale.h>. This API will initially start off as a near-copy of
> the current swscale public API, but with the major difference that I want it
> to be state-free and only access metadata in terms of AVFrame properties. So
> there will be no independent configuration of the input chroma location etc.
> like there is currently, and no need to re-configure or re-init the context
> when feeding it frames with different properties. The goal is for users to be
> able to just feed it AVFrame pairs and have it internally cache expensive
> pre-processing steps as needed. Finally, avscale_* should ultimately also
> support hardware frames directly, in which case it will dispatch to some
> equivalent of scale_vulkan/vaapi/cuda or possibly even libplacebo. (But I will
> defer this to a future milestone)
So, I've spent the past few days implementing this API and hooking it up to
swscale internally. (For testing, I am also replacing `vf_scale` with the
equivalent AVScale-based implementation, to see how the new API impacts
existing users.) It mostly works so far, with some leftover translation issues
that I have to address before it can be sent upstream.

------

One of the things I was thinking about was how to configure scalers/dither
modes, which sws currently (somewhat clunkily) controls with flags. IMO, flags
are not the right design here - if anything, it should be a separate enum/int,
controllable separately for chroma resampling (4:4:4 <-> 4:2:0) and main
scaling (e.g. 50x50 <-> 80x80).

That said, I think that for most end users, such fine-grained options don't
really provide any value - unless you're already knee-deep in signal theory,
the actual differences between, say, "natural bicubic spline" and "Lanczos"
are obscure at best and alien at worst.

My idea was to provide a single `int quality`, which the user can set to tune
the speed <-> quality trade-off on an arbitrary numeric scale from 0 to 10,
with 0 being the fastest (alias everything, nearest neighbour, drop half the
chroma samples, etc.), the default being somewhere in the vicinity of 3-5, and
10 being the maximum quality (full linear downscaling, anti-aliasing, error
diffusion, etc.).

The upside of this approach is that it would be vastly simpler for most end
users. It would also track newly added functionality automatically; e.g. if
we get a higher-quality tone mapping mode, it can be retroactively added to
the higher-quality presets.

The biggest downside I can think of is that doing this would arguably violate
the semantics of a "bitexact" flag, since it would change results relative to
a previous version of libswscale - unless we also force a specific quality
level in bitexact mode?

Open questions:

1. Is this a good idea, or do the downsides outweigh the benefits?
2. Is an "advanced configuration" API still needed, in addition to the
   quality presets?

------

I have attached my current working draft of the public half of <avscale.h>,
for reference. You can also find my implementation draft at the time of
writing here:
https://github.com/haasn/FFmpeg/blob/avscale/libswscale/avscale.h
/*
 * Copyright (C) 2024 Niklas Haas
 *
 * This file is part of FFmpeg.
 *
 * FFmpeg is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * FFmpeg is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public
 * License along with FFmpeg; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 */

#ifndef SWSCALE_AVSCALE_H
#define SWSCALE_AVSCALE_H

/**
 * @file
 * @ingroup libsws
 * Higher-level wrapper around libswscale + related libraries, which is
 * capable of handling more advanced colorspace transformations.
 */

#include "libavutil/frame.h"
#include "libavutil/log.h"

/**
 * Main external API structure. New fields can be added to the end with
 * minor version bumps. Removal, reordering and changes to existing fields
 * require a major version bump. sizeof(AVScaleContext) is not part of the ABI.
 */
typedef struct AVScaleContext {
    const AVClass *av_class;

    /**
     * Private context used for internal data.
     */
    struct AVScaleInternal *internal;

    /**
     * Private data of the user, can be used to carry app specific stuff.
     */
    void *opaque;

    /**
     * Bitmask of AV_SCALE_* flags.
     */
    int64_t flags;

    /**
     * How many threads to use for processing, or 0 for automatic selection.
     */
    int threads;

    /**
     * Quality factor (0-10). The default quality is [TBD]. Higher values
     * sacrifice speed in exchange for quality.
     *
     * TODO: explain what changes at each level
     */
    int quality;
} AVScaleContext;

enum {
    /**
     * Force bit-exact output.
     * This will prevent the use of platform-specific
     * optimizations that may lead to slight differences in rounding, in favor
     * of always maintaining exact bit output compatibility with the reference
     * C code.
     *
     * Note: This is also available under the name "accurate_rnd" for
     * backwards compatibility.
     */
    AV_SCALE_BITEXACT = 1 << 0,

    /**
     * Return an error on underspecified conversions. Without this flag,
     * unspecified fields are defaulted to sensible values.
     */
    AV_SCALE_STRICT = 1 << 1,
};

/**
 * Allocate an AVScaleContext and set its fields to default values. The
 * resulting struct should be freed with avscale_free_context().
 */
AVScaleContext *avscale_alloc_context(void);

/**
 * Free the scaling context and everything associated with it, and write NULL
 * to the provided pointer.
 */
void avscale_free_context(AVScaleContext **ctx);

/**
 * Get the AVClass for AVScaleContext. It can be used in combination with
 * AV_OPT_SEARCH_FAKE_OBJ for examining options.
 *
 * @see av_opt_find().
 */
const AVClass *avscale_get_class(void);

/**
 * Statically test if a conversion is supported. Values of NONE (for pixel
 * formats) or UNSPECIFIED (for the other enums) are ignored.
 *
 * Returns 1 if the conversion is supported, or 0 otherwise.
 */
int avscale_test_format(enum AVPixelFormat dst, enum AVPixelFormat src);
int avscale_test_colorspace(enum AVColorSpace dst, enum AVColorSpace src);
int avscale_test_primaries(enum AVColorPrimaries dst, enum AVColorPrimaries src);
int avscale_test_transfer(enum AVColorTransferCharacteristic dst,
                          enum AVColorTransferCharacteristic src);

/**
 * Scale source data from `src` and write the output to `dst`. This is
 * merely a convenience wrapper around `avscale_frame_slice(ctx, dst, src, 0,
 * src->height)`.
 *
 * @param ctx   The scaling context.
 * @param dst   The destination frame.
 *
 *              The data buffers may either be already allocated by the caller
 *              or left clear, in which case they will be allocated by the
 *              scaler. The latter may have performance advantages - e.g.
 *              in certain cases some (or all) output planes may be references
 *              to input planes, rather than copies.
 * @param src   The source frame. If the data buffers are set to NULL, then
 *              this function performs no conversion. It will instead merely
 *              initialize internal state that *would* be required to perform
 *              the operation, as well as returning the correct error code for
 *              unsupported frame combinations.
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src);

/**
 * Like `avscale_frame`, but operates only on the (source) rows from
 * `slice_start` to `slice_start + slice_height`.
 *
 * Note: For interlaced or vertically subsampled frames, `slice_start` and
 * `slice_height` must be aligned to a multiple of the subsampling size
 * (typically 2, or 4 in the case of interlaced subsampled material).
 *
 * @param ctx          The scaling context.
 * @param dst          The destination frame. See avscale_frame() for more details.
 * @param src          The source frame. See avscale_frame() for more details.
 * @param slice_start  First row of slice, relative to `src`
 * @param slice_height Number of (source) rows in the slice
 *
 * @return 0 on success, a negative AVERROR code on failure.
 */
int avscale_frame_slice(AVScaleContext *ctx, AVFrame *dst, const AVFrame *src,
                        int slice_start, int slice_height);

#endif /* SWSCALE_AVSCALE_H */
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".