[FFmpeg-devel] [PATCH] doc/filters/libvmaf_cuda: fix a typo about the order of label
Hi,

We found a very simple typo in the libvmaf_cuda section of doc/filters where the order of the [dis] and [ref] labels is reversed. We believe fixing this typo would be beneficial to those who are new to using ffmpeg and libvmaf_cuda.

Thanks
BR
Han Shin

From 860ce5a597d1999d555ee8d8cf1eb76a8bfd40fb Mon Sep 17 00:00:00 2001
From: ha7sh17
Date: Mon, 22 Jul 2024 16:11:56 +0900
Subject: [PATCH] doc/filters/libvmaf_cuda: fix a typo about the order of label

---
 doc/filters.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/filters.texi b/doc/filters.texi
index a43e4b8055..2585d818ff 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -17071,8 +17071,8 @@ ffmpeg \
 -hwaccel cuda -hwaccel_output_format cuda -codec:v av1_cuvid -i dis.obu \
 -hwaccel cuda -hwaccel_output_format cuda -codec:v av1_cuvid -i ref.obu \
 -filter_complex "
-[0:v]scale_cuda=format=yuv420p[ref]; \
-[1:v]scale_cuda=format=yuv420p[dis]; \
+[0:v]scale_cuda=format=yuv420p[dis]; \
+[1:v]scale_cuda=format=yuv420p[ref]; \
 [dis][ref]libvmaf_cuda=log_fmt=json:log_path=output.json
 " \
 -f null -
--
2.39.3 (Apple Git-145)

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] doc/filters/libvmaf_cuda: fix a typo about the order of label
On 2024-07-22 12:53 pm, Shin Han wrote:
> Hi
> We found a very simple typo in the libvmaf_cuda section of doc/filters where the order of the [dis] and [ref] labels is reversed. We believe fixing this typo would be beneficial to those who are new to using ffmpeg and libvmaf_cuda.

Will apply.

Thanks,
Gyan
Re: [FFmpeg-devel] [PATCH v2] avutil/hwcontext_videotoolbox: Check CVBufferCopyAttachments during configure
On 22 Jul 2024, at 0:47, gnattu via ffmpeg-devel wrote:

> The __builtin_available function does not do compile time check
> for the availability of the CVBufferCopyAttachments function
> which will fail the build. Check the availability during configure.
>
> Signed-off-by: Gnattu OC
> ---
> configure | 2 ++
> libavutil/hwcontext_videotoolbox.c | 12 +---
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/configure b/configure
> index f6f5c29fea..54171dd4e5 100755
> --- a/configure
> +++ b/configure
> @@ -2366,6 +2366,7 @@ SYSTEM_FUNCS="
>  clock_gettime
>  closesocket
>  CommandLineToArgvW
> +CVBufferCopyAttachments
>  fcntl
>  getaddrinfo
>  getauxval
> @@ -6684,6 +6685,7 @@ enabled videotoolbox && {
>  check_func_headers CoreVideo/CVImageBuffer.h kCVImageBufferColorPrimaries_ITU_R_2020 "-framework CoreVideo"
>  check_func_headers CoreVideo/CVImageBuffer.h kCVImageBufferTransferFunction_ITU_R_2020 "-framework CoreVideo"
>  check_func_headers CoreVideo/CVImageBuffer.h kCVImageBufferTransferFunction_SMPTE_ST_428_1 "-framework CoreVideo"
> +check_func_headers CoreVideo/CVBuffer.h CVBufferCopyAttachments "-framework CoreVideo"
>  }
>

Thanks for the fix and sorry for breaking this.

I am not too keen on adding a configure check just for that: if we do that for every partially available function, it will make configure even slower than it already is.

I have a fix for this ready that tests for the right SDK conditionals. (I did not catch it before only because the way the SDK annotations work nowadays does not seem to properly honor my own __MAC_OS_X_VERSION_MAX_ALLOWED define…) So this time I will need to actually test it on an old enough SDK, just to be sure.

Then again, if others prefer doing it with configure tests, I am OK with that as well.

>  enabled metal && test_cmd $metalcc -v || disable metal
> diff --git a/libavutil/hwcontext_videotoolbox.c b/libavutil/hwcontext_videotoolbox.c
> index ab7556936d..c55d478004 100644
> --- a/libavutil/hwcontext_videotoolbox.c
> +++ b/libavutil/hwcontext_videotoolbox.c
> @@ -592,15 +592,13 @@ static int vt_pixbuf_set_colorspace(void *log_ctx,
>  (TARGET_OS_IOS && __IPHONE_OS_VERSION_MAX_ALLOWED >= 10)
>  if (__builtin_available(macOS 10.8, iOS 10, *)) {
>  CFDictionaryRef attachments = NULL;
> +#if HAVE_CVBUFFERCOPYATTACHMENTS
>  if (__builtin_available(macOS 12.0, iOS 15.0, *))
>  attachments = CVBufferCopyAttachments(pixbuf, kCVAttachmentMode_ShouldPropagate);
> -#if (TARGET_OS_OSX && __MAC_OS_X_VERSION_MIN_REQUIRED <= 12) || \
> -(TARGET_OS_IOS && __IPHONE_OS_VERSION_MIN_REQUIRED <= 15)
> -else {
> -CFDictionaryRef tmp = CVBufferGetAttachments(pixbuf, kCVAttachmentMode_ShouldPropagate);
> -if (tmp)
> -attachments = CFDictionaryCreateCopy(NULL, tmp);
> -}
> +#else
> +CFDictionaryRef tmp = CVBufferGetAttachments(pixbuf, kCVAttachmentMode_ShouldPropagate);
> +if (tmp)
> +attachments = CFDictionaryCreateCopy(NULL, tmp);
>  #endif
>  if (attachments) {
>  colorspace = CVImageBufferCreateColorSpaceFromAttachments(attachments);
> --
> 2.39.3 (Apple Git-146)
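
The issue discussed in this message is that `__builtin_available` is a runtime check only: the guarded call still has to compile, so an SDK that does not declare the function breaks the build. A minimal sketch of the compile-time guard pattern follows. It assumes a hypothetical `HAVE_NEW_COPY_API` macro in place of a configure-generated one, and uses made-up stand-in functions rather than the CoreVideo calls, so it shows the shape of the approach and not the actual fix.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a configure-generated macro such as
 * HAVE_CVBUFFERCOPYATTACHMENTS; defaults to "not available". */
#ifndef HAVE_NEW_COPY_API
#define HAVE_NEW_COPY_API 0
#endif

/* Stand-in for the older "get" style API: returns a borrowed pointer. */
static const char *old_get_attachments(void)
{
    return "bt709";
}

#if HAVE_NEW_COPY_API
/* Hypothetical newer API. It is only declared and referenced when the
 * macro is set, so a build against an older SDK never sees the symbol. */
char *new_copy_attachments(void);
#endif

/* Returns an owned copy that the caller must free(), or NULL on failure. */
static char *copy_attachments(void)
{
#if HAVE_NEW_COPY_API
    return new_copy_attachments();
#else
    const char *tmp = old_get_attachments();
    char *copy;

    if (!tmp)
        return NULL;
    copy = malloc(strlen(tmp) + 1);
    if (copy)
        strcpy(copy, tmp);
    return copy;
#endif
}
```

With the macro left at its default of 0, `copy_attachments()` takes the fallback path and returns a heap copy of the borrowed string. The trade-off raised in the reply still applies: every such macro needs its own configure probe.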
Re: [FFmpeg-devel] Mono ADPCM for EA WVE Files / Fix Framerate
On Sat, Jul 20, 2024 at 10:19:18AM -0400, Aaron wrote:
> On Fri, Jul 19, 2024 at 07:54:37PM -0400, Peter Ross wrote:
>
> > can you post a sample file somewhere
>
> Of course. Here are some you can try out:
>
> gm_us.wve: (Stereo - 15 FPS) https://0x0.st/X97A.wve
> gm_sp.wve: (Stereo - 15 FPS) https://0x0.st/X97m.wve
> sims.wve: (Mono - 30 FPS) https://0x0.st/X97b.wve
> simsne.wve: (Mono - 30 FPS) https://0x0.st/X97B.wve
>
> The first two should decode no problem, and are attached for comparison purposes.
>
> Strangely, for all except gm_us.wve, the very last SCDl chunk has a number of samples that isn't divisible by 28.
> So you'll get a single decode error at the end for the rest of them, but they will still play all the way through and sound good.
>
> > these whitespace-only changes shouldn't go in the patch.
> > "count2 < ( channels..." looks out of place. drop the space after the parenthesis.
> > ditto for these whitespace-only changes above.
> > av_log trailing "\n" missing
>
> Thank you. I have cleaned those up now.
>
> > i suggest splitting this into two patches, one for mono adpcm ea, another for the frame rate fix.
>
> Done.

i will apply these.

in future please send patches to the mailing list as separate emails, e.g. using git-mail, described here: https://ffmpeg.org//developer.html#toc-Submitting-patches-1

cheers,

-- Peter
(A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
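
The remark quoted above about the last SCDl chunk comes down to simple arithmetic on the sample count. A rough sketch of that check, assuming the block size of 28 samples mentioned in the thread; `ea_split_chunk` is a hypothetical helper for illustration and is not the decoder's actual code:

```c
/* Block size taken from the thread above: samples per EA ADPCM block. */
#define EA_ADPCM_BLOCK_SAMPLES 28

/* Hypothetical helper (not FFmpeg code): splits a chunk's sample count into
 * whole blocks and a remainder.
 * Returns 0 if the count is a multiple of the block size, 1 if samples are
 * left over, and -1 on invalid arguments. */
static int ea_split_chunk(int nb_samples, int *whole_blocks, int *leftover)
{
    if (nb_samples < 0 || !whole_blocks || !leftover)
        return -1;
    *whole_blocks = nb_samples / EA_ADPCM_BLOCK_SAMPLES;
    *leftover     = nb_samples % EA_ADPCM_BLOCK_SAMPLES;
    return *leftover ? 1 : 0;
}
```

A chunk with a non-zero remainder can still have its whole blocks decoded, which would be consistent with the files playing through and reporting only one error at the end.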
[FFmpeg-devel] [PATCH 1/3] lavc/ffv1dec: drop code handling AV_PIX_FMT_FLAG_PAL
No paletted pixel formats are supported by the decoder.
---
 libavcodec/ffv1dec.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index 7dc4a537a9..aa2c35880e 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -981,10 +981,7 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *rframe,
 (sc->slice_y >> sv) + ((sc->slice_x >> sh) << pixshift);
 }
-if (desc->flags & AV_PIX_FMT_FLAG_PAL) {
-dst[1] = p->data[1];
-src[1] = f->last_picture.f->data[1];
-}
+
 av_image_copy(dst, p->linesize, src,
 f->last_picture.f->linesize,
 avctx->pix_fmt,
--
2.43.0
[FFmpeg-devel] [PATCH 2/3] lavc/ffv1: move damage handling code to decode_slice()
There is no reason to delay it and this is a more natural place for this code.
---
 libavcodec/ffv1dec.c | 53 ++--
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c
index aa2c35880e..5821a4156a 100644
--- a/libavcodec/ffv1dec.c
+++ b/libavcodec/ffv1dec.c
@@ -290,7 +290,7 @@ static int decode_slice(AVCodecContext *c, void *arg)
 if ((p->flags & AV_FRAME_FLAG_KEY) || sc->slice_reset_contexts) {
 ff_ffv1_clear_slice_state(f, sc);
 } else if (sc->slice_damaged) {
-return AVERROR_INVALIDDATA;
+goto handle_damage;
 }
 width = sc->slice_width;
@@ -347,6 +347,32 @@ static int decode_slice(AVCodecContext *c, void *arg)
 }
 }
+handle_damage:
+if (sc->slice_damaged && f->last_picture.f) {
+const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(c->pix_fmt);
+const uint8_t *src[4];
+uint8_t *dst[4];
+
+ff_progress_frame_await(&f->last_picture, si);
+
+for (int j = 0; j < desc->nb_components; j++) {
+int pixshift = desc->comp[j].depth > 8;
+int sh = (j == 1 || j == 2) ? f->chroma_h_shift : 0;
+int sv = (j == 1 || j == 2) ? f->chroma_v_shift : 0;
+dst[j] = p->data[j] + p->linesize[j] *
+ (sc->slice_y >> sv) + ((sc->slice_x >> sh) << pixshift);
+src[j] = f->last_picture.f->data[j] + f->last_picture.f->linesize[j] *
+ (sc->slice_y >> sv) + ((sc->slice_x >> sh) << pixshift);
+
+}
+
+av_image_copy(dst, p->linesize, src,
+ f->last_picture.f->linesize,
+ c->pix_fmt,
+ sc->slice_width,
+ sc->slice_height);
+}
+
 ff_progress_frame_report(&f->picture, si);
 return 0;
@@ -964,31 +990,6 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *rframe,
 f->slice_count,
 sizeof(*f->slices));
-for (int i = f->slice_count - 1; i >= 0; i--) {
-FFV1SliceContext *sc = &f->slices[i];
-if (sc->slice_damaged && f->last_picture.f) {
-const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(avctx->pix_fmt);
-const uint8_t *src[4];
-uint8_t *dst[4];
-ff_progress_frame_await(&f->last_picture, INT_MAX);
-for (int j = 0; j < desc->nb_components; j++) {
-int pixshift = desc->comp[j].depth > 8;
-int sh = (j == 1 || j == 2) ? f->chroma_h_shift : 0;
-int sv = (j == 1 || j == 2) ? f->chroma_v_shift : 0;
-dst[j] = p->data[j] + p->linesize[j] *
- (sc->slice_y >> sv) + ((sc->slice_x >> sh) << pixshift);
-src[j] = f->last_picture.f->data[j] + f->last_picture.f->linesize[j] *
- (sc->slice_y >> sv) + ((sc->slice_x >> sh) << pixshift);
-
-}
-
-av_image_copy(dst, p->linesize, src,
- f->last_picture.f->linesize,
- avctx->pix_fmt,
- sc->slice_width,
- sc->slice_height);
-}
-}
 ff_progress_frame_report(&f->picture, INT_MAX);
 ff_progress_frame_unref(&f->last_picture);
--
2.43.0
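
The address arithmetic used by the damage-handling code in the patch above can be looked at in isolation. A minimal sketch: the function name `slice_plane_offset` and its parameter list are made up for illustration; only the offset expression and the rules for `pixshift`, `sh` and `sv` are taken from the patch.

```c
#include <stddef.h>

/* Byte offset of a slice's top-left sample inside one plane, mirroring the
 * expression in the patch:
 *   linesize * (slice_y >> sv) + ((slice_x >> sh) << pixshift)
 * Hypothetical helper for illustration; not part of FFmpeg. */
static ptrdiff_t slice_plane_offset(int linesize, int slice_x, int slice_y,
                                    int plane, int depth,
                                    int chroma_h_shift, int chroma_v_shift)
{
    int pixshift  = depth > 8;                 /* 2 bytes per sample above 8 bits */
    int is_chroma = plane == 1 || plane == 2;  /* planes 1 and 2 are subsampled */
    int sh = is_chroma ? chroma_h_shift : 0;
    int sv = is_chroma ? chroma_v_shift : 0;

    return (ptrdiff_t)linesize * (slice_y >> sv) +
           ((slice_x >> sh) << pixshift);
}
```

For an 8-bit luma plane the offset is simply `linesize * y + x`. For a 10-bit 4:2:0 chroma plane both coordinates are halved first and the horizontal position is then doubled to convert samples to bytes.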
[FFmpeg-devel] [PATCH 3/3] lavc/ffv1dec: fix races in accessing FFV1SliceContext.slice_damaged
That variable is shared between frame threads in the same defective way described in the previous commit. Fix it by adding a RefStruct-managed arrays of flags that is propagated across frame threads in the standard manner. Remove now-unused FFV1Context.fsrc --- libavcodec/ffv1.c| 2 ++ libavcodec/ffv1.h| 9 - libavcodec/ffv1dec.c | 26 +- 3 files changed, 23 insertions(+), 14 deletions(-) diff --git a/libavcodec/ffv1.c b/libavcodec/ffv1.c index 9c219b5ddb..333fb3d79b 100644 --- a/libavcodec/ffv1.c +++ b/libavcodec/ffv1.c @@ -214,6 +214,8 @@ av_cold int ff_ffv1_close(AVCodecContext *avctx) ff_refstruct_unref(&sc->plane); } +ff_refstruct_unref(&s->slice_damaged); + av_freep(&avctx->stats_out); for (j = 0; j < s->quant_table_count; j++) { av_freep(&s->initial_states[j]); diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h index edc3f6aef0..92c629c823 100644 --- a/libavcodec/ffv1.h +++ b/libavcodec/ffv1.h @@ -118,7 +118,6 @@ typedef struct FFV1Context { int64_t picture_number; int key_frame; ProgressFrame picture, last_picture; -struct FFV1Context *fsrc; const AVFrame *cur_enc_frame; int plane_count; @@ -148,6 +147,14 @@ typedef struct FFV1Context { int num_h_slices; FFV1SliceContext *slices; +/* RefStruct object, per-slice damage flags shared between frame threads. + * + * After a frame thread marks some slice as finished with + * ff_progress_frame_report(), the corresponding array element must not be + * accessed by this thread anymore, as from then on it is owned by the next + * thread. 
+ */ +uint8_t *slice_damaged; } FFV1Context; int ff_ffv1_common_init(AVCodecContext *avctx); diff --git a/libavcodec/ffv1dec.c b/libavcodec/ffv1dec.c index 5821a4156a..a69f18e252 100644 --- a/libavcodec/ffv1dec.c +++ b/libavcodec/ffv1dec.c @@ -263,15 +263,10 @@ static int decode_slice(AVCodecContext *c, void *arg) const int si = sc - f->slices; GetBitContext gb; -if (f->fsrc && !(p->flags & AV_FRAME_FLAG_KEY) && f->last_picture.f) +if (!(p->flags & AV_FRAME_FLAG_KEY) && f->last_picture.f) ff_progress_frame_await(&f->last_picture, si); -if (f->fsrc) { -const FFV1SliceContext *scsrc = &f->fsrc->slices[si]; - -if (!(p->flags & AV_FRAME_FLAG_KEY)) -sc->slice_damaged |= scsrc->slice_damaged; -} +sc->slice_damaged |= f->slice_damaged[si]; sc->slice_rct_by_coef = 1; sc->slice_rct_ry_coef = 1; @@ -373,6 +368,8 @@ handle_damage: sc->slice_height); } +f->slice_damaged[si] = sc->slice_damaged; + ff_progress_frame_report(&f->picture, si); return 0; @@ -793,11 +790,14 @@ static int read_header(FFV1Context *f) return AVERROR_INVALIDDATA; } +ff_refstruct_unref(&f->slice_damaged); +f->slice_damaged = ff_refstruct_allocz(f->slice_count * sizeof(*f->slice_damaged)); +if (!f->slice_damaged) +return AVERROR(ENOMEM); + for (int j = 0; j < f->slice_count; j++) { FFV1SliceContext *sc = &f->slices[j]; -sc->slice_damaged = 0; - if (f->version == 2) { int sx = get_symbol(c, state, 0); int sy = get_symbol(c, state, 0); @@ -945,6 +945,8 @@ static int decode_frame(AVCodecContext *avctx, AVFrame *rframe, int trailer = 3 + 5*!!f->ec; int v; +sc->slice_damaged = 0; + if (i || f->version > 2) { if (trailer > buf_p - buf) v = INT_MAX; else v = AV_RB24(buf_p-trailer) + trailer; @@ -1039,8 +1041,6 @@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src) FFV1SliceContext *sc = &fdst->slices[i]; const FFV1SliceContext *sc0 = &fsrc->slices[i]; -sc->slice_damaged = sc0->slice_damaged; - ff_refstruct_replace(&sc->plane, sc0->plane); if (fsrc->version < 3) { @@ -1051,12 +1051,12 
@@ static int update_thread_context(AVCodecContext *dst, const AVCodecContext *src)
 }
 }
+ff_refstruct_replace(&fdst->slice_damaged, fsrc->slice_damaged);
+
 av_assert1(fdst->max_slice_count == fsrc->max_slice_count);
 ff_progress_frame_replace(&fdst->picture, &fsrc->picture);
-fdst->fsrc = fsrc;
-
 return 0;
 }
 #endif
--
2.43.0
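
The patch above replaces a raw pointer to the previous thread's context with a shared, reference-counted array of per-slice flags. The sharing idea can be sketched with a toy refcounted array. Everything below (`FlagArray`, `flags_allocz`, `flags_unref`, `flags_replace`) is a hypothetical stand-in written for illustration; it is not FFmpeg's RefStruct API, and its counter is not atomic, so it is unsuitable for real multithreaded use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy reference-counted byte array (hypothetical, illustration only). */
typedef struct FlagArray {
    int     refcount;  /* NOT atomic: single-threaded sketch */
    size_t  size;
    uint8_t data[];    /* one flag per slice, zero-initialised */
} FlagArray;

/* Allocate a zeroed array of n flags holding one reference. */
static FlagArray *flags_allocz(size_t n)
{
    FlagArray *fa = calloc(1, sizeof(*fa) + n);
    if (fa) {
        fa->refcount = 1;
        fa->size     = n;
    }
    return fa;
}

/* Drop one reference, clear the caller's pointer, free on last reference. */
static void flags_unref(FlagArray **pfa)
{
    FlagArray *fa = *pfa;
    *pfa = NULL;
    if (fa && --fa->refcount == 0)
        free(fa);
}

/* Make *dst refer to src (which may be NULL), dropping dst's old reference. */
static void flags_replace(FlagArray **dst, FlagArray *src)
{
    if (*dst == src)
        return;
    if (src)
        src->refcount++;
    flags_unref(dst);
    *dst = src;
}
```

After a `flags_replace`, both contexts see the same flags, so a value written by one is visible to the other without either holding a pointer into the other's context. The ownership rule in the patch's comment (do not touch an element after reporting progress for that slice) is what makes unsynchronised access to the shared elements safe; this sketch does not enforce it.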
Re: [FFmpeg-devel] [PATCH] doc/filters/libvmaf_cuda: fix a typo about the order of label
On 2024-07-22 01:01 pm, Gyan Doshi wrote:
> On 2024-07-22 12:53 pm, Shin Han wrote:
> > Hi
> > We found a very simple typo in the libvmaf_cuda section of doc/filters where the order of the [dis] and [ref] labels is reversed. We believe fixing this typo would be beneficial to those who are new to using ffmpeg and libvmaf_cuda.
>
> Will apply.

Adjusted commit subject and pushed as 172da370e70a24c8528efead0b24053fc74e5648

Thanks,
Gyan
Re: [FFmpeg-devel] [PATCH 3/4 v2] avcodec: add LCEVC decoding support via LCEVCdec
On 21/07/2024 23:53, James Almer wrote: > Signed-off-by: James Almer > --- > configure | 3 + > doc/general_contents.texi | 13 ++ > libavcodec/Makefile | 1 + > libavcodec/lcevcdec.c | 276 ++ > libavcodec/lcevcdec.h | 44 ++ > 5 files changed, 337 insertions(+) > create mode 100644 libavcodec/lcevcdec.c > create mode 100644 libavcodec/lcevcdec.h > > diff --git a/configure b/configure > index f6f5c29fea..d1f32684a6 100755 > --- a/configure > +++ b/configure > @@ -225,6 +225,7 @@ External library support: >--enable-libcdio enable audio CD grabbing with libcdio [no] >--enable-libcodec2 enable codec2 en/decoding using libcodec2 [no] >--enable-libdav1denable AV1 decoding via libdav1d [no] > + --enable-liblcevc_decenable LCEVC decoding via liblcevc_dec [no] >--enable-libdavs2enable AVS2 decoding via libdavs2 [no] >--enable-libdc1394 enable IIDC-1394 grabbing using libdc1394 > and libraw1394 [no] > @@ -1914,6 +1915,7 @@ EXTERNAL_LIBRARY_LIST=" > libcelt > libcodec2 > libdav1d > +liblcevc_dec > libdc1394 > libflite > libfontconfig > @@ -6854,6 +6856,7 @@ enabled libcelt && require libcelt > celt/celt.h celt_decode -lcelt0 && > enabled libcaca && require_pkg_config libcaca caca caca.h > caca_create_canvas > enabled libcodec2 && require libcodec2 codec2/codec2.h codec2_create > -lcodec2 > enabled libdav1d && require_pkg_config libdav1d "dav1d >= 0.5.0" > "dav1d/dav1d.h" dav1d_version > +enabled liblcevc_dec && require_pkg_config liblcevc_dec "lcevc_dec >= > 2.0.0" "LCEVC/lcevc_dec.h" LCEVC_CreateDecoder > enabled libdavs2 && require_pkg_config libdavs2 "davs2 >= 1.6.0" > davs2.h davs2_decoder_open > enabled libdc1394 && require_pkg_config libdc1394 libdc1394-2 > dc1394/dc1394.h dc1394_new > enabled libdrm&& check_pkg_config libdrm libdrm xf86drm.h > drmGetVersion > diff --git a/doc/general_contents.texi b/doc/general_contents.texi > index e7cf4f8239..ecaf3979ce 100644 > --- a/doc/general_contents.texi > +++ b/doc/general_contents.texi > @@ -245,6 +245,19 @@ Go to 
@url{https://github.com/google/liblc3/} and follow > the instructions for > installing the library. > Then pass @code{--enable-liblc3} to configure to enable it. > > +@section LCEVCdec > + > +FFmpeg can make use of the liblcevc_dec library for LCEVC enhacement layer > +decoding on supported bitstreams. > + > +Go to @url{https://github.com/v-novaltd/LCEVCdec} and follow the instructions > +for installing the library. Then pass @code{--enable-libvpx} to configure to ^ Should be --enable-liblcevc_dec > +enable it. > + > +@float NOTE > +LCEVCdec is under the BSD-3-Clause-Clear License. > +@end float > + > @section OpenH264 > > FFmpeg can make use of the OpenH264 library for H.264 decoding and encoding. > diff --git a/libavcodec/Makefile b/libavcodec/Makefile > index 771e2b597e..71bc3c8075 100644 > --- a/libavcodec/Makefile > +++ b/libavcodec/Makefile > @@ -121,6 +121,7 @@ OBJS-$(CONFIG_INTRAX8) += intrax8.o > intrax8dsp.o msmpeg4_vc1_dat > OBJS-$(CONFIG_IVIDSP) += ivi_dsp.o > OBJS-$(CONFIG_JNI) += ffjni.o jni.o > OBJS-$(CONFIG_JPEGTABLES) += jpegtables.o > +OBJS-$(CONFIG_LIBLCEVC_DEC)+= lcevcdec.o > OBJS-$(CONFIG_LCMS2) += fflcms2.o > OBJS-$(CONFIG_LLAUDDSP)+= lossless_audiodsp.o > OBJS-$(CONFIG_LLVIDDSP)+= lossless_videodsp.o > diff --git a/libavcodec/lcevcdec.c b/libavcodec/lcevcdec.c > new file mode 100644 > index 00..4edb0b72dc > --- /dev/null > +++ b/libavcodec/lcevcdec.c > @@ -0,0 +1,276 @@ > +/* > + * This file is part of FFmpeg. > + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. 
> + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 > USA > + */ > + > +#include "libavutil/avassert.h" > +#include "libavutil/frame.h" > +#include "libavutil/imgutils.h" > +#include "libavutil/log.h" > +#include "libavutil/mem.h" > +#include "decode.h" > +#include "lcevcdec.h" > + > +static LCEVC_Color
Re: [FFmpeg-devel] [PATCH v2] libavformat/vapoursynth: Update to API version 4, load library at runtime
On Mon, Jul 22, 2024 at 12:15 AM Hendrik Leppkes wrote: > On Mon, Jul 22, 2024 at 12:08 AM Stefan Oltmanns via ffmpeg-devel > wrote: > > > > Am 18.07.24 um 17:23 schrieb epira...@gmail.com: > > > > > >>> > > >>> Well, the DLL directory is added to PATH by the VapourSynth installer, > > >>> but for safety reasons ffmpeg explictly tells the LoadLibrary function > > >>> to only search the application directory and system32, quote from > > >>> w32dlfcn.h: > > >>> > > /** > > * Safe function used to open dynamic libs. This attempts to improve > > program security > > * by removing the current directory from the dll search path. Only > > dll's found in the > > * executable or system directory are allowed to be loaded. > > * @param name The dynamic lib name. > > * @return A handle to the opened lib. > > */ > > >>> So ffmpeg prevents that solution on purpose. Or should that behavior be > > >>> changed in the w32dlfcn.h? > > >> > > >> Oh, bummer. I would expect that overriding the PATH environment > > >> variable would work kind of like how LD_LIBRARY_PATH works on Linux. I > > >> don't know why that was changed. I don't really follow what goes on in > > >> Windowsland anymore. Does anyone else care to comment on this? Martin, > > >> maybe? > > > > > > IIRC this is done to prevent DLL injection attacks > > > > > > https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security > > > > > > > So what's your proposal how to continue? > > > > I see different options with pros&cons: > > > > > > 1. > > Read the DLL path from registry, function for that could be located > > outside the VapourSynth module. > > > > Pro: Safest method to protect from DLL-injection > > Con: A lot of custom code/functionality for Windows > > > > Relaxing security considerations to avoid a 10 line function seems not > worth it to me. So go with actually finding the correct path. I would prefer changing w32dlfcn.h to allow loading DLLs from PATH. 
Limiting to only the directory of the executable and system32 seems excessive to me. Removing the current working directory is more understandable, but it's perfectly fine to expect PATH to be searched.

Also, we don't have such restrictions on other platforms. (DY)LD_LIBRARY_PATH still works as expected.

Ramiro
Re: [FFmpeg-devel] [PATCH] avcodec/vvc: Add aarch64 neon optimization for ALF
On Tue, Jul 16, 2024 at 8:39 PM Nuo Mi wrote: > Hi Zhili, > Good job. Appreciate it. > With this patch, we're very close to smooth 4K@30 playback on my M2. > Applied. Thank you, Zhili, for our first Neon optimizations. > On Tue, Jul 16, 2024 at 12:19 AM Zhao Zhili > wrote: > >> From: Zhao Zhili >> >> vvc_alf_filter_chroma_4x4_8_c: 3.0 >> vvc_alf_filter_chroma_4x4_8_neon: 1.0 >> vvc_alf_filter_chroma_4x4_10_c: 2.7 >> vvc_alf_filter_chroma_4x4_10_neon: 1.0 >> vvc_alf_filter_chroma_4x4_12_c: 2.7 >> vvc_alf_filter_chroma_4x4_12_neon: 1.0 >> vvc_alf_filter_chroma_8x8_8_c: 10.2 >> vvc_alf_filter_chroma_8x8_8_neon: 3.0 >> vvc_alf_filter_chroma_8x8_10_c: 10.0 >> vvc_alf_filter_chroma_8x8_10_neon: 2.5 >> vvc_alf_filter_chroma_8x8_12_c: 10.0 >> vvc_alf_filter_chroma_8x8_12_neon: 2.5 >> vvc_alf_filter_chroma_16x16_8_c: 41.7 >> vvc_alf_filter_chroma_16x16_8_neon: 11.2 >> vvc_alf_filter_chroma_16x16_10_c: 39.0 >> vvc_alf_filter_chroma_16x16_10_neon: 10.0 >> vvc_alf_filter_chroma_16x16_12_c: 40.2 >> vvc_alf_filter_chroma_16x16_12_neon: 10.2 >> vvc_alf_filter_chroma_32x32_8_c: 162.0 >> vvc_alf_filter_chroma_32x32_8_neon: 45.0 >> vvc_alf_filter_chroma_32x32_10_c: 155.5 >> vvc_alf_filter_chroma_32x32_10_neon: 39.5 >> vvc_alf_filter_chroma_32x32_12_c: 155.5 >> vvc_alf_filter_chroma_32x32_12_neon: 40.0 >> vvc_alf_filter_chroma_64x64_8_c: 646.0 >> vvc_alf_filter_chroma_64x64_8_neon: 175.5 >> vvc_alf_filter_chroma_64x64_10_c: 708.2 >> vvc_alf_filter_chroma_64x64_10_neon: 166.7 >> vvc_alf_filter_chroma_64x64_12_c: 619.2 >> vvc_alf_filter_chroma_64x64_12_neon: 157.2 >> vvc_alf_filter_chroma_128x128_8_c: 2611.5 >> vvc_alf_filter_chroma_128x128_8_neon: 698.2 >> vvc_alf_filter_chroma_128x128_10_c: 2470.0 >> vvc_alf_filter_chroma_128x128_10_neon: 616.0 >> vvc_alf_filter_chroma_128x128_12_c: 2531.5 >> vvc_alf_filter_chroma_128x128_12_neon: 620.2 >> vvc_alf_filter_luma_8x8_8_c: 25.2 >> vvc_alf_filter_luma_8x8_8_neon: 4.2 >> vvc_alf_filter_luma_8x8_10_c: 18.5 >> vvc_alf_filter_luma_8x8_10_neon: 
4.0 >> vvc_alf_filter_luma_8x8_12_c: 19.0 >> vvc_alf_filter_luma_8x8_12_neon: 4.0 >> vvc_alf_filter_luma_16x16_8_c: 106.5 >> vvc_alf_filter_luma_16x16_8_neon: 16.2 >> vvc_alf_filter_luma_16x16_10_c: 75.2 >> vvc_alf_filter_luma_16x16_10_neon: 14.7 >> vvc_alf_filter_luma_16x16_12_c: 79.7 >> vvc_alf_filter_luma_16x16_12_neon: 14.7 >> vvc_alf_filter_luma_32x32_8_c: 400.5 >> vvc_alf_filter_luma_32x32_8_neon: 63.2 >> vvc_alf_filter_luma_32x32_10_c: 299.2 >> vvc_alf_filter_luma_32x32_10_neon: 57.7 >> vvc_alf_filter_luma_32x32_12_c: 299.2 >> vvc_alf_filter_luma_32x32_12_neon: 57.7 >> vvc_alf_filter_luma_64x64_8_c: 1602.5 >> vvc_alf_filter_luma_64x64_8_neon: 251.7 >> vvc_alf_filter_luma_64x64_10_c: 1197.0 >> vvc_alf_filter_luma_64x64_10_neon: 235.5 >> vvc_alf_filter_luma_64x64_12_c: 1220.2 >> vvc_alf_filter_luma_64x64_12_neon: 235.7 >> vvc_alf_filter_luma_128x128_8_c: 6570.2 >> vvc_alf_filter_luma_128x128_8_neon: 1007.7 >> vvc_alf_filter_luma_128x128_10_c: 4822.7 >> vvc_alf_filter_luma_128x128_10_neon: 936.2 >> vvc_alf_filter_luma_128x128_12_c: 4791.2 >> vvc_alf_filter_luma_128x128_12_neon: 938.5 >> >> Signed-off-by: Zhao Zhili >> --- >> libavcodec/aarch64/vvc/Makefile | 5 + >> libavcodec/aarch64/vvc/alf.S | 293 ++ >> libavcodec/aarch64/vvc/alf_template.c | 157 ++ >> libavcodec/aarch64/vvc/dsp_init.c | 57 + >> libavcodec/vvc/dsp.c | 4 +- >> libavcodec/vvc/dsp.h | 1 + >> 6 files changed, 516 insertions(+), 1 deletion(-) >> create mode 100644 libavcodec/aarch64/vvc/Makefile >> create mode 100644 libavcodec/aarch64/vvc/alf.S >> create mode 100644 libavcodec/aarch64/vvc/alf_template.c >> create mode 100644 libavcodec/aarch64/vvc/dsp_init.c >> >> diff --git a/libavcodec/aarch64/vvc/Makefile >> b/libavcodec/aarch64/vvc/Makefile >> new file mode 100644 >> index 00..58398d6e3d >> --- /dev/null >> +++ b/libavcodec/aarch64/vvc/Makefile >> @@ -0,0 +1,5 @@ >> +clean:: >> + $(RM) $(CLEANSUFFIXES:%=libavcodec/aarch64/vvc/%) >> + >> +OBJS-$(CONFIG_VVC_DECODER) += >> aarch64/vvc/dsp_init.o 
>> +NEON-OBJS-$(CONFIG_VVC_DECODER)+= >> aarch64/vvc/alf.o >> diff --git a/libavcodec/aarch64/vvc/alf.S b/libavcodec/aarch64/vvc/alf.S >> new file mode 100644 >> index 00..beb36ac66b >> --- /dev/null >> +++ b/libavcodec/aarch64/vvc/alf.S >> @@ -0,0 +1,293 @@ >> +/* >> + * Copyright (c) 2024 Zhao Zhili >> + * >> + * This file is part of FFmpeg. >> + * >> + * FFmpeg is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU Lesser General Public >> + * License as published by the Free Software Foundation; either >> + * version 2.1 of the License, or (at your option) any later version. >> + * >> + * FFmpeg is distributed in the hope that it will be useful, >> + * but WITHOUT ANY WARRANTY; without even the implied warranty of >> + * MERCHANTABILITY or FITNESS FOR A
Re: [FFmpeg-devel] [FFmpeg-cvslog] avcodec/vvc: Add aarch64 neon optimization for ALF
On Mon, 22 Jul 2024, Zhao Zhili wrote:

> ffmpeg | branch: master | Zhao Zhili | Tue Jul 16 00:19:15 2024 +0800| [2d4ef304c9e13f5e8abe37c20ddd0f17102c6393] | committer: Nuo Mi
>
> avcodec/vvc: Add aarch64 neon optimization for ALF
>
> http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2d4ef304c9e13f5e8abe37c20ddd0f17102c6393
> ---
>
> libavcodec/aarch64/vvc/Makefile | 5 +
> libavcodec/aarch64/vvc/alf.S | 293 ++
> libavcodec/aarch64/vvc/alf_template.c | 157 ++
> libavcodec/aarch64/vvc/dsp_init.c | 57 +++
> libavcodec/vvc/dsp.c | 4 +-
> libavcodec/vvc/dsp.h | 1 +
> 6 files changed, 516 insertions(+), 1 deletion(-)

I haven't reviewed this patch yet. I've been on vacation, and I was hoping to get to reviewing this (and other things) soon when I catch up.

The patch hasn't even been on the mailing list for one single week!

For areas where earlier reviews have required multiple iterations to get patches right, I would hope that we could wait for an actual review.

// Martin
Re: [FFmpeg-devel] [PATCH v2] libavformat/vapoursynth: Update to API version 4, load library at runtime
On Mon, Jul 22, 2024 at 2:14 PM Ramiro Polla wrote: > > On Mon, Jul 22, 2024 at 12:15 AM Hendrik Leppkes wrote: > > On Mon, Jul 22, 2024 at 12:08 AM Stefan Oltmanns via ffmpeg-devel > > wrote: > > > > > > Am 18.07.24 um 17:23 schrieb epira...@gmail.com: > > > > > > > >>> > > > >>> Well, the DLL directory is added to PATH by the VapourSynth installer, > > > >>> but for safety reasons ffmpeg explictly tells the LoadLibrary function > > > >>> to only search the application directory and system32, quote from > > > >>> w32dlfcn.h: > > > >>> > > > /** > > > * Safe function used to open dynamic libs. This attempts to > > > improve program security > > > * by removing the current directory from the dll search path. Only > > > dll's found in the > > > * executable or system directory are allowed to be loaded. > > > * @param name The dynamic lib name. > > > * @return A handle to the opened lib. > > > */ > > > >>> So ffmpeg prevents that solution on purpose. Or should that behavior > > > >>> be > > > >>> changed in the w32dlfcn.h? > > > >> > > > >> Oh, bummer. I would expect that overriding the PATH environment > > > >> variable would work kind of like how LD_LIBRARY_PATH works on Linux. I > > > >> don't know why that was changed. I don't really follow what goes on in > > > >> Windowsland anymore. Does anyone else care to comment on this? Martin, > > > >> maybe? > > > > > > > > IIRC this is done to prevent DLL injection attacks > > > > > > > > https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security > > > > > > > > > > So what's your proposal how to continue? > > > > > > I see different options with pros&cons: > > > > > > > > > 1. > > > Read the DLL path from registry, function for that could be located > > > outside the VapourSynth module. 
> > >
> > > Pro: Safest method to protect from DLL-injection
> > > Con: A lot of custom code/functionality for Windows
> > >
> >
> > Relaxing security considerations to avoid a 10 line function seems not
> > worth it to me. So go with actually finding the correct path.
>
> I would prefer changing w32dlfcn.h to allow loading DLLs from PATH.
> Limiting to only the directory of the executable and system32 seems
> too excessive to me. Removing the current working directory is more
> understandable, but it's perfectly fine to expect PATH to be searched.
>

This is common and largely expected DLL loading behavior on Windows. Changing that to avoid 10 lines of code is rather ill advised.

- Hendrik
Re: [FFmpeg-devel] [PATCH v2] libavformat/vapoursynth: Update to API version 4, load library at runtime
On 22.07.24 at 14:13, Ramiro Polla wrote:
> On Mon, Jul 22, 2024 at 12:15 AM Hendrik Leppkes wrote:
> > On Mon, Jul 22, 2024 at 12:08 AM Stefan Oltmanns via ffmpeg-devel
> > wrote:
> > > On 18.07.24 at 17:23, epira...@gmail.com wrote:
> > > >>> Well, the DLL directory is added to PATH by the VapourSynth installer,
> > > >>> but for safety reasons ffmpeg explicitly tells the LoadLibrary function
> > > >>> to only search the application directory and system32, quote from
> > > >>> w32dlfcn.h:
> > > >>>
> > > /**
> > >  * Safe function used to open dynamic libs. This attempts to improve program security
> > >  * by removing the current directory from the dll search path. Only dll's found in the
> > >  * executable or system directory are allowed to be loaded.
> > >  * @param name The dynamic lib name.
> > >  * @return A handle to the opened lib.
> > >  */
> > > >>> So ffmpeg prevents that solution on purpose. Or should that behavior be
> > > >>> changed in the w32dlfcn.h?
> > > >>
> > > >> Oh, bummer. I would expect that overriding the PATH environment
> > > >> variable would work kind of like how LD_LIBRARY_PATH works on Linux. I
> > > >> don't know why that was changed. I don't really follow what goes on in
> > > >> Windowsland anymore. Does anyone else care to comment on this? Martin,
> > > >> maybe?
> > > >
> > > > IIRC this is done to prevent DLL injection attacks
> > > >
> > > > https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security
> > >
> > > So what's your proposal how to continue?
> > >
> > > I see different options with pros&cons:
> > >
> > > 1.
> > > Read the DLL path from registry, function for that could be located
> > > outside the VapourSynth module.
> > >
> > > Pro: Safest method to protect from DLL-injection
> > > Con: A lot of custom code/functionality for Windows
> >
> > Relaxing security considerations to avoid a 10 line function seems not
> > worth it to me. So go with actually finding the correct path.
>
> I would prefer changing w32dlfcn.h to allow loading DLLs from PATH.
> Limiting to only the directory of the executable and system32 seems
> too excessive to me. Removing the current working directory is more
> understandable, but it's perfectly fine to expect PATH to be searched.
>
> Also, we don't have such restrictions on other platforms.
> (DY)LD_LIBRARY_PATH still work as expected.

I had a look at the documentation for LoadLibrary:

That does not seem to be an option, because you cannot explicitly allow
PATH. If you want to allow PATH, you need the option for default search
directories, and that will also include the current working directory.
You can use custom search paths (that's what ffmpeg does for older
Windows versions where these flags don't exist), but those have to be
explicit, so no help here.

Best regards
Stefan
Re: [FFmpeg-devel] [FFmpeg-cvslog] avcodec/vvc: Add aarch64 neon optimization for ALF
On Mon, Jul 22, 2024 at 9:15 PM Martin Storsjö wrote:

> On Mon, 22 Jul 2024, Zhao Zhili wrote:
>
> > ffmpeg | branch: master | Zhao Zhili | Tue Jul 16 00:19:15 2024 +0800| [2d4ef304c9e13f5e8abe37c20ddd0f17102c6393] | committer: Nuo Mi
> >
> > avcodec/vvc: Add aarch64 neon optimization for ALF
> >
> > > http://git.videolan.org/gitweb.cgi/ffmpeg.git/?a=commit;h=2d4ef304c9e13f5e8abe37c20ddd0f17102c6393
> > ---
> >
> >  libavcodec/aarch64/vvc/Makefile       |   5 +
> >  libavcodec/aarch64/vvc/alf.S          | 293 ++
> >  libavcodec/aarch64/vvc/alf_template.c | 157 ++
> >  libavcodec/aarch64/vvc/dsp_init.c     |  57 +++
> >  libavcodec/vvc/dsp.c                  |   4 +-
> >  libavcodec/vvc/dsp.h                  |   1 +
> >  6 files changed, 516 insertions(+), 1 deletion(-)
>
> I didn't review this patch yet.
>
> I've been on vacation, and I was hoping to get to reviewing this (and
> other things) soon when I catch up.
>
> The patch hasn't even been on the mailing list for one single week! For
> areas where earlier reviews have required multiple iterations to get
> patches right, I would hope that we could wait for an actual review.
>

Hi Martin,

Sorry about this. I will wait longer next time.
Would you prefer that I revert the patch temporarily?

Thank you.

>
> // Martin
>
Re: [FFmpeg-devel] [PATCH] avdevice/dshow: Don't skip audio devices if no video device is present
On Wed, Jul 17, 2024 at 1:43 AM patches wrote:
>
> -----Original Message-----
> From: ffmpeg-devel On Behalf Of Roger Pack
> Sent: Wednesday, July 17, 2024 3:03 AM
> To: FFmpeg development discussions and patches
> Subject: Re: [FFmpeg-devel] [PATCH] avdevice/dshow: Don't skip audio devices if no video device is present
>
>
> LGTM
>
> I also need this fix in 5.1 release branch, is this possible?

Not sure how backports work, but you could make your own fork with it in it?

>
>
> On Mon, Jul 15, 2024 at 12:51 AM patches via ffmpeg-devel wrote:
> >
> > The search of the current DirectShow device list has been customized
> > so that audio devices are always found even if no video device is connected.
> >
> > Signed-off-by: Jens Frederich
> > ---
> >  libavdevice/dshow.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libavdevice/dshow.c b/libavdevice/dshow.c
> > index 403e56fe13..57d8e1c0af 100644
> > --- a/libavdevice/dshow.c
> > +++ b/libavdevice/dshow.c
> > @@ -645,7 +645,7 @@ static int dshow_get_device_list(AVFormatContext *avctx, AVDeviceInfoList *devic
> >      }
> >
> >      ret = dshow_cycle_devices(avctx, devenum, VideoDevice, VideoSourceDevice, NULL, NULL, &device_list);
> > -    if (ret < S_OK)
> > +    if (ret < S_OK && ret != AVERROR(EIO))
> >          goto error;
> >      ret = dshow_cycle_devices(avctx, devenum, AudioDevice, AudioSourceDevice, NULL, NULL, &device_list);
> >
> > --
> > 2.43.0
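The control flow that the one-line dshow change aims for can be sketched as follows. This is a minimal model under stated assumptions, not the libavdevice code: `sketch_cycle_devices`, `sketch_get_device_list` and `SKETCH_ERR_IO` are hypothetical stand-ins for `dshow_cycle_devices`, `dshow_get_device_list` and `AVERROR(EIO)`, and the sketch assumes (as the patch implies) that enumerating a device class with no devices is reported as an I/O error. How the real function treats the result of the audio pass is not visible in the hunk, so propagating it here is also an assumption.

```c
#include <errno.h>

/* Hypothetical stand-in for AVERROR(EIO). */
#define SKETCH_ERR_IO (-EIO)

enum sketch_kind { SKETCH_VIDEO = 0, SKETCH_AUDIO = 1 };

/*
 * Pretend enumerator: counts[kind] says how many devices of that kind
 * exist. An empty class is reported as an I/O error (assumption).
 */
static int sketch_cycle_devices(enum sketch_kind kind, const int counts[2],
                                int *total)
{
    if (counts[kind] == 0)
        return SKETCH_ERR_IO;
    *total += counts[kind];
    return 0;
}

/* Returns the number of devices found, or a negative error code. */
static int sketch_get_device_list(const int counts[2])
{
    int total = 0;
    int ret = sketch_cycle_devices(SKETCH_VIDEO, counts, &total);

    /* Patched check: a missing video device no longer aborts the listing. */
    if (ret < 0 && ret != SKETCH_ERR_IO)
        return ret;

    ret = sketch_cycle_devices(SKETCH_AUDIO, counts, &total);
    if (ret < 0)
        return ret; /* assumption: audio errors still propagate */

    return total;
}
```

With the pre-patch check (`ret < 0` alone), a machine with two microphones and no camera would return the error from the video pass and never reach the audio pass.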
Re: [FFmpeg-devel] [PATCH 14/17] avdevice/dshow: Cleanup also on av_log case
On Tue, Jul 2, 2024 at 1:39 PM Michael Niedermayer wrote:
>
> On Mon, May 27, 2024 at 01:52:26AM +0200, Michael Niedermayer wrote:
> > Fixes: CID1598550 Resource leak
> >
> > Sponsored-by: Sovereign Tech Fund
> > Signed-off-by: Michael Niedermayer
> > ---
> >  libavdevice/dshow.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
>
> can someone with a clue about and means to test dshow review the dshow
> patches of this set ?
> i dont feel comfortable to just apply some of these with no testing
> and no review

They seem to not have broken anything, thanks for looking into it! :)
Re: [FFmpeg-devel] avisynth as an internal filter, any objections?
On Thu, Jul 18, 2024 at 6:16 AM Stefan Oltmanns via ffmpeg-devel wrote:
>
> AviSynth (or better VapourSynth) as filter sounds great, but is it possible?
> The reason why input plugins (like FFmpegSource2) in
> AviSynth/VapourSynth create an index file in a first pass is to allow
> frame-accurate random access to the video. Also the exact number of
> frames of a clip has to be known, because I could access that property
> in VapourSynth.

Yeah that's a good question, wonder how vsrawsource does it...
related: https://forum.videohelp.com/threads/404486-Ffmpeg-piped-as-source-to-avisynth
[FFmpeg-devel] [PATCH 1/4] lavc/vp9dsp: restrict vertical intra pointers
This lets the compiler unroll ever so slightly better (at least in the
16x16 case for RISC-V GCC).
---
 libavcodec/vp9dsp_template.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c
index 9b11661704..5c4fb5d6e2 100644
--- a/libavcodec/vp9dsp_template.c
+++ b/libavcodec/vp9dsp_template.c
@@ -30,7 +30,7 @@
 // FIXME see whether we can merge parts of this (perhaps at least 4x4 and 8x8)
 // back with h264pred.[ch]

-static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride,
+static void vert_4x4_c(uint8_t *restrict _dst, ptrdiff_t stride,
                        const uint8_t *left, const uint8_t *_top)
 {
     pixel *dst = (pixel *) _dst;
@@ -44,7 +44,7 @@ static void vert_4x4_c(uint8_t *_dst, ptrdiff_t stride,
     AV_WN4PA(dst + stride * 3, p4);
 }

-static void vert_8x8_c(uint8_t *_dst, ptrdiff_t stride,
+static void vert_8x8_c(uint8_t *restrict _dst, ptrdiff_t stride,
                        const uint8_t *left, const uint8_t *_top)
 {
     pixel *dst = (pixel *) _dst;
@@ -61,7 +61,7 @@ static void vert_8x8_c(uint8_t *_dst, ptrdiff_t stride,
     }
 }

-static void vert_16x16_c(uint8_t *_dst, ptrdiff_t stride,
+static void vert_16x16_c(uint8_t *restrict _dst, ptrdiff_t stride,
                          const uint8_t *left, const uint8_t *_top)
 {
     pixel *dst = (pixel *) _dst;
@@ -82,7 +82,7 @@ static void vert_16x16_c(uint8_t *_dst, ptrdiff_t stride,
     }
 }

-static void vert_32x32_c(uint8_t *_dst, ptrdiff_t stride,
+static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride,
                          const uint8_t *left, const uint8_t *_top)
 {
     pixel *dst = (pixel *) _dst;
--
2.45.2
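As a rough illustration of what the qualifier can buy, here is a minimal sketch; it is not the vp9dsp code, and `sketch_vert_pred` and its parameter `n` are hypothetical. Marking only the destination as `restrict` promises that nothing written through `dst` is also read through another pointer, so the compiler may keep the `top` row in registers across stores instead of reloading it.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical vertical intra predictor for 8-bit pixels: every row of
 * the n-by-n block is a copy of the row above the block (top).
 *
 * Without restrict, each store to dst[x] might alias top[], so the
 * compiler has to assume top[] can change between rows. With restrict
 * on dst it is free to hoist the loads and unroll the row loop.
 */
static void sketch_vert_pred(uint8_t *restrict dst, ptrdiff_t stride,
                             const uint8_t *top, int n)
{
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++)
            dst[x] = top[x];
        dst += stride;
    }
}
```

Calling it with overlapping `dst` and `top` would be undefined behavior, which is the price of the promise; a predictor that reads the row above a block and writes the block itself never overlaps the two.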
[FFmpeg-devel] [PATCH 2/4] lavc/vp9dsp: use restrict qualifier for copy/avg MC
Same as previous commit.
---
 libavcodec/vp9dsp_template.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c
index 5c4fb5d6e2..da3cc28e5e 100644
--- a/libavcodec/vp9dsp_template.c
+++ b/libavcodec/vp9dsp_template.c
@@ -1936,9 +1936,9 @@ static av_cold void vp9dsp_loopfilter_init(VP9DSPContext *dsp)

 #if BIT_DEPTH != 12

-static av_always_inline void copy_c(uint8_t *dst, ptrdiff_t dst_stride,
-                                    const uint8_t *src, ptrdiff_t src_stride,
-                                    int w, int h)
+static av_always_inline void copy_c(uint8_t *restrict dst, ptrdiff_t dst_stride,
+                                    const uint8_t *restrict src,
+                                    ptrdiff_t src_stride, int w, int h)
 {
     do {
         memcpy(dst, src, w * sizeof(pixel));
@@ -1948,9 +1948,9 @@ static av_always_inline void copy_c(uint8_t *dst, ptrdiff_t dst_stride,
     } while (--h);
 }

-static av_always_inline void avg_c(uint8_t *_dst, ptrdiff_t dst_stride,
-                                   const uint8_t *_src, ptrdiff_t src_stride,
-                                   int w, int h)
+static av_always_inline void avg_c(uint8_t *restrict _dst, ptrdiff_t dst_stride,
+                                   const uint8_t *restrict _src,
+                                   ptrdiff_t src_stride, int w, int h)
 {
     pixel *dst = (pixel *) _dst;
     const pixel *src = (const pixel *) _src;
--
2.45.2
[FFmpeg-devel] [PATCH 3/4] lavc/vp9dsp: copy 8 pixels at once
In the 8-bit case, we can actually read/write 8 aligned pixel values per
load/store, which unsurprisingly tends to be faster on 64-bit systems
(and makes no difference on 32-bit systems).

This requires ifdef'ing though.
---
 libavcodec/vp9dsp_template.c | 32
 1 file changed, 32 insertions(+)

diff --git a/libavcodec/vp9dsp_template.c b/libavcodec/vp9dsp_template.c
index da3cc28e5e..9e5b25142d 100644
--- a/libavcodec/vp9dsp_template.c
+++ b/libavcodec/vp9dsp_template.c
@@ -49,14 +49,22 @@ static void vert_8x8_c(uint8_t *restrict _dst, ptrdiff_t stride,
 {
     pixel *dst = (pixel *) _dst;
     const pixel *top = (const pixel *) _top;
+#if BIT_DEPTH == 8
+    uint64_t p8 = AV_RN64A(top);
+#else
     pixel4 p4a = AV_RN4PA(top + 0);
     pixel4 p4b = AV_RN4PA(top + 4);
+#endif
     int y;

     stride /= sizeof(pixel);
     for (y = 0; y < 8; y++) {
+#if BIT_DEPTH == 8
+        AV_WN64A(dst, p8);
+#else
         AV_WN4PA(dst + 0, p4a);
         AV_WN4PA(dst + 4, p4b);
+#endif
         dst += stride;
     }
 }
@@ -66,18 +74,28 @@ static void vert_16x16_c(uint8_t *restrict _dst, ptrdiff_t stride,
 {
     pixel *dst = (pixel *) _dst;
     const pixel *top = (const pixel *) _top;
+#if BIT_DEPTH == 8
+    uint64_t p8a = AV_RN64A(top);
+    uint64_t p8b = AV_RN64A(top + 8);
+#else
     pixel4 p4a = AV_RN4PA(top + 0);
     pixel4 p4b = AV_RN4PA(top + 4);
     pixel4 p4c = AV_RN4PA(top + 8);
     pixel4 p4d = AV_RN4PA(top + 12);
+#endif
     int y;

     stride /= sizeof(pixel);
     for (y = 0; y < 16; y++) {
+#if BIT_DEPTH == 8
+        AV_WN64A(dst + 0, p8a);
+        AV_WN64A(dst + 8, p8b);
+#else
         AV_WN4PA(dst + 0, p4a);
         AV_WN4PA(dst + 4, p4b);
         AV_WN4PA(dst + 8, p4c);
         AV_WN4PA(dst + 12, p4d);
+#endif
         dst += stride;
     }
 }
@@ -87,6 +105,12 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride,
 {
     pixel *dst = (pixel *) _dst;
     const pixel *top = (const pixel *) _top;
+#if BIT_DEPTH == 8
+    uint64_t p8a = AV_RN64A(top);
+    uint64_t p8b = AV_RN64A(top + 8);
+    uint64_t p8c = AV_RN64A(top + 16);
+    uint64_t p8d = AV_RN64A(top + 24);
+#else
     pixel4 p4a = AV_RN4PA(top + 0);
     pixel4 p4b = AV_RN4PA(top + 4);
     pixel4 p4c = AV_RN4PA(top + 8);
@@ -95,10 +119,17 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride,
     pixel4 p4f = AV_RN4PA(top + 20);
     pixel4 p4g = AV_RN4PA(top + 24);
     pixel4 p4h = AV_RN4PA(top + 28);
+#endif
     int y;

     stride /= sizeof(pixel);
     for (y = 0; y < 32; y++) {
+#if BIT_DEPTH == 8
+        AV_WN64A(dst + 0, p8a);
+        AV_WN64A(dst + 8, p8b);
+        AV_WN64A(dst + 16, p8c);
+        AV_WN64A(dst + 24, p8d);
+#else
         AV_WN4PA(dst + 0, p4a);
         AV_WN4PA(dst + 4, p4b);
         AV_WN4PA(dst + 8, p4c);
@@ -107,6 +138,7 @@ static void vert_32x32_c(uint8_t *restrict _dst, ptrdiff_t stride,
         AV_WN4PA(dst + 20, p4f);
         AV_WN4PA(dst + 24, p4g);
         AV_WN4PA(dst + 28, p4h);
+#endif
         dst += stride;
     }
 }
--
2.45.2
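The idea of moving 8 pixels per load/store can be sketched in portable C as below. This is an illustration under assumptions rather than the patch itself: `sketch_vert_8x8` is a hypothetical name, and it uses `memcpy` with a constant size of 8 in place of FFmpeg's `AV_RN64A`/`AV_WN64A` aligned-access macros, since compilers typically lower such a `memcpy` to a single 64-bit access on 64-bit targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical 8x8 vertical predictor for 8-bit pixels.
 * The 8 pixels above the block are read once into a 64-bit value,
 * then written back once per row, instead of as two 4-pixel halves.
 */
static void sketch_vert_8x8(uint8_t *restrict dst, ptrdiff_t stride,
                            const uint8_t *top)
{
    uint64_t p8;

    memcpy(&p8, top, sizeof(p8));      /* one 64-bit load of the top row */
    for (int y = 0; y < 8; y++) {
        memcpy(dst, &p8, sizeof(p8));  /* one 64-bit store per row */
        dst += stride;
    }
}
```

The trick only applies when a pixel is one byte, which is why the patch guards it with `#if BIT_DEPTH == 8`; at higher bit depths 8 pixels no longer fit in 64 bits.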
[FFmpeg-devel] [PATCH 4/4] lavc/vp9dsp: remove R-V I intra functions
At this point, they are identical to the C code, except for instruction
ordering. In fact, they are typically slower or no faster than the C
code.

(Also FWIW, they were incorrectly flagged as requiring fast unaligned
memory accesses.)
---
 libavcodec/riscv/Makefile        |  3 +-
 libavcodec/riscv/vp9_intra_rvi.S | 71
 libavcodec/riscv/vp9dsp_init.c   |  7
 3 files changed, 1 insertion(+), 80 deletions(-)
 delete mode 100644 libavcodec/riscv/vp9_intra_rvi.S

diff --git a/libavcodec/riscv/Makefile b/libavcodec/riscv/Makefile
index 0bbdd38116..a6cdcb71e9 100644
--- a/libavcodec/riscv/Makefile
+++ b/libavcodec/riscv/Makefile
@@ -73,8 +73,7 @@ OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_init.o
 RV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvi.o
 RVV-OBJS-$(CONFIG_VP8DSP) += riscv/vp8dsp_rvv.o
 OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9dsp_init.o
-RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvi.o \
-                                 riscv/vp9_mc_rvi.o
+RV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_mc_rvi.o
 RVV-OBJS-$(CONFIG_VP9_DECODER) += riscv/vp9_intra_rvv.o \
                                   riscv/vp9_mc_rvv.o
 OBJS-$(CONFIG_VORBIS_DECODER) += riscv/vorbisdsp_init.o
diff --git a/libavcodec/riscv/vp9_intra_rvi.S b/libavcodec/riscv/vp9_intra_rvi.S
deleted file mode 100644
index 16b6bdb25a..00
--- a/libavcodec/riscv/vp9_intra_rvi.S
+++ /dev/null
@@ -1,71 +0,0 @@
-/*
- * Copyright (c) 2024 Institue of Software Chinese Academy of Sciences (ISCAS).
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/riscv/asm.S"
-
-#if __riscv_xlen >= 64
-func ff_v_32x32_rvi
-        ld      t0, (a3)
-        ld      t1, 8(a3)
-        ld      t2, 16(a3)
-        ld      t3, 24(a3)
-        .rept 16
-        add     a7, a0, a1
-        sd      t0, (a0)
-        sd      t1, 8(a0)
-        sd      t2, 16(a0)
-        sd      t3, 24(a0)
-        sh1add  a0, a1, a0
-        sd      t0, (a7)
-        sd      t1, 8(a7)
-        sd      t2, 16(a7)
-        sd      t3, 24(a7)
-        .endr
-
-        ret
-endfunc
-
-func ff_v_16x16_rvi
-        ld      t0, (a3)
-        ld      t1, 8(a3)
-        .rept 8
-        add     a7, a0, a1
-        sd      t0, (a0)
-        sd      t1, 8(a0)
-        sh1add  a0, a1, a0
-        sd      t0, (a7)
-        sd      t1, 8(a7)
-        .endr
-
-        ret
-endfunc
-
-func ff_v_8x8_rvi
-        ld      t0, (a3)
-        .rept 4
-        add     a7, a0, a1
-        sd      t0, (a0)
-        sh1add  a0, a1, a0
-        sd      t0, (a7)
-        .endr
-
-        ret
-endfunc
-#endif
diff --git a/libavcodec/riscv/vp9dsp_init.c b/libavcodec/riscv/vp9dsp_init.c
index 454dcd963f..2034e1c976 100644
--- a/libavcodec/riscv/vp9dsp_init.c
+++ b/libavcodec/riscv/vp9dsp_init.c
@@ -74,13 +74,6 @@ static av_cold void vp9dsp_intrapred_init_riscv(VP9DSPContext *dsp, int bpp)
 #if HAVE_RV
     int flags = av_get_cpu_flags();

-# if __riscv_xlen >= 64
-    if (bpp == 8 && (flags & AV_CPU_FLAG_RVB_ADDR)) {
-        dsp->intra_pred[TX_32X32][VERT_PRED] = ff_v_32x32_rvi;
-        dsp->intra_pred[TX_16X16][VERT_PRED] = ff_v_16x16_rvi;
-        dsp->intra_pred[TX_8X8][VERT_PRED] = ff_v_8x8_rvi;
-    }
-# endif
 #if HAVE_RVV
     if (bpp == 8 && flags & AV_CPU_FLAG_RVV_I64 && ff_rv_vlen_least(128)) {
         dsp->intra_pred[TX_8X8][DC_PRED] = ff_dc_8x8_rvv;
--
2.45.2
Re: [FFmpeg-devel] [PATCH v2] libavformat/vapoursynth: Update to API version 4, load library at runtime
On Mon, Jul 22, 2024 at 4:01 PM Hendrik Leppkes wrote:
> On Mon, Jul 22, 2024 at 2:14 PM Ramiro Polla wrote:
> > On Mon, Jul 22, 2024 at 12:15 AM Hendrik Leppkes wrote:
> > > On Mon, Jul 22, 2024 at 12:08 AM Stefan Oltmanns via ffmpeg-devel
> > > wrote:
> > > > On 18.07.24 at 17:23, epira...@gmail.com wrote:
> > > > >>> Well, the DLL directory is added to PATH by the VapourSynth installer,
> > > > >>> but for safety reasons ffmpeg explicitly tells the LoadLibrary function
> > > > >>> to only search the application directory and system32, quote from
> > > > >>> w32dlfcn.h:
> > > > >>>
> > > > /**
> > > >  * Safe function used to open dynamic libs. This attempts to improve program security
> > > >  * by removing the current directory from the dll search path. Only dll's found in the
> > > >  * executable or system directory are allowed to be loaded.
> > > >  * @param name The dynamic lib name.
> > > >  * @return A handle to the opened lib.
> > > >  */
> > > > >>> So ffmpeg prevents that solution on purpose. Or should that behavior be
> > > > >>> changed in the w32dlfcn.h?
> > > > >>
> > > > >> Oh, bummer. I would expect that overriding the PATH environment
> > > > >> variable would work kind of like how LD_LIBRARY_PATH works on Linux. I
> > > > >> don't know why that was changed. I don't really follow what goes on in
> > > > >> Windowsland anymore. Does anyone else care to comment on this? Martin,
> > > > >> maybe?
> > > > >
> > > > > IIRC this is done to prevent DLL injection attacks
> > > > >
> > > > > https://learn.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-security
> > > > >
> > > >
> > > > So what's your proposal how to continue?
> > > >
> > > > I see different options with pros&cons:
> > > >
> > > >
> > > > 1.
> > > > Read the DLL path from registry, function for that could be located
> > > > outside the VapourSynth module.
> > > >
> > > > Pro: Safest method to protect from DLL-injection
> > > > Con: A lot of custom code/functionality for Windows
> > > >
> > >
> > > Relaxing security considerations to avoid a 10 line function seems not
> > > worth it to me. So go with actually finding the correct path.
> >
> > I would prefer changing w32dlfcn.h to allow loading DLLs from PATH.
> > Limiting to only the directory of the executable and system32 seems
> > too excessive to me. Removing the current working directory is more
> > understandable, but it's perfectly fine to expect PATH to be searched.
>
> This is common and largely expected DLL loading behavior on Windows.

I was surprised by this statement, but it seems that the expectations
on Windows have indeed changed over the last decade. I guess I've just
been away for too long...

Ramiro
[FFmpeg-devel] [PATCH 1/9] lavu/riscv: allow any number of extensions
This reworks the func/endfunc macros to support any number of ISA
extensions as parameters.
---
 libavutil/riscv/asm.S | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 2cf4f7b7ab..78e9defbd4 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -36,17 +36,18 @@
 #define HWD
 #endif

-.macro func sym, ext1=, ext2=
+.macro archadd ext=, more:vararg
+    .ifnb \ext
+        .option arch, +\ext
+        archadd \more
+    .endif
+.endm
+
+.macro func sym, exts:vararg
     .text
     .align 2
-
     .option push
-    .ifnb \ext1
-        .option arch, +\ext1
-        .ifnb \ext2
-            .option arch, +\ext2
-        .endif
-    .endif
+    archadd \exts

     .global \sym
     .hidden \sym
--
2.45.2
[FFmpeg-devel] [PATCH 2/9] lavu/riscv: grok B as an extension
The RISC-V B bit manipulation extension was ratified only two months
ago. But it is strictly equivalent to the union of the zba, zbb and zbs
extensions, which were defined almost 3 years earlier. Rather than
require a new assembler, we can just match the extension name manually
and translate it into its constituent parts.
---
 libavutil/riscv/asm.S | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 78e9defbd4..0c29680d84 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -38,7 +38,12 @@

 .macro archadd ext=, more:vararg
     .ifnb \ext
-        .option arch, +\ext
+        .ifc \ext, b
+            # B was defined later, is known to fewer assemblers.
+            archadd zba, zbb, zbs
+        .else
+            .option arch, +\ext
+        .endif
         archadd \more
     .endif
 .endm
--
2.45.2
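For readers less used to assembler macros, the expansion performed by `archadd` can be modelled in C. This is only a sketch of the logic: `sketch_archadd` is a hypothetical helper that writes the directives as text, whereas the real macro emits them directly to the assembler and recurses over its variadic argument list.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical C model of the archadd macro: append one
 * ".option arch, +<ext>" line per extension to out, expanding the
 * umbrella name "b" into zba, zbb and zbs.
 * Returns the number of directives written, or -1 if out is too small.
 */
static int sketch_archadd(const char *const *exts, size_t n,
                          char *out, size_t size)
{
    static const char *const b_parts[] = { "zba", "zbb", "zbs" };
    size_t used = 0;
    int count = 0;

    if (size > 0)
        out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        const char *const *parts = &exts[i];
        size_t nparts = 1;

        if (strcmp(exts[i], "b") == 0) {
            /* B is known to fewer assemblers, so spell out its parts. */
            parts = b_parts;
            nparts = 3;
        }
        for (size_t j = 0; j < nparts; j++) {
            int len = snprintf(out + used, size - used,
                               ".option arch, +%s\n", parts[j]);
            if (len < 0 || (size_t)len >= size - used)
                return -1;
            used += (size_t)len;
            count++;
        }
    }
    return count;
}
```

Under this model, a declaration such as `func ff_ps_stereo_interpolate_rvv, zve32f, b` enables four extensions: zve32f, zba, zbb and zbs.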
[FFmpeg-devel] [PATCH 3/9] lavc/riscv: require B or zba explicitly
--- libavcodec/riscv/aacencdsp_rvv.S | 4 +-- libavcodec/riscv/aacpsdsp_rvv.S| 10 +++ libavcodec/riscv/ac3dsp_rvv.S | 6 ++-- libavcodec/riscv/ac3dsp_rvvb.S | 2 +- libavcodec/riscv/alacdsp_rvv.S | 6 ++-- libavcodec/riscv/audiodsp_rvv.S| 6 ++-- libavcodec/riscv/bswapdsp_rvb.S| 2 +- libavcodec/riscv/bswapdsp_rvv.S| 2 +- libavcodec/riscv/exrdsp_rvv.S | 2 +- libavcodec/riscv/fixed_vtype.S | 48 ++ libavcodec/riscv/flacdsp_rvv.S | 42 +- libavcodec/riscv/fmtconvert_rvv.S | 4 +-- libavcodec/riscv/h264_mc_chroma.S | 4 +-- libavcodec/riscv/h264idct_rvv.S| 2 +- libavcodec/riscv/huffyuvdsp_rvv.S | 4 +-- libavcodec/riscv/jpeg2000dsp_rvv.S | 4 +-- libavcodec/riscv/llauddsp_rvv.S| 4 +-- libavcodec/riscv/lpc_rvv.S | 4 +-- libavcodec/riscv/opusdsp_rvv.S | 2 +- libavcodec/riscv/rv40dsp_rvv.S | 4 +-- libavcodec/riscv/sbrdsp_rvv.S | 16 +- libavcodec/riscv/svqenc_rvv.S | 2 +- libavcodec/riscv/takdsp_rvv.S | 8 ++--- libavcodec/riscv/utvideodsp_rvv.S | 4 +-- libavcodec/riscv/vc1dsp_rvv.S | 6 ++-- libavcodec/riscv/vorbisdsp_rvv.S | 2 +- libavcodec/riscv/vp7dsp_rvv.S | 2 +- libavcodec/riscv/vp8dsp_rvv.S | 4 +-- libavcodec/riscv/vp9_intra_rvi.S | 6 ++-- 29 files changed, 129 insertions(+), 83 deletions(-) create mode 100644 libavcodec/riscv/fixed_vtype.S diff --git a/libavcodec/riscv/aacencdsp_rvv.S b/libavcodec/riscv/aacencdsp_rvv.S index 21e66a77ae..05a603b6f6 100644 --- a/libavcodec/riscv/aacencdsp_rvv.S +++ b/libavcodec/riscv/aacencdsp_rvv.S @@ -21,7 +21,7 @@ #include "libavutil/riscv/asm.S" -func ff_abs_pow34_rvv, zve32f +func ff_abs_pow34_rvv, zve32f, zba 1: vsetvli t0, a2, e32, m8, ta, ma sub a2, a2, t0 @@ -38,7 +38,7 @@ func ff_abs_pow34_rvv, zve32f ret endfunc -func ff_aac_quant_bands_rvv, zve32f +func ff_aac_quant_bands_rvv, zve32f, zba NOHWF fmv.w.x fa0, a6 NOHWF fmv.w.x fa1, a7 fcvt.s.wft0, a5 diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S index 2d6858688a..72e2103c22 100644 --- a/libavcodec/riscv/aacpsdsp_rvv.S +++ b/libavcodec/riscv/aacpsdsp_rvv.S 
@@ -20,7 +20,7 @@ #include "libavutil/riscv/asm.S" -func ff_ps_add_squares_rvv, zve64f +func ff_ps_add_squares_rvv, zve64f, zba li t1, 32 1: vsetvli t0, a2, e32, m4, ta, ma @@ -39,7 +39,7 @@ func ff_ps_add_squares_rvv, zve64f ret endfunc -func ff_ps_mul_pair_single_rvv, zve32f +func ff_ps_mul_pair_single_rvv, zve32f, zba 1: vsetvli t0, a3, e32, m4, ta, ma vlseg2e32.v v24, (a1) @@ -134,7 +134,7 @@ NOHWD flw fs\n, (4 * \n)(sp) .purgem filter endfunc -func ff_ps_hybrid_analysis_ileave_rvv, zve32x /* no needs for zve32f here */ +func ff_ps_hybrid_analysis_ileave_rvv, zve32x /* no zve32f here */, zba sllit0, a2, 5 + 1 + 2 // ctz(32 * 2 * 4) sh2add a1, a2, a1 add a0, a0, t0 @@ -169,7 +169,7 @@ func ff_ps_hybrid_analysis_ileave_rvv, zve32x /* no needs for zve32f here */ ret endfunc -func ff_ps_hybrid_synthesis_deint_rvv, zve64x +func ff_ps_hybrid_synthesis_deint_rvv, zve64x, zba sllit0, a2, 5 + 1 + 2 sh2add a0, a2, a0 add a1, a1, t0 @@ -207,7 +207,7 @@ func ff_ps_hybrid_synthesis_deint_rvv, zve64x ret endfunc -func ff_ps_stereo_interpolate_rvv, zve32f, zbb +func ff_ps_stereo_interpolate_rvv, zve32f, b vsetvli t0, zero, e32, m2, ta, ma vid.vv24 flw ft0, (a2) diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S index 1b5f67a9ec..c733733286 100644 --- a/libavcodec/riscv/ac3dsp_rvv.S +++ b/libavcodec/riscv/ac3dsp_rvv.S @@ -43,7 +43,7 @@ func ff_ac3_exponent_min_rvv, zve32x ret endfunc -func ff_float_to_fixed24_rvv, zve32f +func ff_float_to_fixed24_rvv, zve32f, zba lit1, 1 << 24 fcvt.s.w f0, t1 1: @@ -61,7 +61,7 @@ func ff_float_to_fixed24_rvv, zve32f endfunc #if __riscv_xlen >= 64 -func ff_sum_square_butterfly_int32_rvv, zve64x +func ff_sum_square_butterfly_int32_rvv, zve64x, zba vsetvlit0, zero, e64, m8, ta, ma vmv.v.xv0, zero vmv.v.xv8, zero @@ -101,7 +101,7 @@ func ff_sum_square_butterfly_int32_rvv, zve64x endfunc #endif -func ff_sum_square_butterfly_float_rvv, zve32f +func ff_sum_square_butterfly_float_rvv, zve32f, zba vsetvli t0, zero, e32, m8, ta, 
ma vmv.v.x v0, zero vmv.v.x v8, zero diff --git a/libavcodec/riscv/ac3dsp_rvvb.S b/libavcodec/riscv/ac3dsp_rvvb.S index 64766b56be..5bffb40bba 100644 --- a/libavcodec/riscv/ac3dsp_rvvb.S +++ b/libavcodec/riscv/ac3dsp_rvvb.S @@ -21,7 +21,7 @@ #include "config.h" #include "libavutil/riscv/asm.S" -func ff_extr
[FFmpeg-devel] [PATCH 4/9] lavfi/riscv: require B or zba explicitly
---
 libavfilter/riscv/af_afir_rvv.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
index 04ec2e50d8..2107d97166 100644
--- a/libavfilter/riscv/af_afir_rvv.S
+++ b/libavfilter/riscv/af_afir_rvv.S
@@ -21,7 +21,7 @@
 #include "libavutil/riscv/asm.S"

 // void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
-func ff_fcmul_add_rvv, zve64f
+func ff_fcmul_add_rvv, zve64f, zba
         li      t1, 32
 1:
         vsetvli t0, a3, e32, m4, ta, ma
--
2.45.2
[FFmpeg-devel] [PATCH 5/9] sws/riscv: require B or zba explicitly
--- libswscale/riscv/input_rvv.S | 12 ++-- libswscale/riscv/range_rvv.S | 8 libswscale/riscv/rgb2rgb_rvb.S | 2 +- libswscale/riscv/rgb2rgb_rvv.S | 12 ++-- 4 files changed, 17 insertions(+), 17 deletions(-) diff --git a/libswscale/riscv/input_rvv.S b/libswscale/riscv/input_rvv.S index 1d7de59c66..d07db43b55 100644 --- a/libswscale/riscv/input_rvv.S +++ b/libswscale/riscv/input_rvv.S @@ -26,7 +26,7 @@ func ff_bgr24ToY_rvv, zve32x j 1f endfunc -func ff_rgb24ToY_rvv, zve32x +func ff_rgb24ToY_rvv, zve32x, zba lw t1, 0(a5) # RY lw t3, 8(a5) # BY 1: @@ -62,7 +62,7 @@ func ff_bgr24ToUV_rvv, zve32x j 1f endfunc -func ff_rgb24ToUV_rvv, zve32x +func ff_rgb24ToUV_rvv, zve32x, zba lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -108,7 +108,7 @@ func ff_bgr24ToUV_half_rvv, zve32x j 1f endfunc -func ff_rgb24ToUV_half_rvv, zve32x +func ff_rgb24ToUV_half_rvv, zve32x, zba lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -157,7 +157,7 @@ func ff_\chr1\()ToY_rvv, zve32x j 1f endfunc -func ff_\chr0\()ToY_rvv, zve32x +func ff_\chr0\()ToY_rvv, zve32x, zba lw t1, 0(a5) # RY lw t3, 8(a5) # BY 1: @@ -199,7 +199,7 @@ func ff_\chr1\()ToUV_rvv, zve32x j 1f endfunc -func ff_\chr0\()ToUV_rvv, zve32x +func ff_\chr0\()ToUV_rvv, zve32x, zba lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -251,7 +251,7 @@ func ff_\chr1\()ToUV_half_rvv, zve32x j 1f endfunc -func ff_\chr0\()ToUV_half_rvv, zve32x +func ff_\chr0\()ToUV_half_rvv, zve32x, zba lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU diff --git a/libswscale/riscv/range_rvv.S b/libswscale/riscv/range_rvv.S index 9da80e6199..19a74eba79 100644 --- a/libswscale/riscv/range_rvv.S +++ b/libswscale/riscv/range_rvv.S @@ -20,7 +20,7 @@ #include "libavutil/riscv/asm.S" -func ff_range_lum_to_jpeg_16_rvv, zve32x +func ff_range_lum_to_jpeg_16_rvv, zve32x, zba li t1, 30189 li t2, 19077 li t3, -39057361 @@ -41,7 +41,7 @@ func ff_range_lum_to_jpeg_16_rvv, zve32x ret endfunc -func ff_range_lum_from_jpeg_16_rvv, zve32x 
+func ff_range_lum_from_jpeg_16_rvv, zve32x, zba li t1, 14071 li t2, 33561947 1: @@ -60,7 +60,7 @@ func ff_range_lum_from_jpeg_16_rvv, zve32x ret endfunc -func ff_range_chr_to_jpeg_16_rvv, zve32x +func ff_range_chr_to_jpeg_16_rvv, zve32x, zba li t1, 30775 li t2, 4663 li t3, -9289992 @@ -88,7 +88,7 @@ func ff_range_chr_to_jpeg_16_rvv, zve32x ret endfunc -func ff_range_chr_from_jpeg_16_rvv, zve32x +func ff_range_chr_from_jpeg_16_rvv, zve32x, zba li t1, 1799 li t2, 4081085 1: diff --git a/libswscale/riscv/rgb2rgb_rvb.S b/libswscale/riscv/rgb2rgb_rvb.S index af127b32ed..d18e5ba01b 100644 --- a/libswscale/riscv/rgb2rgb_rvb.S +++ b/libswscale/riscv/rgb2rgb_rvb.S @@ -23,7 +23,7 @@ #include "libavutil/riscv/bswap_rvb.S" #if (__riscv_xlen >= 64) -func ff_shuffle_bytes_3210_rvb, zbb +func ff_shuffle_bytes_3210_rvb, zba, zbb srlia2, a2, 2 bswap32_rvb a1, a0, a2 endfunc diff --git a/libswscale/riscv/rgb2rgb_rvv.S b/libswscale/riscv/rgb2rgb_rvv.S index 19f7aaf67d..e1270ac0df 100644 --- a/libswscale/riscv/rgb2rgb_rvv.S +++ b/libswscale/riscv/rgb2rgb_rvv.S @@ -25,7 +25,7 @@ func ff_shuffle_bytes_0321_rvv, zve32x j 1f endfunc -func ff_shuffle_bytes_2103_rvv, zve32x +func ff_shuffle_bytes_2103_rvv, zve32x, zba li t1, ~0x00ff00ff 1: not t2, t1 @@ -54,7 +54,7 @@ func ff_shuffle_bytes_1230_rvv, zve32x j 3f endfunc -func ff_shuffle_bytes_3012_rvv, zve32x +func ff_shuffle_bytes_3012_rvv, zve32x, zba li t1, 8 li t2, 24 3: @@ -74,7 +74,7 @@ func ff_shuffle_bytes_3012_rvv, zve32x ret endfunc -func ff_interleave_bytes_rvv, zve32x +func ff_interleave_bytes_rvv, zve32x, zba 1: mv t0, a0 mv t1, a1 @@ -100,7 +100,7 @@ func ff_interleave_bytes_rvv, zve32x ret endfunc -func ff_deinterleave_bytes_rvv, zve32x +func ff_deinterleave_bytes_rvv, zve32x, zba 1: mv t0, a0 mv t1, a1 @@ -165,10 +165,10 @@ endfunc ret .endm -func ff_uyvytoyuv422_rvv, zve32x, zbb +func ff_uyvytoyuv422_rvv, zve32x, b yuy2_to_i422p v20, v16 endfunc -func ff_yuyvtoyuv422_rvv, zve32x, zbb +func ff_yuyvtoyuv422_rvv, zve32x, b 
yuy2_to_i422p v16, v20 endfunc -- 2.45.2 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit
[FFmpeg-devel] [PATCH 6/9] lavu/riscv: require B or zba explicitly
--- libavutil/riscv/fixed_dsp_rvv.S | 14 +++--- libavutil/riscv/float_dsp_rvv.S | 24 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/libavutil/riscv/fixed_dsp_rvv.S b/libavutil/riscv/fixed_dsp_rvv.S index 6bac5813b8..0fa6aab3d4 100644 --- a/libavutil/riscv/fixed_dsp_rvv.S +++ b/libavutil/riscv/fixed_dsp_rvv.S @@ -20,7 +20,7 @@ #include "asm.S" -func ff_vector_fmul_window_scaled_rvv, zve64x +func ff_vector_fmul_window_scaled_rvv, zve64x, zba csrwi vxrm, 0 vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a4, a2 @@ -68,7 +68,7 @@ func ff_vector_fmul_window_scaled_rvv, zve64x ret endfunc -func ff_vector_fmul_window_fixed_rvv, zve64x +func ff_vector_fmul_window_fixed_rvv, zve64x, zba csrwi vxrm, 0 vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a4, a2 @@ -112,7 +112,7 @@ func ff_vector_fmul_window_fixed_rvv, zve64x ret endfunc -func ff_vector_fmul_fixed_rvv, zve32x +func ff_vector_fmul_fixed_rvv, zve32x, zba csrwi vxrm, 0 1: vsetvli t0, a3, e32, m4, ta, ma @@ -129,7 +129,7 @@ func ff_vector_fmul_fixed_rvv, zve32x ret endfunc -func ff_vector_fmul_reverse_fixed_rvv, zve32x +func ff_vector_fmul_reverse_fixed_rvv, zve32x, zba csrwi vxrm, 0 // e16/m4 and e32/m8 are possible but slow the gathers down. 
vsetvli t0, zero, e16, m1, ta, ma @@ -155,7 +155,7 @@ func ff_vector_fmul_reverse_fixed_rvv, zve32x ret endfunc -func ff_vector_fmul_add_fixed_rvv, zve32x +func ff_vector_fmul_add_fixed_rvv, zve32x, zba csrwi vxrm, 0 1: vsetvli t0, a4, e32, m8, ta, ma @@ -175,7 +175,7 @@ func ff_vector_fmul_add_fixed_rvv, zve32x ret endfunc -func ff_scalarproduct_fixed_rvv, zve64x +func ff_scalarproduct_fixed_rvv, zve64x, zba li t1, 1 << 30 vsetvli t0, zero, e64, m8, ta, ma vmv.v.x v8, zero @@ -198,7 +198,7 @@ func ff_scalarproduct_fixed_rvv, zve64x endfunc // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1] -func ff_butterflies_fixed_rvv, zve32x +func ff_butterflies_fixed_rvv, zve32x, zba 1: vsetvli t0, a2, e32, m4, ta, ma vle32.v v16, (a0) diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S index 2f0ade6db6..c7744cf0e8 100644 --- a/libavutil/riscv/float_dsp_rvv.S +++ b/libavutil/riscv/float_dsp_rvv.S @@ -21,7 +21,7 @@ #include "asm.S" // (a0) = (a1) * (a2) [0..a3-1] -func ff_vector_fmul_rvv, zve32f +func ff_vector_fmul_rvv, zve32f, zba 1: vsetvli t0, a3, e32, m8, ta, ma vle32.v v16, (a1) @@ -38,7 +38,7 @@ func ff_vector_fmul_rvv, zve32f endfunc // (a0) += (a1) * fa0 [0..a2-1] -func ff_vector_fmac_scalar_rvv, zve32f +func ff_vector_fmac_scalar_rvv, zve32f, zba NOHWF fmv.w.x fa0, a2 NOHWF mva2, a3 1: @@ -57,7 +57,7 @@ NOHWF mva2, a3 endfunc // (a0) = (a1) * fa0 [0..a2-1] -func ff_vector_fmul_scalar_rvv, zve32f +func ff_vector_fmul_scalar_rvv, zve32f, zba NOHWF fmv.w.x fa0, a2 NOHWF mv a2, a3 1: @@ -73,7 +73,7 @@ NOHWF mv a2, a3 ret endfunc -func ff_vector_fmul_window_rvv, zve32f +func ff_vector_fmul_window_rvv, zve32f, zba // a0: dst, a1: src0, a2: src1, a3: window, a4: length // e16/m2 and e32/m4 are possible but slower due to gather. 
vsetvlit0, zero, e16, m1, ta, ma @@ -113,7 +113,7 @@ func ff_vector_fmul_window_rvv, zve32f endfunc // (a0) = (a1) * (a2) + (a3) [0..a4-1] -func ff_vector_fmul_add_rvv, zve32f +func ff_vector_fmul_add_rvv, zve32f, zba 1: vsetvli t0, a4, e32, m8, ta, ma vle32.v v8, (a1) @@ -133,7 +133,7 @@ endfunc // TODO factor vrsub, separate last iteration? // (a0) = (a1) * reverse(a2) [0..a3-1] -func ff_vector_fmul_reverse_rvv, zve32f +func ff_vector_fmul_reverse_rvv, zve32f, zba // e16/m4 and e32/m8 are possible but slower due to gather. vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a3, a2 @@ -159,7 +159,7 @@ func ff_vector_fmul_reverse_rvv, zve32f endfunc // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1] -func ff_butterflies_float_rvv, zve32f +func ff_butterflies_float_rvv, zve32f, zba 1: vsetvli t0, a2, e32, m8, ta, ma vle32.v v16, (a0) @@ -177,7 +177,7 @@ func ff_butterflies_float_rvv, zve32f endfunc // a0 = (a0).(a1) [0..a2-1] -func ff_scalarproduct_float_rvv, zve32f +func ff_scalarproduct_float_rvv, zve32f, zba vsetvli t0, zero, e32, m8, ta, ma vmv.v.x v8, zero vmv.s.x v0, zero @@ -199,7 +199,7 @@ NOHWF fmv.x.w a0, fa0 endfunc // (a0) = (a1) * (a2) [0..a3-1] -func ff_vector_dmul_rvv, zve64d +func ff_vector_dmul_rvv, zve64d, zba 1: vsetvli t0, a3, e64, m8, ta, ma vle64.v v16, (a1) @@ -216,7 +216,7 @@ func ff_vector_dmul_rvv, zve64d endfunc // (a0) += (a1) * fa0 [0..a2-1] -func ff_vector_dmac_scalar_rvv, zve64d +
[FFmpeg-devel] [PATCH 7/9] lavu/riscv: remove bespoke SH{1, 2, 3}ADD assembler
configure checks that the assembler supports the B extension (or rather
its constituents) anyway. These macros were dodging sanity checks for
unsupported instructions and nothing else.
---
 libavutil/riscv/asm.S | 19 ---
 1 file changed, 19 deletions(-)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 0c29680d84..8b96e07b75 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -83,25 +83,6 @@
         .endm
     .endm
-#if !defined (__riscv_zba)
-    /* SH{1,2,3}ADD definitions for pre-Zba assemblers */
-    .macro shnadd n, rd, rs1, rs2
-        .insn r OP, 2 * \n, 16, \rd, \rs1, \rs2
-    .endm
-
-    .macro sh1add rd, rs1, rs2
-        shnadd 1, \rd, \rs1, \rs2
-    .endm
-
-    .macro sh2add rd, rs1, rs2
-        shnadd 2, \rd, \rs1, \rs2
-    .endm
-
-    .macro sh3add rd, rs1, rs2
-        shnadd 3, \rd, \rs1, \rs2
-    .endm
-#endif
-
 #if defined (__riscv_v_elen)
 # define RV_V_ELEN __riscv_v_elen
 #else
--
2.45.2
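For readers unfamiliar with the instructions behind the removed macros: Zba's SH1ADD, SH2ADD and SH3ADD shift rs1 left by 1, 2 or 3 bits and add rs2. The following is only a C model of that arithmetic, assuming a 64-bit XLEN; the helper names `shnadd` and `elem32_addr` are hypothetical and not part of FFmpeg.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of Zba SHnADD on RV64: rd = rs2 + (rs1 << n). */
static uint64_t shnadd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    assert(n >= 1 && n <= 3); /* only SH1ADD, SH2ADD and SH3ADD exist */
    return rs2 + (rs1 << n);
}

/* Typical use: address of element `index` in an array of 32-bit values,
 * which is what "sh2add a2, a4, a2" computes in the DSP code. */
static uint64_t elem32_addr(uint64_t base, uint64_t index)
{
    return shnadd(2, index, base);
}
```

With the macros gone, the same mnemonics are simply assembled natively, which is why the functions using them now have to request `zba` or `b` explicitly.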
[FFmpeg-devel] [PATCH 8/9] lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing: - Zba (addresses), - Zbb (basics) and - Zbs (single bits). It does not include Zbc (base-2 polynomials). --- doc/APIchanges| 3 +++ libavutil/cpu.c | 1 + libavutil/cpu.h | 1 + libavutil/riscv/cpu.c | 13 + libavutil/tests/cpu.c | 1 + tests/checkasm/checkasm.c | 1 + 6 files changed, 20 insertions(+) diff --git a/doc/APIchanges b/doc/APIchanges index 5751216b24..0061b084b8 100644 --- a/doc/APIchanges +++ b/doc/APIchanges @@ -2,6 +2,9 @@ The last version increases of all libraries were on 2024-03-07 API changes, most recent first: +2024-07-22 - x - lavu 59.18.100 - cpu.h + Add AV_CPU_FLAG_RVB. + 2024-07-xx - xx - lavf 61 - avformat.h Deprecate avformat_transfer_internal_stream_timing_info() and av_stream_get_codec_timebase() without replacement. diff --git a/libavutil/cpu.c b/libavutil/cpu.c index 9ac2f01c20..17afe8858a 100644 --- a/libavutil/cpu.c +++ b/libavutil/cpu.c @@ -186,6 +186,7 @@ int av_parse_cpu_caps(unsigned *flags, const char *s) { "rvi", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVI },.unit = "flags" }, { "rvf", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVF },.unit = "flags" }, { "rvd", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVD },.unit = "flags" }, +{ "rvb", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVB },.unit = "flags" }, { "zve32x", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I32 },.unit = "flags" }, { "zve32f", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_F32 },.unit = "flags" }, { "zve64x", NULL, 0, AV_OPT_TYPE_CONST, { .i64 = AV_CPU_FLAG_RVV_I64 },.unit = "flags" }, diff --git a/libavutil/cpu.h b/libavutil/cpu.h index a25901433e..9f419aae02 100644 --- a/libavutil/cpu.h +++ b/libavutil/cpu.h @@ -92,6 +92,7 @@ #define AV_CPU_FLAG_RVB_ADDR (1 << 8) ///< Address bit-manipulations #define AV_CPU_FLAG_RV_ZVBB (1 << 9) ///< Vector basic bit-manipulations #define AV_CPU_FLAG_RV_MISALIGNED (1 <<10) ///< Fast misaligned accesses +#define 
AV_CPU_FLAG_RVB (1 <<11) ///< B (bit manipulations) /** * Return the flags which specify extensions supported by the CPU. diff --git a/libavutil/riscv/cpu.c b/libavutil/riscv/cpu.c index 04ac404bbf..e035f4b024 100644 --- a/libavutil/riscv/cpu.c +++ b/libavutil/riscv/cpu.c @@ -72,6 +72,12 @@ int ff_get_cpu_flags_riscv(void) #ifdef RISCV_HWPROBE_EXT_ZBB if (pairs[1].value & RISCV_HWPROBE_EXT_ZBB) ret |= AV_CPU_FLAG_RVB_BASIC; +#if defined (RISCV_HWPROBE_EXT_ZBA) && defined (RISCV_HWPROBE_EXT_ZBS) +if ((pairs[1].value & RISCV_HWPROBE_EXT_ZBA) && +(pairs[1].value & RISCV_HWPROBE_EXT_ZBB) && +(pairs[1].value & RISCV_HWPROBE_EXT_ZBS)) +ret |= AV_CPU_FLAG_RVB; +#endif #endif #ifdef RISCV_HWPROBE_EXT_ZVBB if (pairs[1].value & RISCV_HWPROBE_EXT_ZVBB) @@ -94,6 +100,9 @@ int ff_get_cpu_flags_riscv(void) ret |= AV_CPU_FLAG_RVF; if (hwcap & HWCAP_RV('D')) ret |= AV_CPU_FLAG_RVD; +if (hwcap & HWCAP_RV('B')) +ret |= AV_CPU_FLAG_RVB_ADDR | AV_CPU_FLAG_RVB_BASIC | + AV_CPU_FLAG_RVB; /* The V extension implies all Zve* functional subsets */ if (hwcap & HWCAP_RV('V')) @@ -118,6 +127,10 @@ int ff_get_cpu_flags_riscv(void) #ifdef __riscv_zbb ret |= AV_CPU_FLAG_RVB_BASIC; #endif +#if defined (__riscv_b) || \ +(defined (__riscv_zba) && defined (__riscv_zbb) && defined (__riscv_zbs)) +ret |= AV_CPU_FLAG_RVB; +#endif /* If RV-V is enabled statically at compile-time, check the details. 
*/ #ifdef __riscv_vector diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c index 02b98682e3..b4b11775d8 100644 --- a/libavutil/tests/cpu.c +++ b/libavutil/tests/cpu.c @@ -90,6 +90,7 @@ static const struct { { AV_CPU_FLAG_RVD, "rvd"}, { AV_CPU_FLAG_RVB_ADDR, "zba"}, { AV_CPU_FLAG_RVB_BASIC, "zbb"}, +{ AV_CPU_FLAG_RVB, "rvb"}, { AV_CPU_FLAG_RVV_I32, "zve32x" }, { AV_CPU_FLAG_RVV_F32, "zve32f" }, { AV_CPU_FLAG_RVV_I64, "zve64x" }, diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index de0024099a..016f2329b0 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -295,6 +295,7 @@ static const struct { { "RVD", "rvd", AV_CPU_FLAG_RVD }, { "RVBaddr", "rvb_a",AV_CPU_FLAG_RVB_ADDR }, { "RVBbasic", "rvb_b",AV_CPU_FLAG_RVB_BASIC }, +{ "RVB", "rvb", AV_CPU_FLAG_RVB }, { "RVVi32", "rvv_i32", AV_CPU_FLAG_RVV_I32 }, { "RVVf32", "rvv_f32", AV_CPU_FLAG_RVV_F32 }, { "RVVi64", "rvv_i64", AV_CPU_FLAG_RVV_I64 }, -- 2.45.2
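The detection rule in this patch can be summarised as: the composite B flag is only reported when Zba, Zbb and Zbs are all present. Below is a self-contained sketch of that rule. The constant names and bit values are illustrative stand-ins chosen for this example; they are not the real `AV_CPU_FLAG_*` or `RISCV_HWPROBE_EXT_*` definitions.

```c
#include <assert.h>

/* Illustrative stand-ins for probed extension bits (hypothetical values). */
enum { EXT_ZBA = 1 << 0, EXT_ZBB = 1 << 1, EXT_ZBS = 1 << 2 };

/* Illustrative stand-ins for the CPU flags (hypothetical values). */
enum { FLAG_B_ADDR = 1 << 0, FLAG_B_BASIC = 1 << 1, FLAG_B = 1 << 2 };

static unsigned b_flags_from_ext(unsigned ext)
{
    const unsigned all = EXT_ZBA | EXT_ZBB | EXT_ZBS;
    unsigned flags = 0;

    assert((ext & ~all) == 0); /* only the three modelled bits are accepted */
    if (ext & EXT_ZBA)
        flags |= FLAG_B_ADDR;
    if (ext & EXT_ZBB)
        flags |= FLAG_B_BASIC;
    if ((ext & all) == all)    /* B = Zba + Zbb + Zbs, all three required */
        flags |= FLAG_B;
    return flags;
}
```

A CPU with Zba and Zbb but without Zbs therefore keeps the two existing flags and does not get the new one.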
[FFmpeg-devel] [PATCH 9/9] lavc/h264dsp: use RISC-V B extension
This saves one register and one instruction per transform. add16 and add16intra thus become stack-less. --- libavcodec/riscv/h264dsp_init.c | 25 libavcodec/riscv/h264idct_rvv.S | 51 - 2 files changed, 38 insertions(+), 38 deletions(-) diff --git a/libavcodec/riscv/h264dsp_init.c b/libavcodec/riscv/h264dsp_init.c index 9ae182151c..836c073559 100644 --- a/libavcodec/riscv/h264dsp_init.c +++ b/libavcodec/riscv/h264dsp_init.c @@ -98,13 +98,14 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth, dsp->h264_idct_add = ff_h264_idct_add_8_rvv; dsp->h264_idct8_add = ff_h264_idct8_add_8_rvv; +dsp->h264_idct_dc_add = ff_h264_idct4_dc_add_8_rvv; +if (flags & AV_CPU_FLAG_RVB) { +dsp->h264_idct_add16 = ff_h264_idct_add16_8_rvv; +dsp->h264_idct_add16intra = ff_h264_idct_add16intra_8_rvv; # if __riscv_xlen == 64 -dsp->h264_idct_add16 = ff_h264_idct_add16_8_rvv; -dsp->h264_idct_add16intra = ff_h264_idct_add16intra_8_rvv; -dsp->h264_idct8_add4 = ff_h264_idct8_add4_8_rvv; +dsp->h264_idct8_add4 = ff_h264_idct8_add4_8_rvv; # endif -if (flags & AV_CPU_FLAG_RVV_I32) -dsp->h264_idct_dc_add = ff_h264_idct4_dc_add_8_rvv; +} if (flags & AV_CPU_FLAG_RVV_I64) { dsp->h264_add_pixels8_clear = ff_h264_add_pixels8_8_rvv; dsp->h264_idct8_dc_add = ff_h264_idct8_dc_add_8_rvv; @@ -118,16 +119,16 @@ av_cold void ff_h264dsp_init_riscv(H264DSPContext *dsp, const int bit_depth, dsp->h264_idct_add = ff_h264_idct_add_##depth##_rvv; \ if (flags & AV_CPU_FLAG_RVB_ADDR) \ dsp->h264_idct8_add = ff_h264_idct8_add_##depth##_rvv; \ -if (zvl128b && (flags & AV_CPU_FLAG_RVB_ADDR)) { \ +if (zvl128b && (flags & AV_CPU_FLAG_RVB)) { \ dsp->h264_idct_dc_add = ff_h264_idct4_dc_add_##depth##_rvv; \ dsp->h264_idct8_dc_add = ff_h264_idct8_dc_add_##depth##_rvv; \ +if (__riscv_xlen == 64) { \ +dsp->h264_idct_add16 = ff_h264_idct_add16_##depth##_rvv; \ +dsp->h264_idct_add16intra = \ +ff_h264_idct_add16intra_##depth##_rvv; \ +} \ } \ -if (__riscv_xlen == 64 && zvl128b) { \ -dsp->h264_idct_add16 = 
ff_h264_idct_add16_##depth##_rvv; \ -dsp->h264_idct_add16intra = \ -ff_h264_idct_add16intra_##depth##_rvv; \ -} \ -if (__riscv_xlen == 64 && (flags & AV_CPU_FLAG_RVB_ADDR)) \ +if (__riscv_xlen == 64 && (flags & AV_CPU_FLAG_RVB)) \ dsp->h264_idct8_add4 = ff_h264_idct8_add4_##depth##_rvv; \ } diff --git a/libavcodec/riscv/h264idct_rvv.S b/libavcodec/riscv/h264idct_rvv.S index 514c849bce..a49a32c47e 100644 --- a/libavcodec/riscv/h264idct_rvv.S +++ b/libavcodec/riscv/h264idct_rvv.S @@ -532,16 +532,11 @@ const ff_h264_scan8 .byte 034, 035, 044, 045, 036, 037, 046, 047 endconst -#if (__riscv_xlen == 64) .macro idct4_adds type, depth -func ff_h264_idct_add\type\()_\depth\()_rvv, zve32x +func ff_h264_idct_add\type\()_\depth\()_rvv, zve32x, b csrwi vxrm, 0 -addisp, sp, -16 lla t0, ff_h264_scan8 -sd s0, (sp) li t1, 32 * (\depth / 8) -mv s0, sp -sd ra, 8(sp) vsetivli zero, 16, e8, m1, ta, ma vle8.vv8, (t0) .if \depth == 8 @@ -567,20 +562,23 @@ func ff_h264_idct_add\type\()_\depth\()_rvv, zve32x vsetvli zero, zero, e16, m2, ta, ma vmv.x.s a4, v0 vmv.x.s a7, v1 +zext.h a4, a4 +sllia7, a7, 16 mv t4, a0 +or a4, a4, a7 mv t5, a1 mv a1, a2 mv a2, a3 li a3, 16 +mv a7, ra 1: andit0, a4, 1 addia3, a3, -1 -srlia4, a4, 1 .ifc \type, 16 beqzt0, 3f # if (nnz) .endif lw t2, (t5) # block_offset[i] -andit1, a7, 1 +bexti t1, a4, 16 add a0, t4, t2 .ifc \type, 16 bnezt1, 2f # if (nnz == 1 && block[i * 16]) @@ -595,14 +593,12 @@ func ff_h264_idct_add\type\()_\depth\()_rvv, zve32x .endif jal ff_h264_idct4_dc_add_\depth\()_rvv 3: -srlia7, a7, 1 +srlia4, a4, 1 addit5, t5, 4 addia1, a1, 16 * 2 * (\depth / 8) bneza3, 1b -ld ra, 8(sp) -ld s0, 0(sp) -addisp, sp, 16 +mv ra, a7 ret endfunc .endm @@ -611,9 +607,10 @@ endfunc idct4_adds 16, \depth idct4_adds 16intra, \depth -func ff_h264_idct8_add4_\depth\()_
[FFmpeg-devel] [PATCH 1/6] lavu/riscv: assembly for zicfilp LPAD
This instruction, if aligned on a 4-byte boundary, defines a valid target
("landing pad") for an indirect call or jump. Since this instruction is a
HINT, it is safe to assemble even if not included in the target instruction
set architecture.

The necessary alignment is already provided by the `func` macro. However,
this still lacks the ELF attribute to indicate that zicfilp is supported in
simple mode. This is left for future work as the ELF specification is not
yet ratified.

This will also, non-obviously, require the assembler to support zicfilp,
insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1)
as its temporary register.
---
 libavutil/riscv/asm.S | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 2cf4f7b7ab..37fd7d3b03 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -77,6 +77,12 @@
         .endm
     .endm
+#if !defined (__riscv_zicfilp)
+    .macro lpad lpl
+        auipc zero, \lpl
+    .endm
+#endif
+
 #if !defined (__riscv_zba)
     /* SH{1,2,3}ADD definitions for pre-Zba assemblers */
     .macro shnadd n, rd, rs1, rs2
--
2.45.2
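To make the "HINT" point concrete: the fallback macro emits `auipc zero, lpl`, an AUIPC whose destination is x0, so it has no architectural effect on hardware without zicfilp. The sketch below computes that encoding in C; the helper names `encode_lpad` and `lpad_is_aligned` are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Encode "lpad lpl" as "auipc zero, lpl": U-type, opcode AUIPC (0x17),
 * rd = x0 in bits 11:7, 20-bit label in bits 31:12. */
static uint32_t encode_lpad(uint32_t lpl)
{
    assert(lpl < (1u << 20)); /* the label is a 20-bit immediate */
    return (lpl << 12) | (0u << 7) | 0x17u;
}

/* A landing pad is only valid on a 4-byte boundary. */
static int lpad_is_aligned(uint64_t address)
{
    return (address & 3) == 0;
}
```

The `lpad 0` used throughout this series thus assembles to the 32-bit word 0x00000017.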
[FFmpeg-devel] [PATCH 2/6] lavu/riscv: add forward-edge CFI landing pads
--- libavutil/riscv/fixed_dsp_rvv.S | 6 ++ libavutil/riscv/float_dsp_rvv.S | 12 libavutil/riscv/lls_rvv.S | 1 + 3 files changed, 19 insertions(+) diff --git a/libavutil/riscv/fixed_dsp_rvv.S b/libavutil/riscv/fixed_dsp_rvv.S index 6bac5813b8..7a872f7763 100644 --- a/libavutil/riscv/fixed_dsp_rvv.S +++ b/libavutil/riscv/fixed_dsp_rvv.S @@ -21,6 +21,7 @@ #include "asm.S" func ff_vector_fmul_window_scaled_rvv, zve64x +lpad0 csrwi vxrm, 0 vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a4, a2 @@ -69,6 +70,7 @@ func ff_vector_fmul_window_scaled_rvv, zve64x endfunc func ff_vector_fmul_window_fixed_rvv, zve64x +lpad0 csrwi vxrm, 0 vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a4, a2 @@ -113,6 +115,7 @@ func ff_vector_fmul_window_fixed_rvv, zve64x endfunc func ff_vector_fmul_fixed_rvv, zve32x +lpad0 csrwi vxrm, 0 1: vsetvli t0, a3, e32, m4, ta, ma @@ -156,6 +159,7 @@ func ff_vector_fmul_reverse_fixed_rvv, zve32x endfunc func ff_vector_fmul_add_fixed_rvv, zve32x +lpad0 csrwi vxrm, 0 1: vsetvli t0, a4, e32, m8, ta, ma @@ -176,6 +180,7 @@ func ff_vector_fmul_add_fixed_rvv, zve32x endfunc func ff_scalarproduct_fixed_rvv, zve64x +lpad0 li t1, 1 << 30 vsetvli t0, zero, e64, m8, ta, ma vmv.v.x v8, zero @@ -199,6 +204,7 @@ endfunc // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1] func ff_butterflies_fixed_rvv, zve32x +lpad0 1: vsetvli t0, a2, e32, m4, ta, ma vle32.v v16, (a0) diff --git a/libavutil/riscv/float_dsp_rvv.S b/libavutil/riscv/float_dsp_rvv.S index 2f0ade6db6..e738268549 100644 --- a/libavutil/riscv/float_dsp_rvv.S +++ b/libavutil/riscv/float_dsp_rvv.S @@ -22,6 +22,7 @@ // (a0) = (a1) * (a2) [0..a3-1] func ff_vector_fmul_rvv, zve32f +lpad0 1: vsetvli t0, a3, e32, m8, ta, ma vle32.v v16, (a1) @@ -39,6 +40,7 @@ endfunc // (a0) += (a1) * fa0 [0..a2-1] func ff_vector_fmac_scalar_rvv, zve32f +lpad0 NOHWF fmv.w.x fa0, a2 NOHWF mva2, a3 1: @@ -58,6 +60,7 @@ endfunc // (a0) = (a1) * fa0 [0..a2-1] func ff_vector_fmul_scalar_rvv, zve32f +lpad0 NOHWF fmv.w.x fa0, a2 NOHWF mv a2, a3 
1: @@ -74,6 +77,7 @@ NOHWF mv a2, a3 endfunc func ff_vector_fmul_window_rvv, zve32f +lpad0 // a0: dst, a1: src0, a2: src1, a3: window, a4: length // e16/m2 and e32/m4 are possible but slower due to gather. vsetvlit0, zero, e16, m1, ta, ma @@ -114,6 +118,7 @@ endfunc // (a0) = (a1) * (a2) + (a3) [0..a4-1] func ff_vector_fmul_add_rvv, zve32f +lpad0 1: vsetvli t0, a4, e32, m8, ta, ma vle32.v v8, (a1) @@ -134,6 +139,7 @@ endfunc // TODO factor vrsub, separate last iteration? // (a0) = (a1) * reverse(a2) [0..a3-1] func ff_vector_fmul_reverse_rvv, zve32f +lpad0 // e16/m4 and e32/m8 are possible but slower due to gather. vsetvli t0, zero, e16, m1, ta, ma sh2add a2, a3, a2 @@ -160,6 +166,7 @@ endfunc // (a0) = (a0) + (a1), (a1) = (a0) - (a1) [0..a2-1] func ff_butterflies_float_rvv, zve32f +lpad0 1: vsetvli t0, a2, e32, m8, ta, ma vle32.v v16, (a0) @@ -178,6 +185,7 @@ endfunc // a0 = (a0).(a1) [0..a2-1] func ff_scalarproduct_float_rvv, zve32f +lpad0 vsetvli t0, zero, e32, m8, ta, ma vmv.v.x v8, zero vmv.s.x v0, zero @@ -200,6 +208,7 @@ endfunc // (a0) = (a1) * (a2) [0..a3-1] func ff_vector_dmul_rvv, zve64d +lpad0 1: vsetvli t0, a3, e64, m8, ta, ma vle64.v v16, (a1) @@ -217,6 +226,7 @@ endfunc // (a0) += (a1) * fa0 [0..a2-1] func ff_vector_dmac_scalar_rvv, zve64d +lpad0 NOHWD fmv.d.x fa0, a2 NOHWD mva2, a3 1: @@ -235,6 +245,7 @@ endfunc // (a0) = (a1) * fa0 [0..a2-1] func ff_vector_dmul_scalar_rvv, zve64d +lpad0 NOHWD fmv.d.x fa0, a2 NOHWD mv a2, a3 1: @@ -251,6 +262,7 @@ NOHWD mv a2, a3 endfunc func ff_scalarproduct_double_rvv, zve64f +lpad0 vsetvli t0, zero, e64, m8, ta, ma vmv.v.x v8, zero vmv.s.x v0, zero diff --git a/libavutil/riscv/lls_rvv.S b/libavutil/riscv/lls_rvv.S index a36055bd7a..bd9f74ee5f 100644 --- a/libavutil/riscv/lls_rvv.S +++ b/libavutil/riscv/lls_rvv.S @@ -21,6 +21,7 @@ #include "asm.S" func ff_lls_update_covariance_rvv, zve64d, zbb +lpad0 vtype_vli t0, a2, t1, e64, ta, ma vsetvlzero, a2, t0 vle64.v v8, (a1) -- 2.45.2 ___ ffmpeg-devel mailing list 
ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmp
[FFmpeg-devel] [PATCH 3/6] lavc/riscv: add forward-edge CFI landing pads
--- libavcodec/riscv/aacencdsp_rvv.S | 2 ++ libavcodec/riscv/aacpsdsp_rvv.S| 5 + libavcodec/riscv/ac3dsp_rvb.S | 2 ++ libavcodec/riscv/ac3dsp_rvv.S | 4 libavcodec/riscv/ac3dsp_rvvb.S | 1 + libavcodec/riscv/alacdsp_rvv.S | 3 +++ libavcodec/riscv/audiodsp_rvf.S| 1 + libavcodec/riscv/audiodsp_rvv.S| 2 ++ libavcodec/riscv/blockdsp_rvv.S| 4 libavcodec/riscv/bswapdsp_rvb.S| 1 + libavcodec/riscv/bswapdsp_rvv.S| 1 + libavcodec/riscv/exrdsp_rvv.S | 1 + libavcodec/riscv/flacdsp_rvv.S | 22 -- libavcodec/riscv/fmtconvert_rvv.S | 2 ++ libavcodec/riscv/g722dsp_rvv.S | 1 + libavcodec/riscv/h263dsp_rvv.S | 2 ++ libavcodec/riscv/h264_mc_chroma.S | 8 libavcodec/riscv/h264addpx_rvv.S | 4 libavcodec/riscv/h264dsp_rvv.S | 5 + libavcodec/riscv/h264idct_rvv.S| 16 libavcodec/riscv/huffyuvdsp_rvv.S | 2 ++ libavcodec/riscv/idctdsp_rvv.S | 3 +++ libavcodec/riscv/jpeg2000dsp_rvv.S | 2 ++ libavcodec/riscv/llauddsp_rvv.S| 2 ++ libavcodec/riscv/llviddsp_rvv.S| 1 + libavcodec/riscv/llvidencdsp_rvv.S | 1 + libavcodec/riscv/lpc_rvv.S | 2 ++ libavcodec/riscv/me_cmp_rvv.S | 17 + libavcodec/riscv/opusdsp_rvv.S | 1 + libavcodec/riscv/pixblockdsp_rvi.S | 2 ++ libavcodec/riscv/pixblockdsp_rvv.S | 4 libavcodec/riscv/rv34dsp_rvv.S | 2 ++ libavcodec/riscv/rv40dsp_rvv.S | 4 libavcodec/riscv/sbrdsp_rvv.S | 13 +++-- libavcodec/riscv/startcode_rvb.S | 1 + libavcodec/riscv/startcode_rvv.S | 1 + libavcodec/riscv/svqenc_rvv.S | 1 + libavcodec/riscv/takdsp_rvv.S | 4 libavcodec/riscv/utvideodsp_rvv.S | 2 ++ libavcodec/riscv/vc1dsp_rvi.S | 2 ++ libavcodec/riscv/vc1dsp_rvv.S | 11 +++ libavcodec/riscv/vorbisdsp_rvv.S | 1 + libavcodec/riscv/vp7dsp_rvv.S | 3 +++ libavcodec/riscv/vp8dsp_rvi.S | 3 +++ libavcodec/riscv/vp8dsp_rvv.S | 12 libavcodec/riscv/vp9_intra_rvi.S | 3 +++ libavcodec/riscv/vp9_intra_rvv.S | 7 +++ libavcodec/riscv/vp9_mc_rvi.S | 5 + libavcodec/riscv/vp9_mc_rvv.S | 1 + 49 files changed, 196 insertions(+), 4 deletions(-) diff --git a/libavcodec/riscv/aacencdsp_rvv.S b/libavcodec/riscv/aacencdsp_rvv.S index 
21e66a77ae..e9e776dc9b 100644 --- a/libavcodec/riscv/aacencdsp_rvv.S +++ b/libavcodec/riscv/aacencdsp_rvv.S @@ -22,6 +22,7 @@ #include "libavutil/riscv/asm.S" func ff_abs_pow34_rvv, zve32f +lpad0 1: vsetvli t0, a2, e32, m8, ta, ma sub a2, a2, t0 @@ -39,6 +40,7 @@ func ff_abs_pow34_rvv, zve32f endfunc func ff_aac_quant_bands_rvv, zve32f +lpad0 NOHWF fmv.w.x fa0, a6 NOHWF fmv.w.x fa1, a7 fcvt.s.wft0, a5 diff --git a/libavcodec/riscv/aacpsdsp_rvv.S b/libavcodec/riscv/aacpsdsp_rvv.S index 2d6858688a..6d01bfb734 100644 --- a/libavcodec/riscv/aacpsdsp_rvv.S +++ b/libavcodec/riscv/aacpsdsp_rvv.S @@ -21,6 +21,7 @@ #include "libavutil/riscv/asm.S" func ff_ps_add_squares_rvv, zve64f +lpad0 li t1, 32 1: vsetvli t0, a2, e32, m4, ta, ma @@ -40,6 +41,7 @@ func ff_ps_add_squares_rvv, zve64f endfunc func ff_ps_mul_pair_single_rvv, zve32f +lpad0 1: vsetvli t0, a3, e32, m4, ta, ma vlseg2e32.v v24, (a1) @@ -57,6 +59,7 @@ func ff_ps_mul_pair_single_rvv, zve32f endfunc func ff_ps_hybrid_analysis_rvv, zve32f +lpad0 /* We need 26 FP registers, for 20 scratch ones. Spill fs0-fs5. 
*/ addisp, sp, -48 .irp n, 0, 1, 2, 3, 4, 5 @@ -135,6 +138,7 @@ NOHWD flw fs\n, (4 * \n)(sp) endfunc func ff_ps_hybrid_analysis_ileave_rvv, zve32x /* no needs for zve32f here */ +lpad0 sllit0, a2, 5 + 1 + 2 // ctz(32 * 2 * 4) sh2add a1, a2, a1 add a0, a0, t0 @@ -208,6 +212,7 @@ func ff_ps_hybrid_synthesis_deint_rvv, zve64x endfunc func ff_ps_stereo_interpolate_rvv, zve32f, zbb +lpad0 vsetvli t0, zero, e32, m2, ta, ma vid.vv24 flw ft0, (a2) diff --git a/libavcodec/riscv/ac3dsp_rvb.S b/libavcodec/riscv/ac3dsp_rvb.S index 0ca56466e1..a3c5187cfe 100644 --- a/libavcodec/riscv/ac3dsp_rvb.S +++ b/libavcodec/riscv/ac3dsp_rvb.S @@ -22,6 +22,7 @@ #include "libavutil/riscv/asm.S" func ff_ac3_exponent_min_rvb, zbb +lpad0 beqza1, 3f 1: addia2, a2, -1 @@ -43,6 +44,7 @@ func ff_ac3_exponent_min_rvb, zbb endfunc func ff_extract_exponents_rvb, zbb +lpad0 1: lw t0, (a1) addi a0, a0, 1 diff --git a/libavcodec/riscv/ac3dsp_rvv.S b/libavcodec/riscv/ac3dsp_rvv.S index 1b5f67a9ec..0ca1332bf1 100644 --- a/libavcodec/riscv/ac3dsp_rvv.S +++ b/libavcodec/riscv/ac3dsp_rvv.S @@ -22,6
[FFmpeg-devel] [PATCH 4/6] lavfi/riscv: add forward-edge CFI landing pads
---
 libavfilter/riscv/af_afir_rvv.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libavfilter/riscv/af_afir_rvv.S b/libavfilter/riscv/af_afir_rvv.S
index 04ec2e50d8..2d2b8b1ed3 100644
--- a/libavfilter/riscv/af_afir_rvv.S
+++ b/libavfilter/riscv/af_afir_rvv.S
@@ -22,6 +22,7 @@
 // void ff_fcmul_add(float *sum, const float *t, const float *c, int len)
 func ff_fcmul_add_rvv, zve64f
+        lpad 0
         li t1, 32
 1:
         vsetvli t0, a3, e32, m4, ta, ma
--
2.45.2
[FFmpeg-devel] [PATCH 5/6] sws/riscv: add forward-edge CFI landing pads
--- libswscale/riscv/input_rvv.S | 12 libswscale/riscv/range_rvv.S | 4 libswscale/riscv/rgb2rgb_rvb.S | 1 + libswscale/riscv/rgb2rgb_rvv.S | 7 +++ 4 files changed, 24 insertions(+) diff --git a/libswscale/riscv/input_rvv.S b/libswscale/riscv/input_rvv.S index 1d7de59c66..0562e55921 100644 --- a/libswscale/riscv/input_rvv.S +++ b/libswscale/riscv/input_rvv.S @@ -21,12 +21,14 @@ #include "libavutil/riscv/asm.S" func ff_bgr24ToY_rvv, zve32x +lpad0 lw t1, 8(a5) # BY lw t3, 0(a5) # RY j 1f endfunc func ff_rgb24ToY_rvv, zve32x +lpad0 lw t1, 0(a5) # RY lw t3, 8(a5) # BY 1: @@ -55,6 +57,7 @@ func ff_rgb24ToY_rvv, zve32x endfunc func ff_bgr24ToUV_rvv, zve32x +lpad0 lw t1, 20(a6) # BU lw t4, 32(a6) # BV lw t3, 12(a6) # RU @@ -63,6 +66,7 @@ func ff_bgr24ToUV_rvv, zve32x endfunc func ff_rgb24ToUV_rvv, zve32x +lpad0 lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -101,6 +105,7 @@ func ff_rgb24ToUV_rvv, zve32x endfunc func ff_bgr24ToUV_half_rvv, zve32x +lpad0 lw t1, 20(a6) # BU lw t4, 32(a6) # BV lw t3, 12(a6) # RU @@ -109,6 +114,7 @@ func ff_bgr24ToUV_half_rvv, zve32x endfunc func ff_rgb24ToUV_half_rvv, zve32x +lpad0 lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -152,12 +158,14 @@ endfunc .macro rgba_input chr0, chr1, high func ff_\chr1\()ToY_rvv, zve32x +lpad0 lw t1, 8(a5) # BY lw t3, 0(a5) # RY j 1f endfunc func ff_\chr0\()ToY_rvv, zve32x +lpad0 lw t1, 0(a5) # RY lw t3, 8(a5) # BY 1: @@ -192,6 +200,7 @@ func ff_\chr0\()ToY_rvv, zve32x endfunc func ff_\chr1\()ToUV_rvv, zve32x +lpad0 lw t1, 20(a6) # BU lw t4, 32(a6) # BV lw t3, 12(a6) # RU @@ -200,6 +209,7 @@ func ff_\chr1\()ToUV_rvv, zve32x endfunc func ff_\chr0\()ToUV_rvv, zve32x +lpad0 lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU @@ -244,6 +254,7 @@ func ff_\chr0\()ToUV_rvv, zve32x endfunc func ff_\chr1\()ToUV_half_rvv, zve32x +lpad0 lw t1, 20(a6) # BU lw t4, 32(a6) # BV lw t3, 12(a6) # RU @@ -252,6 +263,7 @@ func ff_\chr1\()ToUV_half_rvv, zve32x endfunc func ff_\chr0\()ToUV_half_rvv, 
zve32x +lpad0 lw t1, 12(a6) # RU lw t4, 24(a6) # RV lw t3, 20(a6) # BU diff --git a/libswscale/riscv/range_rvv.S b/libswscale/riscv/range_rvv.S index 9da80e6199..1d71ef29f6 100644 --- a/libswscale/riscv/range_rvv.S +++ b/libswscale/riscv/range_rvv.S @@ -21,6 +21,7 @@ #include "libavutil/riscv/asm.S" func ff_range_lum_to_jpeg_16_rvv, zve32x +lpad0 li t1, 30189 li t2, 19077 li t3, -39057361 @@ -42,6 +43,7 @@ func ff_range_lum_to_jpeg_16_rvv, zve32x endfunc func ff_range_lum_from_jpeg_16_rvv, zve32x +lpad0 li t1, 14071 li t2, 33561947 1: @@ -61,6 +63,7 @@ func ff_range_lum_from_jpeg_16_rvv, zve32x endfunc func ff_range_chr_to_jpeg_16_rvv, zve32x +lpad0 li t1, 30775 li t2, 4663 li t3, -9289992 @@ -89,6 +92,7 @@ func ff_range_chr_to_jpeg_16_rvv, zve32x endfunc func ff_range_chr_from_jpeg_16_rvv, zve32x +lpad0 li t1, 1799 li t2, 4081085 1: diff --git a/libswscale/riscv/rgb2rgb_rvb.S b/libswscale/riscv/rgb2rgb_rvb.S index af127b32ed..d9e56d77be 100644 --- a/libswscale/riscv/rgb2rgb_rvb.S +++ b/libswscale/riscv/rgb2rgb_rvb.S @@ -24,6 +24,7 @@ #if (__riscv_xlen >= 64) func ff_shuffle_bytes_3210_rvb, zbb +lpad0 srlia2, a2, 2 bswap32_rvb a1, a0, a2 endfunc diff --git a/libswscale/riscv/rgb2rgb_rvv.S b/libswscale/riscv/rgb2rgb_rvv.S index 19f7aaf67d..8ca1ad94b2 100644 --- a/libswscale/riscv/rgb2rgb_rvv.S +++ b/libswscale/riscv/rgb2rgb_rvv.S @@ -21,11 +21,13 @@ #include "libavutil/riscv/asm.S" func ff_shuffle_bytes_0321_rvv, zve32x +lpad0 li t1, 0x00ff00ff j 1f endfunc func ff_shuffle_bytes_2103_rvv, zve32x +lpad0 li t1, ~0x00ff00ff 1: not t2, t1 @@ -49,12 +51,14 @@ func ff_shuffle_bytes_2103_rvv, zve32x endfunc func ff_shuffle_bytes_1230_rvv, zve32x +lpad0 li t1, 24 li t2, 8 j 3f endfunc func ff_shuffle_bytes_3012_rvv, zve32x +lpad0 li t1, 8 li t2, 24 3: @@ -75,6 +79,7 @@ func ff_shuffle_bytes_3012_rvv, zve32x endfunc f
[FFmpeg-devel] [PATCH 6/6] checkasm/riscv: add forward-edge CFI landing pads
---
 tests/checkasm/riscv/checkasm.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tests/checkasm/riscv/checkasm.S b/tests/checkasm/riscv/checkasm.S
index 73ca85f344..835cc7d315 100644
--- a/tests/checkasm/riscv/checkasm.S
+++ b/tests/checkasm/riscv/checkasm.S
@@ -49,6 +49,7 @@ saved_regs:
         .endr
 func checkasm_set_function
+        lpad 0
         la.tls.ie t0, checked_func
         add t0, tp, t0
         sd a0, (t0)
@@ -56,6 +57,7 @@ func checkasm_set_function
 endfunc
 func checkasm_get_wrapper, v
+        lpad 0
         addi sp, sp, -16
         sd fp, (sp)
         sd ra, 8(sp)
@@ -74,6 +76,7 @@ func checkasm_get_wrapper, v
         ret
 2:      /* <-- Entry point with the Vector extension --> */
+        lpad 0
         /* Clobber the vectors */
         vsetvli t0, zero, e32, m8, ta, ma
         li t0, 0xdeadbeef
@@ -90,6 +93,7 @@ func checkasm_get_wrapper, v
         csrwi vxsat, 1 /* Saturation: encountered */
 3:      /* <-- Entry point without the Vector extension --> */
+        lpad 0
         /* Save RA, unallocatable and callee-saved registers */
         la.tls.ie t0, saved_regs
         add t0, tp, t0
--
2.45.2
[FFmpeg-devel] [PATCH 7/8] lavu/riscv: align functions to 4 bytes
Currently the start of the byte range for each function is aligned to
4 bytes. But this can lead to situations where the function is preceded by
a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual
instruction and the function symbol are only aligned on 2 bytes.

This patch forcibly disables compression for the alignment and the symbol,
thus ensuring that there is no padding before the function.
---
 libavutil/riscv/asm.S | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavutil/riscv/asm.S b/libavutil/riscv/asm.S
index 37fd7d3b03..633c93d5fd 100644
--- a/libavutil/riscv/asm.S
+++ b/libavutil/riscv/asm.S
@@ -38,7 +38,6 @@
     .macro func sym, ext1=, ext2=
         .text
-        .align 2
         .option push
         .ifnb \ext1
@@ -51,7 +50,11 @@
         .global \sym
         .hidden \sym
         .type \sym, %function
+        .option push
+        .option norvc
+        .align 2
         \sym:
+        .option pop
         .macro endfunc
             .size \sym, . - \sym
--
2.45.2
[FFmpeg-devel] [PATCH 8/8] checkasm/riscv: align the landing pads
---
 tests/checkasm/riscv/checkasm.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/checkasm/riscv/checkasm.S b/tests/checkasm/riscv/checkasm.S
index 835cc7d315..e8bcbb271e 100644
--- a/tests/checkasm/riscv/checkasm.S
+++ b/tests/checkasm/riscv/checkasm.S
@@ -75,6 +75,8 @@ func checkasm_get_wrapper, v
         addi sp, sp, 16
         ret
+        .option norvc
+        .align 2
 2:      /* <-- Entry point with the Vector extension --> */
         lpad 0
         /* Clobber the vectors */
@@ -92,6 +94,7 @@ func checkasm_get_wrapper, v
         csrwi vxrm, 3 /* Rounding mode: round-to-odd */
         csrwi vxsat, 1 /* Saturation: encountered */
+        .align 2
 3:      /* <-- Entry point without the Vector extension --> */
         lpad 0
         /* Save RA, unallocatable and callee-saved registers */
--
2.45.2
Re: [FFmpeg-devel] [PATCH 1/2] web: move 6.0 to olddownloads
On 7/21/2024 10:50 AM, Michael Niedermayer wrote:
> Only Ubuntu 23.10 uses 6.0 according to downstreams, and that is EOL in a
> few days. Also, 23.10 users will probably upgrade to 24.04 LTS, so this
> shouldn't affect anyone.
> ---
>  src/download    | 36
>  src/olddownload | 36
>  2 files changed, 36 insertions(+), 36 deletions(-)

LGTM.
Re: [FFmpeg-devel] [PATCH 2/2] web: move 4.1 to olddownloads
On 7/21/2024 10:50 AM, Michael Niedermayer wrote:
> No distros are listed on downstreams that are not EOL that use 4.1
> ---
>  src/download    | 37 -
>  src/olddownload | 37 +
>  2 files changed, 37 insertions(+), 37 deletions(-)

LGTM
Re: [FFmpeg-devel] [PATCH 2/3] lavc/ffv1: move damage handling code to decode_slice()
On Mon, Jul 22, 2024 at 11:43:21AM +0200, Anton Khirnov wrote:
> There is no reason to delay it and this is a more natural place for
> this code.

There is a reason.
By doing it later, the surrounding pixels are available and one could
compute motion vectors from these surroundings and use all kinds of stuff
from motion compensation and optical flow and all that.

Someone, I don't remember exactly who (maybe you remember?),
said something about premature optimization being the root of all evil.
Here that actually applies. Moving the code up thwarts writing better
error concealment. (And frankly, we already have most of that EC code;
it would just need a cleanup to free it from the 16x16 limitation.)

thx

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The real ebay dictionary, page 1
"Used only once" - "Some unspecified defect prevented a second use"
"In good condition" - "Can be repaird by experienced expert"
"As is" - "You wouldnt want it even if you were payed for it, if you knew ..."
Re: [FFmpeg-devel] [PATCH 2/2] web: move 4.1 to olddownloads
On Mon, Jul 22, 2024 at 06:13:43PM -0300, James Almer wrote:
> On 7/21/2024 10:50 AM, Michael Niedermayer wrote:
> > No distros are listed on downstreams that are not EOL that use 4.1
> > ---
> >  src/download    | 37 -
> >  src/olddownload | 37 +
> >  2 files changed, 37 insertions(+), 37 deletions(-)
>
> LGTM

will apply patchset

thx

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

What does censorship reveal? It reveals fear. -- Julian Assange
Re: [FFmpeg-devel] [PATCH] avformat/avio: avio_tell() only errors if the context is NULL
On Wed, Jul 17, 2024 at 08:47:45AM +0200, Anton Khirnov wrote:
> Quoting Michael Niedermayer (2024-07-11 11:49:37)
> > Found by code review related to coverity
> >
> > Sponsored-by: Sovereign Tech Fund
> > Signed-off-by: Michael Niedermayer
> > ---
> >  libavformat/avio.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/libavformat/avio.h b/libavformat/avio.h
> > index ebf611187dc..3be91e4b8a7 100644
> > --- a/libavformat/avio.h
> > +++ b/libavformat/avio.h
> > @@ -489,7 +489,7 @@ int64_t avio_skip(AVIOContext *s, int64_t offset);
> >
> >  /**
> >   * ftell() equivalent for AVIOContext.
> > - * @return position or AVERROR.
> > + * @return position or AVERROR in case s is NULL.
>
> It seems weird to document an invalid call.

It's mainly a reminder that this doesn't return AVERROR arbitrarily
and thus doesn't need to be checked.

thx

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

When you are offended at any man's fault, turn to yourself and study your
own failings. Then you will forget your anger. -- Epictetus
Re: [FFmpeg-devel] [PATCH] avcodec/h264_mp4toannexb: Prepend SPS/PPS to buffering period SEI
On Tue, 9 Jul 2024 at 12:06, Josh Allmann wrote:
>
> Encoders may emit a buffering period SEI without a corresponding
> SPS/PPS if the SPS/PPS is carried out-of-band, e.g. with avcc.
>
> During Annex B conversion, this may result in the SPS/PPS being
> inserted *after* the buffering period SEI but before the IDR NAL.
>
> Since the buffering period SEI references the SPS, the SPS/PPS
> needs to come first.
> ---
>
> Notes:
>     v2: Updated FATE test refs
>
>  libavcodec/bsf/h264_mp4toannexb.c          | 13 +
>  tests/ref/fate/h264-bsf-mp4toannexb        | 2 +-
>  tests/ref/fate/h264_mp4toannexb_ticket2991 | 18 +-
>  tests/ref/fate/segment-mp4-to-ts           | 12 ++--
>  4 files changed, 29 insertions(+), 16 deletions(-)
>

Ping again for review. Looking at the FATE output, this patch fixes a
number of things - see [1] for details

[1] https://ffmpeg.org//pipermail/ffmpeg-devel/2024-July/330912.html
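The ordering rule described in the commit message can be sketched as a small search over the NAL units of one access unit. This is an illustration only, not the code in `libavcodec/bsf/h264_mp4toannexb.c`: the `NalInfo` struct and `param_set_insert_pos()` are hypothetical, and the sketch assumes each SEI NAL unit carries a single message whose payload type is already known. The NAL unit type numbers and the buffering period payload type follow the H.264 specification.

```c
#include <stddef.h>

/* H.264 NAL unit types used here. */
enum {
    NAL_IDR_SLICE = 5,
    NAL_SEI       = 6,
    NAL_SPS       = 7,
    NAL_PPS       = 8,
    NAL_AUD       = 9,
};

/* SEI payload type 0 is the buffering period message. */
enum { SEI_BUFFERING_PERIOD = 0 };

/* Hypothetical, simplified description of one NAL unit. */
typedef struct NalInfo {
    int nal_type;
    int sei_payload_type; /* only meaningful when nal_type == NAL_SEI */
} NalInfo;

/* Return the index before which out-of-band SPS/PPS would be inserted:
 * the first buffering period SEI or IDR slice, whichever comes first.
 * Returns n when the access unit contains neither. */
static size_t param_set_insert_pos(const NalInfo *nals, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (nals[i].nal_type == NAL_IDR_SLICE)
            return i;
        if (nals[i].nal_type == NAL_SEI &&
            nals[i].sei_payload_type == SEI_BUFFERING_PERIOD)
            return i;
    }
    return n;
}
```

For an access unit laid out as AUD, buffering period SEI, IDR slice, the function returns 1, so the parameter sets would land ahead of the SEI that references them rather than between the SEI and the IDR slice.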
Re: [FFmpeg-devel] [PATCH] avcodec/h264_mp4toannexb: Prepend SPS/PPS to buffering period SEI
On 23/07/2024 01:01, Josh Allmann wrote:
> On Tue, 9 Jul 2024 at 12:06, Josh Allmann wrote:
> >
> > Encoders may emit a buffering period SEI without a corresponding
> > SPS/PPS if the SPS/PPS is carried out-of-band, e.g. with avcc.
> >
> > During Annex B conversion, this may result in the SPS/PPS being
> > inserted *after* the buffering period SEI but before the IDR NAL.
> >
> > Since the buffering period SEI references the SPS, the SPS/PPS
> > needs to come first.
> > ---
> >
> > Notes:
> >     v2: Updated FATE test refs
> >
> >  libavcodec/bsf/h264_mp4toannexb.c          | 13 +
> >  tests/ref/fate/h264-bsf-mp4toannexb        | 2 +-
> >  tests/ref/fate/h264_mp4toannexb_ticket2991 | 18 +-
> >  tests/ref/fate/segment-mp4-to-ts           | 12 ++--
> >  4 files changed, 29 insertions(+), 16 deletions(-)
> >
>
> Ping again for review. Looking at the FATE output, this patch fixes a
> number of things - see [1] for details

patch generally looks good to me, but I'm not closely familiar with the
code there.
Re: [FFmpeg-devel] [PATCH v2] libavformat/vapoursynth: Update to API version 4, load library at runtime
On 22.07.24 at 08:57, Anton Khirnov wrote:
> Quoting Stefan Oltmanns (2024-07-18 14:12:42)
> > Hello Anton,
> >
> > can you elaborate on that? What is unacceptable with my patch that is
> > perfectly fine in the AviSynth input module? It's the very same concept.
>
> It's not perfectly fine in avisynth, I dislike it there as well. There
> are also recent patches that remove the atexit handler.

The atexit will be removed in the next revision of the patch.

> > Loading the library at runtime makes it so much more useful, because you
> > can distribute ffmpeg binaries without forcing the user to install
> > VapourSynth (which requires the user to install Python).
>
> Runtime loading hides dependencies from standard tools and makes program
> behaviour harder to analyze. Not to mention you're adding a bunch of
> global state, which is evil.

All global state will be removed in the next revision of the patch.

It's the intention of my patch to reduce ("hide") the VapourSynth
dependency, so unless you want to actually open a VapourSynth script
there is no dependency on it. If you try to open a VapourSynth script
without having VapourSynth installed, you'll get an error message.

VapourSynth itself has unclear dependencies: it will load plug-ins at
runtime, and as it uses Python you can in fact load other Python
libraries, for example AI stuff like PyTorch for fancy upscaling, which
will then load CUDA/ROCm.

VapourSynth is not just a library like x264 that you can link in
statically if you like. VapourSynth is a frame server (like AviSynth)
with its own dependencies.

> > If you worry about platforms that do not support loading libraries at
> > runtime: VapourSynth is based on plugins that are loaded at runtime, so
> > it won't work on those platforms anyway.
>
> I am worried about special "demuxers" that are not really demuxers and
> don't work like other demuxers, hence massively increasing library
> maintenance load.

I somewhat agree here. When I first saw an AviSynth demuxer in the list
of supported formats it looked weird, because it's not a format that you
demux. But what's the solution? Create a new library like
"avframeserver" for 2 (?) different tools?

How do they increase the maintenance load? There are a lot of external
libraries that get used by ffmpeg, so what's the difference here? As
these formats do not contain any advanced video codec, but raw video,
shouldn't they be rather easy to maintain, because there are no weird
complications with some decoder?

Best regards
Stefan
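For readers unfamiliar with the runtime loading being debated, here is a minimal sketch in C of the general technique on a POSIX system, using `dlopen()` and `dlsym()`. It is not the code from the patch: `RuntimeLib`, `runtime_lib_open()` and `runtime_lib_close()` are hypothetical names, and the library path and symbol name are left to the caller rather than assumed. The handle lives in a caller-owned struct instead of global variables, and a missing library produces an error instead of a hard dependency.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical container for a library loaded at runtime. */
typedef struct RuntimeLib {
    void *handle; /* handle returned by dlopen(), NULL if not loaded */
    void *entry;  /* address of the requested entry point */
} RuntimeLib;

/* Load `path` and resolve `symbol`. Returns 0 on success, -1 on failure.
 * On failure both fields are NULL and a message is printed to stderr. */
static int runtime_lib_open(RuntimeLib *lib, const char *path,
                            const char *symbol)
{
    const char *err;

    lib->entry  = NULL;
    lib->handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!lib->handle) {
        err = dlerror();
        fprintf(stderr, "cannot load %s: %s\n", path,
                err ? err : "unknown error");
        return -1;
    }

    lib->entry = dlsym(lib->handle, symbol);
    if (!lib->entry) {
        err = dlerror();
        fprintf(stderr, "cannot resolve %s: %s\n", symbol,
                err ? err : "unknown error");
        dlclose(lib->handle);
        lib->handle = NULL;
        return -1;
    }
    return 0;
}

/* Unload the library if it was loaded; safe to call after a failure. */
static void runtime_lib_close(RuntimeLib *lib)
{
    if (lib->handle)
        dlclose(lib->handle);
    lib->handle = NULL;
    lib->entry  = NULL;
}
```

The trade-off raised in the thread is visible here: the program starts without the library present, but tools that inspect link-time dependencies will not see it.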
[FFmpeg-devel] [PATCH] lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
From: Fei Wang

Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
 -filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv \
 -i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv

Signed-off-by: Fei Wang
---
 libavutil/hwcontext_qsv.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/libavutil/hwcontext_qsv.c b/libavutil/hwcontext_qsv.c
index 7cec347478..09156275ec 100644
--- a/libavutil/hwcontext_qsv.c
+++ b/libavutil/hwcontext_qsv.c
@@ -1549,8 +1549,11 @@ static int qsv_frames_derive_from(AVHWFramesContext *dst_ctx,
 dst_hwctx->texture_infos[i].texture = (ID3D11Texture2D*)pair->first;
 dst_hwctx->texture_infos[i].index = pair->second == (mfxMemId)MFX_INFINITE ? (intptr_t)0 : (intptr_t)pair->second;
 }
-ID3D11Texture2D_GetDesc(dst_hwctx->texture_infos[0].texture, &texDesc);
-dst_hwctx->BindFlags = texDesc.BindFlags;
+if (src_hwctx->nb_surfaces) {
+ID3D11Texture2D_GetDesc(dst_hwctx->texture_infos[0].texture, &texDesc);
+dst_hwctx->BindFlags = texDesc.BindFlags;
+} else
+dst_hwctx->BindFlags = qsv_get_d3d11va_bind_flags(src_hwctx->frame_type);
 }
 break;
 #endif
--
2.34.1
Re: [FFmpeg-devel] [PATCH 2/3] lavc/ffv1: move damage handling code to decode_slice()
Quoting Michael Niedermayer (2024-07-22 23:14:04)
> On Mon, Jul 22, 2024 at 11:43:21AM +0200, Anton Khirnov wrote:
> > There is no reason to delay it and this is a more natural place for
> > this code.
>
> There is a reason.
> By doing it later, the surrounding pixels are available and one could
> compute motion vectors from these surroundings and use all kinds of stuff
> from motion compensation and optical flow and all that.
>
> Someone, I don't remember exactly who (maybe you remember?),
> said something about premature optimization being the root of all evil.
> Here that actually applies. Moving the code up thwarts writing better
> error concealment. (And frankly, we already have most of that EC code;
> it would just need a cleanup to free it from the 16x16 limitation.)

This is not an optimization though.

But okay, I am dropping both original patch 28 and these 3 new patches.
Are you ok with the rest of the series?

--
Anton Khirnov