Re: [FFmpeg-devel] [PATCH v2] checkasm: add sample argument to adjust during bench

2024-05-21 Thread Henrik Gramner via ffmpeg-devel
On Tue, May 21, 2024 at 2:33 PM J. Dekker wrote: > @@ -338,8 +338,9 @@ typedef struct CheckasmPerf { > uint64_t tsum = 0;\ > int ti, tcount = 0;\ > uint64_t t = 0; \ > +const uint64_t truns = bench_runs;\ > checkasm_set_signal_handler

Re: [FFmpeg-devel] [PATCH 1/4] avutil/x86/pixelutils: Empty MMX state in ff_pixelutils_sad_8x8_mmxext

2023-11-01 Thread Henrik Gramner via ffmpeg-devel
On Wed, Nov 1, 2023 at 10:44 AM Andreas Rheinhardt wrote: > libavutil/x86/pixelutils.asm | 1 + > 1 file changed, 1 insertion(+) IIRC the emms instruction is quite slow on many systems, so if this is the only pixelutils function using mmx it's probably better to just rewrite it to use SSE2 inst

Re: [FFmpeg-devel] [PATCH 2/3] x86/ac3dsp: add ff_float_to_fixed24_avx2()

2023-11-23 Thread Henrik Gramner via ffmpeg-devel
On Thu, Nov 23, 2023 at 12:51 PM James Almer wrote: > movdqa with ymm is avx2. I could change it to movaps, but technically > the registers contain floats and i don't know if any old AVX cpu has > penalties for changing domains. Fwiw I believe what domain the result of fp <-> int conversion instr

Re: [FFmpeg-devel] [PATCH v2] avcodec/amfenc: increase precision of Sleep() on Windows

2023-11-27 Thread Henrik Gramner via ffmpeg-devel
On Mon, Nov 27, 2023 at 2:42 PM Mark Thompson wrote: > Is it reasonable to set this global state from a library without the parent > program knowing? We'd really prefer not to affect the global state > unexpectedly. CreateWaitableTimerExW() with the CREATE_WAITABLE_TIMER_HIGH_RESOLUTION flag m

Re: [FFmpeg-devel] [PATCH] avformat/mov_chan: Use anonymous union

2024-03-25 Thread Henrik Gramner via ffmpeg-devel
On Mon, Mar 25, 2024 at 4:01 PM Andreas Rheinhardt wrote: > > Right, it is an anonymous enum, not union. Amended locally. > > - Andreas Can confirm this eliminates the warnings, lgtm. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.

Re: [FFmpeg-devel] [PATCH] avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64

2024-03-25 Thread Henrik Gramner via ffmpeg-devel
On Sun, Mar 24, 2024 at 8:21 PM Henrik Gramner wrote: > > Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened > to work before due to x86inc.asm previously performing XMM spills in > INIT_MMX mode which was more of a bug than an intentional feature. Will apply.

Re: [FFmpeg-devel] [GASPP PATCH] Implicitly start out in the text section for armasm

2024-04-04 Thread Henrik Gramner via ffmpeg-devel
On Wed, Apr 3, 2024 at 3:47 PM Martin Storsjö wrote: > > This fixes assembling files starting with bare symbol declarations, > without explicitly switching to .text first. lgtm.

Re: [FFmpeg-devel] [PATCH] lavf/vsrc_ddagrab: WinAPI functions must be called as stdcall in x86_32

2024-04-07 Thread Henrik Gramner via ffmpeg-devel
On Sun, Apr 7, 2024 at 2:59 AM Vadim Guchenko wrote: > +typedef DPI_AWARENESS_CONTEXT (__stdcall > *set_thread_dpi_t)(DPI_AWARENESS_CONTEXT); I believe most existing code uses WINAPI instead of __stdcall.

Re: [FFmpeg-devel] [PATCH v3 2/5] ffbuild/libversion.sh: add shebang

2024-04-09 Thread Henrik Gramner via ffmpeg-devel
On Tue, Apr 9, 2024 at 11:52 PM Marth64 wrote: > > +#!/bin/sh > Might I suggest `#!/usr/bin/env sh` instead for this case? > I tend to prefer it from a portability and usability perspective, > but I can imagine for sh it might not matter. /bin/sh exists on virtually every *NIX system whereas /usr

Re: [FFmpeg-devel] [PATCH] avcodec/x86/hevc: fix luma 12b overflow

2024-02-25 Thread Henrik Gramner via ffmpeg-devel
On Sun, Feb 25, 2024 at 5:42 PM Ronald S. Bultje wrote:
> +mova m13, [pw_8]
> +paddw m10, m12, m12
> +paddw m12, m10 ; 9 * (q0 - p0) - 3 * ( q1 - p1 )
> paddw m12, m13 ; + 8
Memory operand
> +paddw m10, m13, m13
> +paddw
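The arithmetic behind the quoted comment can be checked in plain C. This is only a sketch: the helper names are hypothetical, and it assumes that the overflow in question is the 16-bit wraparound of `9 * (q0 - p0) - 3 * (q1 - p1) + 8` for 12-bit samples (0..4095), which the truncated snippet does not confirm. `paddw` adds 16-bit lanes with wraparound, so any intermediate outside -32768..32767 is lost.

```c
#include <stdint.h>

/* Hypothetical helper: the term from the quoted comment,
 * 9 * (q0 - p0) - 3 * (q1 - p1) + 8, evaluated at full int precision. */
static int filter_term_exact(int p1, int p0, int q0, int q1)
{
    return 9 * (q0 - p0) - 3 * (q1 - p1) + 8;
}

/* The same term reduced modulo 2^16 and reinterpreted as a signed
 * 16-bit value, which is what a chain of paddw operations produces. */
static int filter_term_wrapped16(int p1, int p0, int q0, int q1)
{
    uint16_t u = (uint16_t)filter_term_exact(p1, p0, q0, q1);
    return u >= 0x8000 ? (int)u - 0x10000 : (int)u;
}
```

With 8-bit samples the worst case, `filter_term_exact(255, 0, 255, 0)`, is 3068 and fits comfortably. With 12-bit samples, `filter_term_exact(4095, 0, 4095, 0)` is 49148, which exceeds 32767 and wraps to -16388 in a 16-bit lane.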

Re: [FFmpeg-devel] [PATCH] libavcodec/h264pred: Remove pred8x8_horizontal_8_mmxext

2024-03-02 Thread Henrik Gramner via ffmpeg-devel
On Sat, Mar 2, 2024 at 10:13 PM Kieran Kunhya wrote: > SPLATB_LOAD m0, r0+r1*0-1, m2 > SPLATB_LOAD m1, r0+r1*1-1, m2 This adds an extra unnecessary shuffle in the SSE2 code as it splats to a full register. The easiest way of fixing it would probably be to unroll the macro and manually g

[FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-16 Thread Henrik Gramner via ffmpeg-devel
Makes things up-to-date with the upstream at https://code.videolan.org/videolan/x86inc.asm Specifying every individual change is difficult as there have been divergences and cherry-picks over time, but the full upstream change log can be found at https://code.videolan.org/videolan/x86inc.asm/-/com

[FFmpeg-devel] [PATCH] avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation

2024-03-17 Thread Henrik Gramner via ffmpeg-devel
Fixes yadif-16 which allows FATE to pass. Broken since 2904db90458a1253e4aea6844ba9a59ac11923b6 (2017). Attachment: pminsd_emulation.patch
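For readers unfamiliar with the problem, PMINSD (packed signed 32-bit minimum) only exists from SSE4.1 onward, so older targets need an emulation. The patch itself is attached as binary data and not shown here, so the following is not the author's code. It is a minimal sketch of one common integer-only approach using SSE2 intrinsics (compare, then blend with and/andnot/or). The function names `min_epi32_sse2` and `min_s32x4` are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>

/* Per-lane signed 32-bit minimum using only SSE2 operations. */
static __m128i min_epi32_sse2(__m128i a, __m128i b)
{
    __m128i a_gt_b = _mm_cmpgt_epi32(a, b);            /* all-ones where a > b */
    return _mm_or_si128(_mm_and_si128(a_gt_b, b),      /* take b where a > b   */
                        _mm_andnot_si128(a_gt_b, a));  /* otherwise take a     */
}

/* Convenience wrapper over plain arrays of four values (unaligned is fine). */
static void min_s32x4(const int32_t *a, const int32_t *b, int32_t *dst)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)dst, min_epi32_sse2(va, vb));
}
```

Because the comparison is done on integers, values that differ only in their low bits (for example 16777217 and 16777216, which are not both exactly representable as 32-bit floats) are still ordered correctly.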

Re: [FFmpeg-devel] [PATCH] avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation

2024-03-17 Thread Henrik Gramner via ffmpeg-devel
On Sun, Mar 17, 2024 at 1:44 PM James Almer wrote: > LGTM. I wonder why we even added a float based fallback for this. Thanks, pushed.

Re: [FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-19 Thread Henrik Gramner via ffmpeg-devel
On Sat, Mar 16, 2024 at 8:53 PM Henrik Gramner wrote: > Makes things up-to-date with the upstream at > https://code.videolan.org/videolan/x86inc.asm Will push in a few days if there are no comments.

Re: [FFmpeg-devel] [PATCH] x86: Update x86inc.asm

2024-03-24 Thread Henrik Gramner via ffmpeg-devel
On Tue, Mar 19, 2024 at 11:20 AM Henrik Gramner wrote: > > Will push in a few days if there are no comments. Pushed.

[FFmpeg-devel] [PATCH] avcodec/x86/h264_idct: Fix incorrect xmm spilling on win64

2024-03-24 Thread Henrik Gramner via ffmpeg-devel
Broken in afa471d0efed1df5dca6eeeb2fcdd211ae4cad4e. It just happened to work before due to x86inc.asm previously performing XMM spills in INIT_MMX mode which was more of a bug than an intentional feature. Attachment: x86_h264_idct_spill_xmm.patch

Re: [FFmpeg-devel] [PATCH] x86inc: Add REPX macro to repeat instructions/operations

2023-10-01 Thread Henrik Gramner via ffmpeg-devel
On Fri, Sep 29, 2023 at 1:38 PM Frank Plowman wrote: > libavutil/x86/x86inc.asm | 10 ++ > 1 file changed, 10 insertions(+) LGTM. As a side note https://code.videolan.org/videolan/x86inc.asm is the upstream repo for x86inc.asm.

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Henrik Gramner via ffmpeg-devel
On Tue, Dec 19, 2023 at 1:02 PM Martin Storsjö wrote: > This replaces the riscv specific handling from > 7212466e735aa187d82f51dadbce957fe3da77f0 (which essentially is > reverted, together with 286d6742218ba0235c32876b50bf593cb1986353) > with a different implementation of the same (plus a bit more

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-21 Thread Henrik Gramner via ffmpeg-devel
On Thu, Dec 21, 2023 at 9:16 PM Rémi Denis-Courmont wrote: > > +checkasm_fail_func("%s", > > + s == SIGFPE ? "fatal arithmetic error" : > > + s == SIGILL ? "illegal instruction" : > > + s == SIGBUS ? "bus error"

Re: [FFmpeg-devel] [PATCH] checkasm: Generalize crash handling

2023-12-22 Thread Henrik Gramner via ffmpeg-devel
On Fri, Dec 22, 2023 at 7:20 AM Rémi Denis-Courmont wrote: > >> > +checkasm_fail_func("%s", > >> > + s == SIGFPE ? "fatal arithmetic error" : > >> > + s == SIGILL ? "illegal instruction" : > >> > + s == SIGBUS ?
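The quoted hunk maps a signal number to a human-readable message before reporting the failure. A standalone sketch of that mapping is shown below. The three strings come from the quoted patch; the function name `crash_description` and the `"unknown signal"` fallback are assumptions made for illustration, since the snippet is cut off after the SIGBUS case.

```c
#include <signal.h>

/* Hypothetical helper: translate a fatal signal number into a message. */
static const char *crash_description(int s)
{
    return s == SIGFPE ? "fatal arithmetic error" :
           s == SIGILL ? "illegal instruction" :
#ifdef SIGBUS /* not defined on every platform */
           s == SIGBUS ? "bus error" :
#endif
                         "unknown signal"; /* assumption: not from the patch */
}
```

A handler installed for these signals could pass the returned string to a reporting function in the way the quoted call passes its string to `checkasm_fail_func("%s", ...)`.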

[FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms

2025-05-16 Thread Henrik Gramner via ffmpeg-devel
Placed in a new separate file as the existing combined MMX/SSE/AVX file is humongous and takes forever to assemble as is. This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which the corresponding AVX2 functions make up ~42 KiB. Tested to pass FATE on Linux and Windows. Checkasm n

Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms

2025-05-19 Thread Henrik Gramner via ffmpeg-devel
On Sat, May 17, 2025 at 12:59 AM Henrik Gramner wrote: > > Placed in a new separate file as the existing combined MMX/SSE/AVX > file is humongous and takes forever to assemble as is. > > This adds ~16 KiB of .text. The existing 8bpc asm is ~240 KiB of which > the corresponding AVX2 functions makes

Re: [FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms

2025-05-26 Thread Henrik Gramner via ffmpeg-devel
On Wed, May 21, 2025 at 5:48 PM Henrik Gramner wrote: > > Tested to pass FATE on Linux and Windows. Pushed.

[FFmpeg-devel] [PATCH] avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 10bpc inverse transforms

2025-05-21 Thread Henrik Gramner via ffmpeg-devel
Tested to pass FATE on Linux and Windows. Checkasm numbers vs the existing SSE2 code on Zen 5 (Strix Halo):
vp9_inv_adst_adst_16x16_sub16_add_10_sse2:      1041.8 ( 1.92x)
vp9_inv_adst_adst_16x16_sub16_add_10_avx512icl:  132.5 (15.06x)
vp9_inv_dct_adst_16x16_sub16_add_10_sse2:        901.0 ( 1