Re: [FFmpeg-devel] [PATCH 3/3] avfilter: add avx2 filter_line function for bwdif

2023-03-13 Thread James Darnley
On 3/11/23 17:14, Thomas Mundt wrote: +%if mmsize == 32 +vpbroadcastd m12, DWORD clip_maxm I get a green pattern at bit depths > 8. Looks good with: vpbroadcastw m12, WORD clip_maxm +%else movd m12, DWORD clip_maxm SPLATW m12, m12, 0 +%endif Of
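
A minimal sketch in C of why the broadcast width matters here. It is an illustration, not the bwdif code: the helper names are hypothetical, and the value 1023 assumes a 10-bit clip maximum. Broadcasting a 32-bit value fills the register with dwords, so when the register is later read as 16-bit lanes every odd lane holds the upper half of the value, which is 0 for small maxima. One plausible reading of the reported pattern is that those lanes then clip samples to 0.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16  /* number of 16-bit lanes in one 256-bit register */

/* Emulates a dword broadcast viewed as 16-bit lanes (little-endian, as on
 * x86): even lanes get the low half of v, odd lanes get the high half. */
static void broadcast_dword(uint16_t lanes[LANES], uint32_t v)
{
    for (int i = 0; i < LANES; i += 2) {
        lanes[i]     = (uint16_t)(v & 0xFFFF);
        lanes[i + 1] = (uint16_t)(v >> 16);
    }
}

/* Emulates a word broadcast: every 16-bit lane gets the same value. */
static void broadcast_word(uint16_t lanes[LANES], uint16_t v)
{
    for (int i = 0; i < LANES; i++)
        lanes[i] = v;
}

/* Self-check with an assumed 10-bit clip maximum of 1023. */
static void check_broadcasts(void)
{
    uint16_t d[LANES], w[LANES];
    broadcast_dword(d, 1023);
    broadcast_word(w, 1023);
    assert(d[0] == 1023 && d[1] == 0);    /* odd lanes end up as 0 */
    assert(w[0] == 1023 && w[1] == 1023); /* every lane holds the maximum */
}
```

With the word broadcast every lane carries the clip maximum, which matches the fix suggested in the quoted review.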

Re: [FFmpeg-devel] [PATCH 2/3] checkasm: add test for bwdif

2023-03-13 Thread James Darnley
On 3/11/23 17:18, Thomas Mundt wrote: I'm not familiar with checkasm tests, but isn't this one limited to a bit depth of 8? Yes, that was the idea because I was only intending to modify the 8-bit function, for now. The function pointer is the same for all depths so you need to initializ

Re: [FFmpeg-devel] [PATCH] tests: actually test yadif's 10 and 16-bit functions

2023-03-05 Thread James Darnley
On 2/20/23 14:06, James Darnley wrote: On 2/20/23 13:49, Nicolas George wrote: James Darnley (12023-02-20): snip Moving scale before yadif is right, but format= is redundant with -pix_fmt. Regards, So the patch should just be moving the scale filter first?  Sure.  Any other comments

Re: [FFmpeg-devel] libavfilter/x86/vf_convolution.asm- fix missing decelerator for AVX512ICL sobel

2023-02-24 Thread James Darnley
On 2/24/23 04:00, Felix LeClair wrote: Fixes: Compilation of Sobel with AVX512ICL Caused: Comment left without deleniator in AVX512ICL version of SOBEL Testing:Confirmed working on AVX512 Alderlake (AKA SPR without AMX) diff --git a/libavfilter/x86/vf_convolution.asm b/libavfilter/x86/vf_co

[FFmpeg-devel] [PATCH 3/3] avfilter: add avx2 filter_line function for bwdif

2023-02-20 Thread James Darnley
2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3 --- libavfilter/x86/vf_bwdif.asm| 29 - libavfilter/x86/vf_bwdif_init.c | 12 2 files changed, 36 insertions(+), 5 deletions(-) diff --git a/libavfilter/x86/vf_bwdif.asm b/libavfilter/x

[FFmpeg-devel] [PATCH 2/3] checkasm: add test for bwdif

2023-02-20 Thread James Darnley
--- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 + tests/checkasm/vf_bwdif.c | 70 +++ tests/fate/checkasm.mak | 1 + 5 files changed, 76 insertions(+) create mode 100644 tests/checkasm/vf_bwdif.c diff

[FFmpeg-devel] [PATCH 1/3] avfilter: move bwdif's filter_line init into a dedicated function

2023-02-20 Thread James Darnley
--- libavfilter/bwdif.h | 3 ++- libavfilter/vf_bwdif.c | 13 + libavfilter/x86/vf_bwdif_init.c | 4 +--- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index 889ff772ed..5749345f78 100644 --- a/libavfilt

Re: [FFmpeg-devel] [PATCH] tests: actually test yadif's 10 and 16-bit functions

2023-02-20 Thread James Darnley
On 2/20/23 13:49, Nicolas George wrote: James Darnley (12023-02-20): -fate-filter-yadif10: CMD = framecrc -flags bitexact -idct simple -i $(TARGET_SAMPLES)/mpeg2/mpeg2_field_encoding.ts -flags bitexact -pix_fmt yuv420p10le -frames:v 30 -vf yadif=0,scale -fate-filter-yadif16: CMD = framecrc

Re: [FFmpeg-devel] [PATCH 3/3] avfilter/yadif: add avx2 filter_line function

2023-02-20 Thread James Darnley
On 2/10/23 14:06, James Darnley wrote: snip This patch set is broken. The checkasm test is incomplete. This avx2 function has some bug that only manifests when the strides (prefs mrefs) are opposite signs (one positive and one negative). That situation is what happens with real usage. I
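
To illustrate what "opposite signs" means for the two strides, here is a small C sketch. It is not the yadif code: `avg_neighbours` and the buffer layout are hypothetical, and the sketch only assumes that `prefs` and `mrefs` are signed offsets from the current line to the next and previous lines. For an interior line one offset is `+stride` and the other is `-stride`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounded average of the pixels below and above cur[0].
 * prefs and mrefs are signed offsets to the next and previous line. */
static int avg_neighbours(const uint8_t *cur, ptrdiff_t prefs, ptrdiff_t mrefs)
{
    return (cur[mrefs] + cur[prefs] + 1) >> 1;
}

/* Three lines of four pixels; the middle line is the one being filtered. */
static int demo_opposite_strides(void)
{
    static const uint8_t buf[3 * 4] = {
        10, 10, 10, 10,   /* previous line */
         0,  0,  0,  0,   /* current line  */
        30, 30, 30, 30,   /* next line     */
    };
    const ptrdiff_t stride = 4;
    const uint8_t *cur = buf + stride;

    /* Interior line: prefs is positive and mrefs is negative. */
    int r = avg_neighbours(cur, stride, -stride);
    assert(r == 20);
    return r;
}
```

A test that only ever passes offsets of the same sign would never exercise this case, which is consistent with the incomplete checkasm test described above.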

[FFmpeg-devel] [PATCH] tests: actually test yadif's 10 and 16-bit functions

2023-02-20 Thread James Darnley
--- tests/fate/filter-video.mak | 4 +-- tests/ref/fate/filter-yadif10 | 60 +-- tests/ref/fate/filter-yadif16 | 60 +-- 3 files changed, 62 insertions(+), 62 deletions(-) diff --git a/tests/fate/filter-video.mak b/tests/fate/filt

[FFmpeg-devel] [PATCH 1/3] avfilter: move yadif's filter_line init into a dedicated function

2023-02-10 Thread James Darnley
--- libavfilter/vf_yadif.c | 13 + libavfilter/x86/vf_yadif_init.c | 4 +--- libavfilter/yadif.h | 3 ++- 3 files changed, 12 insertions(+), 8 deletions(-) diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c index afa4d1d53d..1f9434f961 100644 --- a/lib

[FFmpeg-devel] [PATCH 3/3] avfilter/yadif: add avx2 filter_line function

2023-02-10 Thread James Darnley
Zen 2 (Ryzen 7 3700X): 1.73x faster (3603±586.3 vs. 2082±317.1 decicycles) compared with ssse3 Using an SD y4m file speed increases from ~ 3600 fps to ~4700. --- libavfilter/x86/vf_yadif.asm| 83 +++-- libavfilter/x86/vf_yadif_init.c | 4 ++ 2 files changed, 62 in

[FFmpeg-devel] [PATCH 2/3] checkasm: add test for yadif

2023-02-10 Thread James Darnley
--- tests/checkasm/Makefile | 1 + tests/checkasm/checkasm.c | 3 ++ tests/checkasm/checkasm.h | 1 + tests/checkasm/vf_yadif.c | 62 +++ 4 files changed, 67 insertions(+) create mode 100644 tests/checkasm/vf_yadif.c diff --git a/tests/checkasm/Makefile b

[FFmpeg-devel] [RFC PATCH 2/2] avcodec/x86: add avx512icl function for v210dec

2022-12-15 Thread James Darnley
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles) compared with avx2 --- I think I can merge this with the existing macro without it being too ugly. That might allow a plain avx512 version too but I can't say if that would be any faster. libavcodec/x86/v210-init.c |

[FFmpeg-devel] [PATCH 1/2] avcodec/x86/v210: add some comments to the improved avx2 function

2022-12-15 Thread James Darnley
--- libavcodec/x86/v210.asm | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/v210.asm b/libavcodec/x86/v210.asm index 3b9e0761df..600a4ddc5f 100644 --- a/libavcodec/x86/v210.asm +++ b/libavcodec/x86/v210.asm @@ -65,18 +65,18 @@ cglobal v210_planar_unp

Re: [FFmpeg-devel] [PATCH] configure: support lsan as toolchain

2022-12-15 Thread James Darnley
On 12/7/22 17:08, James Darnley wrote: --- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index f4eedfc207..eaa5ef6b20 100755 --- a/configure +++ b/configure @@ -4315,6 +4315,11 @@ case "$toolchain" in add_cflags -fsaniti

[FFmpeg-devel] [PATCH] configure: support lsan as toolchain

2022-12-07 Thread James Darnley
--- configure | 5 + 1 file changed, 5 insertions(+) diff --git a/configure b/configure index f4eedfc207..eaa5ef6b20 100755 --- a/configure +++ b/configure @@ -4315,6 +4315,11 @@ case "$toolchain" in add_cflags -fsanitize=address add_ldflags -fsanitize=address ;; +

[FFmpeg-devel] [PATCH v2 5/5] avcodec/x86/v210enc: remove unneeded instruction

2022-11-25 Thread James Darnley
--- libavcodec/x86/v210enc.asm | 1 - 1 file changed, 1 deletion(-) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index d3639cd440..daf5f2ab81 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@ -331,7 +331,6 @@ cglobal v210_planar_pack_8, 5, 5, 7+no

[FFmpeg-devel] [PATCH v2 4/5] avcodec/x86/v210enc: expand and correct comments

2022-11-25 Thread James Darnley
--- libavcodec/x86/v210enc.asm | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index 552164a8be..d3639cd440 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@ -314,7 +314,7 @@ cglobal v210_

[FFmpeg-devel] [PATCH v2 3/5] avcodec/v210enc: add new 10-bit function for avx512 avx512icl

2022-11-25 Thread James Darnley
avx512 on Skylake-X (Xeon D-2123IT): 1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2 avx512icl on Ice Lake (Xeon Silver 4316): 2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2 --- libavcodec/x86/v210enc.asm| 99 +++ libavcod

[FFmpeg-devel] [PATCH v2 2/5] avcodec/x86/v210enc: replace register use with named register

2022-11-25 Thread James Darnley
--- libavcodec/x86/v210enc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index afac238ede..c2ad3d72c0 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@ -62,7 +62,7 @@ SECTION .text ; v210

[FFmpeg-devel] [PATCH v2 1/5] checkasm/v210enc: test the entire width of 10-bit planar input arrays

2022-11-25 Thread James Darnley
--- tests/checkasm/v210enc.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c index 9942e08137..9fb8321c25 100644 --- a/tests/checkasm/v210enc.c +++ b/tests/checkasm/v210enc.c @@ -72,8 +72,10 @@ randomize_buf

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/v210enc: add new 10-bit function for avx512 avx512icl

2022-11-21 Thread James Darnley
ARCH_X86_64 is always defined. So checks of this type need to check with #if. Thanks. I forgot the ffmpeg convention there.
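
The convention can be sketched in a few lines of C. The sketch assumes, as the quoted review says, that the macro is always defined and carries a value of 0 or 1; the local `#define` and the function names are stand-ins for illustration only.

```c
#include <assert.h>

/* Stand-in for the build configuration: the macro is always defined,
 * here with the value 0 (as it would be on a non-x86-64 target). */
#define ARCH_X86_64 0

/* Returns 1 if an #ifdef-style check would enable the x86-64 path. */
static int enabled_by_ifdef(void)
{
#ifdef ARCH_X86_64
    return 1;   /* taken even though the value is 0 */
#else
    return 0;
#endif
}

/* Returns 1 if an #if-style check would enable the x86-64 path. */
static int enabled_by_if(void)
{
#if ARCH_X86_64
    return 1;
#else
    return 0;   /* the path is skipped because the value is 0 */
#endif
}

/* Self-check showing that only the #if form respects the value. */
static void check_arch_macros(void)
{
    assert(enabled_by_ifdef() == 1);
    assert(enabled_by_if() == 0);
}
```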

[FFmpeg-devel] [PATCH 3/3] avcodec/v210enc: add new 10-bit function for avx512 avx512icl

2022-11-21 Thread James Darnley
avx512 on Skylake-X (Xeon D-2123IT): 1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2 avx512icl on Ice Lake (Xeon Silver 4316): 2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2 --- libavcodec/x86/v210enc.asm| 99 +++ libavcod

[FFmpeg-devel] [PATCH 2/3] avcodec/x86/v210: replace register use with named register

2022-11-21 Thread James Darnley
--- libavcodec/x86/v210enc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm index afac238ede..c2ad3d72c0 100644 --- a/libavcodec/x86/v210enc.asm +++ b/libavcodec/x86/v210enc.asm @@ -62,7 +62,7 @@ SECTION .text ; v210

[FFmpeg-devel] [PATCH 1/3] checkasm/v210enc: test the entire width of 10-bit planar input arrays

2022-11-21 Thread James Darnley
--- tests/checkasm/v210enc.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c index 9942e08137..9fb8321c25 100644 --- a/tests/checkasm/v210enc.c +++ b/tests/checkasm/v210enc.c @@ -72,8 +72,10 @@ randomize_buf

Re: [FFmpeg-devel] [PATCH] avcodec/v210enc: add new function for avx2 avx512 avx512icl

2022-10-31 Thread James Darnley
+%else +pand m1, m6, m1 +pandn m0, m6, m0 +por m0, m0, m1 +%endif Isn't that pattern a vpblendb or some such? I think Kieran already responded to this on IRC but I will too. Unfortunately not. This blend is at the bit lev
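
The difference between a bit-level blend and a byte-level blend can be sketched in C. This is an illustration under stated assumptions: `byte_blend` models a pblendvb-style instruction (whole bytes chosen by the top bit of each mask byte), and the mask covering bits 10..19 is a hypothetical field that straddles byte boundaries, not a value taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-level select, the same operation as the pand/pandn/por sequence:
 * take bits of b where mask is 1 and bits of a where mask is 0. */
static uint32_t bit_select(uint32_t a, uint32_t b, uint32_t mask)
{
    return (b & mask) | (a & ~mask);
}

/* Byte-level blend in the style of pblendvb: each whole byte comes from b
 * if the top bit of the matching mask byte is set, otherwise from a. */
static uint32_t byte_blend(uint32_t a, uint32_t b, uint32_t mask)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t lane = (uint32_t)0xFF << (8 * i);
        int take_b = (mask >> (8 * i + 7)) & 1;
        out |= (take_b ? b : a) & lane;
    }
    return out;
}

/* Hypothetical mask covering bits 10..19, a field that straddles bytes. */
static void check_blends_differ(void)
{
    const uint32_t mask = 0x000FFC00;
    assert(bit_select(0, 0xFFFFFFFFu, mask) == 0x000FFC00u);
    assert(byte_blend(0, 0xFFFFFFFFu, mask) == 0x0000FF00u);
}
```

When a field does not start and end on byte boundaries, the byte-level blend cannot reproduce the bit-level result, which is why the three-instruction sequence is needed on instruction sets without a bitwise select.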

[FFmpeg-devel] [PATCH] avcodec/v210enc: add new function for avx2 avx512 avx512icl

2022-10-28 Thread James Darnley
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and Broadwell (Xeon E5-2620 v4): 1690±4.3 decicycles vs. 1693±78.4 1439±31.1 decicycles vs 1429±16.7 Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT): 1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2 Bett

[FFmpeg-devel] [PATCH] checkasm: add a verbose check function for uint32_t data

2022-10-28 Thread James Darnley
--- tests/checkasm/checkasm.c | 1 + tests/checkasm/checkasm.h | 1 + 2 files changed, 2 insertions(+) diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c index 421bd096c5..c3d77cb6af 100644 --- a/tests/checkasm/checkasm.c +++ b/tests/checkasm/checkasm.c @@ -918,5 +918,6 @@ int che

[FFmpeg-devel] [PATCH] avutil/tests/cpu: print the avx512icl flag

2022-10-28 Thread James Darnley
--- libavutil/tests/cpu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c index 5bec742b2b..dadadb31dc 100644 --- a/libavutil/tests/cpu.c +++ b/libavutil/tests/cpu.c @@ -77,6 +77,7 @@ static const struct { { AV_CPU_FLAG_BMI2, "bmi2"

[FFmpeg-devel] [PATCH] mailmap: stop git lying about who I commit things as

2022-10-28 Thread James Darnley
--- .mailmap | 1 - 1 file changed, 1 deletion(-) diff --git a/.mailmap b/.mailmap index ba072f38c8..af60290f77 100644 --- a/.mailmap +++ b/.mailmap @@ -1,4 +1,3 @@ - -- 2.38.0

Re: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512

2022-10-26 Thread James Darnley
I guess it could also be scaled to ymm if you're a big Skylake fan :P (in which case you'd probably want to reorder the shuffle indices so

[FFmpeg-devel] Discrepancy between comments for AVX512 flags

2022-08-26 Thread James Darnley
While cherry-picking some stuff for avx512 I have noticed that ffmpeg has a discrepancy in the comments for the two avx512 flags. Let's start with the public header libavutil/cpu.h 56│ #define AV_CPU_FLAG_AVX512 0x10 ///< AVX-512 functions: requires OS support even if YMM/ZMM register

[FFmpeg-devel] [PATCH] avfilter/vf_subtitles: add an option to choose sub stream by language

2022-04-18 Thread James Darnley
--- doc/filters.texi | 5 + libavfilter/vf_subtitles.c | 23 --- 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/doc/filters.texi b/doc/filters.texi index a161754233..cfbc807f16 100644 --- a/doc/filters.texi +++ b/doc/filters.texi @@ -21160,6 +211

Re: [FFmpeg-devel] [PATCH 1/3] avcodec/bitpacked: ,

2020-06-03 Thread James Darnley
On 2020-06-04 01:19, Michael Niedermayer wrote: > Fixes: array end overread > Fixes: > 22395/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BITPACKED_fuzzer-5760940300828672 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-

Re: [FFmpeg-devel] [FFmpeg-cvslog] pthread_frame: merge the functionality for normal decoder init and init_thread_copy

2020-06-03 Thread James Darnley
On 2020-04-10 16:53, Anton Khirnov wrote: > ffmpeg | branch: master | Anton Khirnov | Mon Jan 9 > 18:04:42 2017 +0100| [1f4cf92cfbd3accbae582ac63126ed5570ddfd37] | committer: > Anton Khirnov > > pthread_frame: merge the functionality for normal decoder init and > init_thread_copy > > The cur

Re: [FFmpeg-devel] [PATCH] swscale/x86/yuv2rgb: Fix build without SSSE3

2020-02-23 Thread James Darnley
On 2020-02-23 18:58, Michael Niedermayer wrote: > On Sun, Feb 23, 2020 at 05:03:36PM +0100, Carl Eugen Hoyos wrote: >> On Sun, 23 Feb 2020 at 13:30, Michael Niedermayer >> wrote: >>> >>> From: Parker Ernest <@> >>> >>> commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on >>> x86_

Re: [FFmpeg-devel] [PATCH] Add .mailmap

2020-02-23 Thread James Darnley
On 2020-02-23 15:12, Jean-Baptiste Kempf wrote: > Yo, > > On Sat, Feb 22, 2020, at 22:18, Josh de Kock wrote: >> This allows for easy shortlog/log parsing, useful in determining >> eligible members of the general assembly for the new FFmpeg voting >> system. > > I think this is a good idea. > But

Re: [FFmpeg-devel] [PATCH] swscale/x86/yuv2rgb: Fix build without SSSE3

2020-02-23 Thread James Darnley
On 2020-02-23 13:22, Michael Niedermayer wrote: > From: Parker Ernest <@> > > commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on > x86_64 CPUs which do not have SSSE3, e.g. AMD Phenom-II > > Signed-off-by: Michael Niedermayer > --- > libswscale/x86/yuv2rgb.c | 2 ++ > 1 file change

Re: [FFmpeg-devel] Followup: FOSDEM meeting

2020-02-22 Thread James Darnley
On 2020-02-22 13:25, Paul B Mahol wrote: > On 2/22/20, James Darnley wrote: >> On 2020-02-22 11:11, Thilo Borgmann wrote: >>> Please someone put an IRC log from the meeting there, too. James Darnley? >>> Also the audio was streamed, somebody might remember where too ex

Re: [FFmpeg-devel] Followup: FOSDEM meeting

2020-02-22 Thread James Darnley
On 2020-02-22 11:11, Thilo Borgmann wrote: > Please someone put an IRC log from the meeting there, too. James Darnley? > Also the audio was streamed, somebody might remember where too exactly. > Michael? I can post my log from the day, probably email attachment. Should I remove any of

Re: [FFmpeg-devel] What new instructions would you like?

2020-02-01 Thread James Darnley
On 30/12/2019, Lauri Kasanen wrote: > Hi, > > For the Libre RISC-V project, I'm going to research the popular codecs > and design new instructions to help speed them up. With ffmpeg being > home to lots of asm folks for many platforms, I also want to ask your > opinion. > > What new instructions w

Re: [FFmpeg-devel] [IMPORTANT] FOSDEM meeting

2020-02-01 Thread James Darnley
On 28/01/2020, Liu Steven wrote: > > >> On 27 January 2020, at 3:29 PM, Jean-Baptiste Kempf wrote: >> It will be joinable through some VideoConf tool. > Can we join by IRC or other things on internet? > Because these days are Spring Festival (Chinese New Year, Important > festivals that have lasted for thousand

Re: [FFmpeg-devel] [PATCH, v3, 1/7] lavu/pixfmt: add new pixel format 0yuv/y210/y410

2019-12-05 Thread James Darnley
On 2019-12-04 15:43, Linjie Fu wrote: > Previously, media driver provided planar format(like 420 8 bit), > but for HEVC Range Extension (422/444 8/10 bit), the decoded image > is produced in packed format because Windows expects it. > > Add some packed pixel formats for hardware decode support in

Re: [FFmpeg-devel] [Contract Request] for FFmpeg libmp3lame multi-threaded feature implementation

2019-11-25 Thread James Darnley
On 2019-11-25 13:52, Chandra Nakka wrote: > Dear FFmpeg developers, > > I'm very happy to have found your details on FFmpeg website for requesting > FFmpeg feature implementation. > > Currently I'm using FFmpeg command line tool on my linux servers to process > media files into instant mp3 audio

Re: [FFmpeg-devel] [PATCH] avutil/eval: add sgn()

2019-10-12 Thread James Darnley
On 2019-10-11 21:45, Paul B Mahol wrote: > diff --git a/doc/utils.texi b/doc/utils.texi > index d55dd315c3..4e2e713505 100644 > --- a/doc/utils.texi > +++ b/doc/utils.texi > @@ -920,6 +920,9 @@ corresponding input value will be returned. > @item round(expr) > Round the value of expression @var{e

[FFmpeg-devel] [PATCH 2/2] avcodec/h264: fix draw_horiz_band with slice threads

2019-09-02 Thread James Darnley
From: Kieran Kunhya --- libavcodec/h264_slice.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c index 5ceee107a0..fe2aa01ceb 100644 --- a/libavcodec/h264_slice.c +++ b/libavcodec/h264_slice.c @@

[FFmpeg-devel] [PATCH 1/2] avcodec/h264: enable draw_horiz_band

2019-09-02 Thread James Darnley
--- libavcodec/h264dec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c index 8d1bd16a8e..b9f304936c 100644 --- a/libavcodec/h264dec.c +++ b/libavcodec/h264dec.c @@ -1056,7 +1056,7 @@ AVCodec ff_h264_decoder = { .init

[FFmpeg-devel] [PATCH 0/2] WIP: h264, slice threads, draw_horiz_band

2019-09-02 Thread James Darnley
-frames in chunked mode. Needs more work. James Darnley (1): avcodec/h264: enable draw_horiz_band Kieran Kunhya (1): avcodec/h264: fix draw_horiz_band with slice threads libavcodec/h264_slice.c | 29 +++-- libavcodec/h264dec.c| 2 +- 2 files changed, 24 insert

[FFmpeg-devel] [PATCH 3/7] x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros

2019-08-05 Thread James Darnley
From: Henrik Gramner Use register numbers instead of copying the full register names. This makes it possible to change register widths in the middle of a function and keep the mmreg permutations intact which can be useful for code that only needs larger vectors for parts of the function in combin

[FFmpeg-devel] [PATCH 7/7] x86inc: Add support for GFNI instructions

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index d1b4c982fc..8c8cc97e0c 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc

[FFmpeg-devel] [PATCH 2/7] x86inc: Optimize VEX instruction encoding

2019-08-05 Thread James Darnley
From: Henrik Gramner Most VEX-encoded instructions require an additional byte to encode when src2 is a high register (e.g. x|ymm8..15). If the instruction is commutative we can swap src1 and src2 when doing so reduces the instruction length, e.g. vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8,
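
The operand swap described above can be modelled with a short C sketch. It is a simplified model, not the x86inc macro: it considers register operands only, and it assumes an instruction in the 0F opcode map with VEX.W clear (as with `vpaddw`), so the only thing that forces the longer prefix is a register numbered 8..15 in the src2 slot. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: the 2-byte VEX prefix has no B bit, so a register
 * numbered 8..15 in the r/m slot (src2) forces the 3-byte form. src1 is
 * encoded in the vvvv field, which both prefix forms carry. */
static int vex_prefix_bytes(int src1, int src2)
{
    assert(src1 >= 0 && src1 < 16 && src2 >= 0 && src2 < 16);
    return src2 >= 8 ? 3 : 2;
}

/* For a commutative operation, pick the operand order that gives the
 * shorter prefix; non-commutative operations keep their order. */
static int best_prefix_bytes(int src1, int src2, bool commutative)
{
    int len = vex_prefix_bytes(src1, src2);
    if (commutative) {
        int swapped = vex_prefix_bytes(src2, src1);
        if (swapped < len)
            len = swapped;
    }
    return len;
}
```

In this model `best_prefix_bytes(0, 8, true)` returns 2 where the unswapped order needs 3, mirroring the `vpaddw xmm0, xmm0, xmm8` example in the commit message. When both sources are high registers the swap gains nothing.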

[FFmpeg-devel] [PATCH 0/7] Import some x264asm patches from x264

2019-08-05 Thread James Darnley
Here are a few easy-to-import patches from x264. These are all after x264 commit 4a158b00 "x86inc: Correctly set mmreg variables" which FFmpeg already has (commit eb5f063e7c). It does not include the following commits: * 82721eae "x86inc: Add x86-32 PIC support macros" * 101bd27d "x86inc: Support

[FFmpeg-devel] [PATCH 6/7] x86inc: Improve warnings for use of unsupported instructions

2019-08-05 Thread James Darnley
From: Henrik Gramner Warn when the following are used without the appropriate cpuflag: * YMM and ZMM registers * 'pextrw' with a memory operand * GPR instruction set extensions --- libavutil/x86/x86inc.asm | 120 +++ 1 file changed, 83 insertions(+), 37 del

[FFmpeg-devel] [PATCH 5/7] x86inc: Make 'non-adjacent' default in the TAIL_CALL macro

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 04dbb6b785..af35fe1e4d 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -685,7 +685,7 @@ DECLARE_

[FFmpeg-devel] [PATCH 4/7] x86inc: Turn 'movsxd' into 'movifnidn' on x86-32

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 4 1 file changed, 4 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 10b7711637..04dbb6b785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -293,6 +293,10 @@ DECLARE_REG_TMP_SIZ

[FFmpeg-devel] [PATCH 1/7] x86inc: Fix VEX -> EVEX instruction conversion

2019-08-05 Thread James Darnley
From: Henrik Gramner There's an edge case that wasn't properly handled. --- libavutil/x86/x86inc.asm | 5 + 1 file changed, 5 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 5044ee86f0..bc370a6186 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86

Re: [FFmpeg-devel] Issues while encoding a ts file to m3u8

2019-08-02 Thread James Darnley
On 2019-08-02 15:55, Ramana Jajula wrote: > Hi, > > I am trying to encode my ts file m3u8 using my customised ffmpeg of version > 4.1. I used below command to do encoding. > > ffmpeg -re -threads 8 -i /videos/input.ts -vcodec libx264 -s 320x240 -b:v > 512000 -maxrate 512000 -acodec libfdk_aac -b:

Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-28 Thread James Darnley
On 2019-06-28 03:03, Hendrik Leppkes wrote: > On Fri, Jun 28, 2019 at 1:26 AM James Darnley wrote: >> >> On 2019-06-28 04:26, Linjie Fu wrote: >>> Previously, media driver provided planar format(like 420 8 bit), but >>> for HEVC Range Extension (422/44

Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-27 Thread James Darnley
On 2019-06-28 04:26, Linjie Fu wrote: > Previously, media driver provided planar format(like 420 8 bit), but > for HEVC Range Extension (422/444 8/10 bit), the decoded image is > produced in packed format. > > Y210/AYUV/Y410 are packed formats which are needed in HEVC Rext decoding > for both VAAP

Re: [FFmpeg-devel] [PATCH] avcodec: Add librav1e encoder

2019-05-28 Thread James Darnley
On 2019-05-28 22:00, Derek Buitenhuis wrote: > On 28/05/2019 20:58, James Almer wrote: >> I think x26* and vpx/aom call it crf? It's not in option_tables.h in any >> case. > > They do not. This is a constant quantizer mode, not constant rate factor. IIRC either qp or cqp

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 12:06, James Darnley wrote: > On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: >> From: Limin Wang >> >> ... > > Why? I see why: so you don't screw-up the macros you create later.

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: > From: Limin Wang > > ... Why? And these are "comments" not "commands".

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 12:15, Michael Niedermayer wrote: > On Sat, May 18, 2019 at 12:02:55PM +0200, James Darnley wrote: >> I object to the commit message though because it isn't a "null pointer >> dereference" but if that is the error as reported by the tool then keep >

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 09:39, Michael Niedermayer wrote: > Fixes: "null pointer dereference" > Fixes: > 14551/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_V210_fuzzer-5088609952071680 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-o

Re: [FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-18 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > I am resending my patches because I am not sure if I sent this version in > the past. I split my changes into two patches because they do separate > things. > > I also changed some tabs to spaces in Mike's AVX2 patch.

Re: [FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > From: Michael Stoner Screw you mailing list or git, which ever one of you managed to screw up the author's address. I will correct that, if I can.

[FFmpeg-devel] [PATCH 1/3] avcodec/v210dec: move DSP function setting into dedicated function

2019-04-10 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

[FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
From: Michael Stoner Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck AVX2 is 1.4x faster than AVX --- Mike, is this still the patch you want applied? I had to make a small amendment to it because you had some tabs as indentation. libavcodec/v210dec.c | 10 +- libavcodec/

[FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-10 Thread James Darnley
I am resending my patches because I am not sure if I sent this version in the past. I split my changes into two patches because they do separate things. I also changed some tabs to spaces in Mike's AVX2 patch. James Darnley (2): avcodec/v210dec: move DSP function setting into dedi

[FFmpeg-devel] [PATCH 2/3] checkasm: add test for v210dec

2019-04-10 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-27 Thread James Darnley
On 2019-03-26 21:22, Mike Stoner via ffmpeg-devel wrote: > Hello, > I’ve accounted for all feedback on this so far, I’m wondering if it is ready > to be pushed upstream? > > Here are my results from ‘checkasm’ (lower is better): > > v210_unpack_c: 1636 > v210_unpack_ssse3: 611 > v210_unpack_avx:

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

[FFmpeg-devel] [PATCH] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..6db662538e 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

Re: [FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
On 2019-03-06 20:31, James Darnley wrote: > ... Wrong patch and wrong reference. Please ignore this.

[FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
On 2019-03-06 10:11, Paul B Mahol wrote: > On 3/6/19, Carl Eugen Hoyos wrote: >> 2019-03-04 23:58 GMT+01:00, James Darnley : >>> Prepare for checkasm test. >>> --- >>> libavcodec/v210dec.c | 13 + >>> libavcodec/v210dec.h | 1 + >>

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-04 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-04 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 13 + libavcodec/v210dec.h | 1 + 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..28cf00d320 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210dec.c

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-01 18:41, Michael Stoner wrote: > The AVX2 code leverages VPERMD to process 12 pixels/iteration. This is my > first patch submission so any comments are greatly appreciated. > > -Mike > > Tested on Skylake (Win32 & Win64) > 1920x1080 input frame > = > C code - 440

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-03 15:44, Martin Vignali wrote: > Hello, > > ... > > Not directly related to this patch, but it can be interesting for testing > purpose to write a checkasm test for the v210 func decoding. > So it's more easy to check the perf for "each" cpu flags, and be sure, the > various width cas

Re: [FFmpeg-devel] Lossy GIF encoding

2019-02-15 Thread James Darnley
On 2019-02-15 10:01, Kornel wrote: > libavcodec/gif.c in ff_gif_encoder.pix_fmts seems to passively declare types > of pixel formats it accepts. If you want to experiment you can change that so it accepts rgb (also or only). Then you can implement and test what you want, then you can ask about s

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-06 Thread James Darnley
On 2018-09-06 19:39, Sigríður Regína Sigurþórsdóttir wrote: > +if (s->metadata_header_padding) { > +if (s->metadata_header_padding == 1) > +s->metadata_header_padding++; > +put_ebml_void(pb, s->metadata_header_padding); > +} Unfortunately I was forced to make th
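
A note on why the quoted code bumps a padding of 1 up to 2: an EBML Void element needs at least one byte for its ID (0xEC) and one byte for its length field, so a 1-byte Void cannot be written. The sketch below illustrates this. It is a simplified stand-in and not FFmpeg's put_ebml_void: the helper names are hypothetical, and it only handles the one-byte length form (total size 2 to 128 bytes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write an EBML Void element occupying exactly
 * `size` bytes into buf, using a one-byte length field.
 * Returns the number of bytes written, or 0 if `size` cannot be
 * represented in this simplified form. */
static size_t write_ebml_void(uint8_t *buf, size_t size)
{
    /* 1 byte of ID + 1 byte of length is the minimum; a one-byte length
     * can describe a payload of at most 126 bytes (0xFF is reserved). */
    if (size < 2 || size > 128)
        return 0;

    buf[0] = 0xEC;                        /* EBML Void element ID */
    buf[1] = 0x80 | (uint8_t)(size - 2);  /* length marker + payload size */
    memset(buf + 2, 0, size - 2);         /* payload content is ignored */
    return size;
}

/* Mirrors the adjustment in the quoted patch: a request for 1 byte of
 * padding is rounded up to the smallest writable Void element. */
static size_t adjust_padding(size_t padding)
{
    return padding == 1 ? 2 : padding;
}
```

With these helpers, write_ebml_void(buf, 1) fails while write_ebml_void(buf, adjust_padding(1)) writes the two bytes 0xEC 0x80.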

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-05 Thread James Darnley
On 2018-09-05 22:52, Sigríður Regína Sigurþórsdóttir wrote: > +{"reserve_free_space", "Reserve a given amount of space at the > beginning og the file for unspecified purpose." I added the "metadata_header_padding" global option many years ago. Can you not reuse it for this purpose? Is it not

Re: [FFmpeg-devel] [PATCH] frame: Simplify the video allocation

2018-09-03 Thread James Darnley
On 2018-09-03 15:29, James Almer wrote: > pass 32 - 1 to both av_image_fill_pointers() calls directly? Please do not add a magic number where nobody will find it. Use one of the 3 already existing methods for knowing the alignment necessary for assembly. If this is unrelated, my apologies.

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread James Darnley
On 2018-07-27 15:05, Henrik Gramner wrote: > On Fri, Jul 27, 2018 at 1:47 PM, James Darnley wrote: >> On 2018-07-26 17:29, Rostislav Pehlivanov wrote: >>>> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w, >>>> x, b2 >>>> +

Re: [FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-27 Thread James Darnley
On 2018-07-26 17:29, Rostislav Pehlivanov wrote: > On 26 July 2018 at 12:28, James Darnley wrote: > +cglobal vertical_compose_haar_10bit, 3, 6, 4, b0, b1, w >> +DECLARE_REG_TMP 4,5 >> + >> +mova m2, [pd_1] >> +mov r3d, wd >> +and wd,
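
As background for the assembly quoted above, here is a scalar sketch of what a vertical Haar compose step does. It assumes the usual integer Haar lifting form (low band minus the rounded half of the high band, then high band plus the reconstructed even sample); the constant pd_1 loaded in the snippet is consistent with that rounding term, but the exact C reference should be checked in the tree. The function names are made up for illustration, and haar_forward is included only so the round trip can be verified.

```c
#include <stdint.h>

/* Forward integer Haar lifting on two rows: on return b0 holds the
 * low band and b1 the high band. Illustrative only.
 * Note: >> on negative values is implementation-defined in C; common
 * compilers use an arithmetic shift, and the inverse below uses the
 * same expression, so the round trip is exact either way. */
static void haar_forward(int32_t *b0, int32_t *b1, int width)
{
    for (int i = 0; i < width; i++) {
        b1[i] = b1[i] - b0[i];               /* high = odd - even */
        b0[i] = b0[i] + ((b1[i] + 1) >> 1);  /* low  = even + round(high / 2) */
    }
}

/* Inverse step (the "vertical compose"): rebuild the even and odd rows
 * from the low band in b0 and the high band in b1. */
static void haar_vertical_compose(int32_t *b0, int32_t *b1, int width)
{
    for (int i = 0; i < width; i++) {
        b0[i] = b0[i] - ((b1[i] + 1) >> 1);  /* even = low - round(high / 2) */
        b1[i] = b1[i] + b0[i];               /* odd  = high + even */
    }
}
```

Every output sample depends only on the two inputs at the same index, so the loop maps directly onto SIMD lanes: four 32-bit coefficients per SSE2 register, eight per AVX2 register.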

[FFmpeg-devel] [PATCH 1/3] diracdec: add 10-bit Haar SIMD functions

2018-07-26 Thread James Darnley
wavelet trasnform +;* Copyright (c) 2018 James Darnley +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.

[FFmpeg-devel] [PATCH 0/3 v2] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-26 Thread James Darnley
I will ask the same question as last time. Is the AVX worth it in Haar? Also I am surprised that the AVX2 doesn't have a bigger difference on some of the vertical transforms. James Darnley (3): diracdec: add 10-bit Haar SIMD functions diracdec: add 10-bit Legall 5,3 (5_3) SIMD func

[FFmpeg-devel] [PATCH 3/3] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-26 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 84fps SSE2: 111fps AVX2: 115fps dd97 vertical hi sse2: 2.77x faster (31773 vs. 11457 decicycles) compared with C avx2: 3.83x faster (31773 vs. 8297 decicycles) compared with C --- libavcodec/x
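
The "x faster" figures in these benchmark lines are the reference cycle count divided by the SIMD cycle count. A tiny sketch that reproduces the numbers quoted above (the function name speedup is made up for illustration):

```c
/* Speed-up factor as reported by the benchmark lines:
 * cycles of the reference version divided by cycles of the new version. */
static double speedup(double ref_cycles, double new_cycles)
{
    return ref_cycles / new_cycles;
}
```

For the dd97 vertical high-pass function, speedup(31773, 11457) is about 2.77 (SSE2 vs. C) and speedup(31773, 8297) is about 3.83 (AVX2 vs. C). Dividing the two SIMD counts gives roughly 1.38 for AVX2 over SSE2. The whole-decoder gain is much smaller (84 to 111 fps is about 1.32x), presumably because this function accounts for only part of the decoding time.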

[FFmpeg-devel] [PATCH 2/3] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions

2018-07-26 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 94fps SSE2: 118fps AVX2: 121fps legall vertical hi sse2: 3.86x faster (20201 vs. 5231 decicycles) compared with C avx2: 6.70x faster (20201 vs. 3014 decicycles) compared with C legall vertical l

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-25 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote: > Could you provide standard overall transform results using START/STOP_TIMER > rather than overall decoding speed? Ask and ye shall receive. > haar horizontal compose > sse2: 3.67x faster (45248±108.1 vs. 12328±21.1 decicycles) compared with

Re: [FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-19 Thread James Darnley
On 2018-07-19 17:26, Rostislav Pehlivanov wrote: > On 19 July 2018 at 15:52, James Darnley wrote: > >> int32_t *b1, int32_t *b2, int >> b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]); >> } >> >> +static void dd97_vertical_hi_sse2(i

Re: [FFmpeg-devel] [PATCH 0/6] x86 SIMD for dirac 10-bit wavelet transforms

2018-07-19 Thread James Darnley
On 2018-07-19 17:23, Rostislav Pehlivanov wrote: > > Could you provide standard overall transform results using START/STOP_TIMER > rather than overall decoding speed? > Coefficients sizes and therefore golomb unpacking speed changes with > respect to the transform so potentially there could be som

[FFmpeg-devel] [PATCH 5/6] diracdec: avx2 dd97

2018-07-19 Thread James Darnley
--- libavcodec/x86/dirac_dwt_10bit.asm| 3 ++- libavcodec/x86/dirac_dwt_init_10bit.c | 13 + 2 files changed, 15 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm index ae110d2945..2e039e11ea 100644 --- a/libavcodec

[FFmpeg-devel] [PATCH 4/6] diracdec: avx2 legall

2018-07-19 Thread James Darnley
--- libavcodec/x86/dirac_dwt_10bit.asm| 4 +++- libavcodec/x86/dirac_dwt_init_10bit.c | 22 ++ 2 files changed, 25 insertions(+), 1 deletion(-) diff --git a/libavcodec/x86/dirac_dwt_10bit.asm b/libavcodec/x86/dirac_dwt_10bit.asm index 681de5e1df..ae110d2945 100644 --- a/

[FFmpeg-devel] [PATCH 3/6] diracdec: add 10-bit Deslauriers-Dubuc 9, 7 (9_7) vertical high-pass function

2018-07-19 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 84fps SSE2: 111fps AVX2: 115fps --- libavcodec/x86/dirac_dwt_10bit.asm| 38 +++ libavcodec/x86/dirac_dwt_init_10bit.c | 16 +++ 2 files changed, 54 insertions(+) dif

[FFmpeg-devel] [PATCH 1/6] diracdec: add 10-bit Haar SIMD functions

2018-07-19 Thread James Darnley
@@ -0,0 +1,113 @@ +;** +;* x86 optimized discrete 10-bit wavelet trasnform +;* Copyright (c) 2018 James Darnley +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify

[FFmpeg-devel] [PATCH 2/6] diracdec: add 10-bit Legall 5, 3 (5_3) SIMD functions

2018-07-19 Thread James Darnley
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the relevant transform. C: 94fps SSE2: 118fps AVX2: 121fps --- libavcodec/x86/dirac_dwt_10bit.asm| 55 +++ libavcodec/x86/dirac_dwt_init_10bit.c | 23 +++ 2 files changed, 78 insertions(+) dif
