Re: [FFmpeg-devel] [PATCH V2] avutil/tx: add check against (*ctx)

2019-05-16 Thread John Cox
t you appear to want is if (!*ctx) which protects against multi-free and is useful in that it can be called unconditionally in cleanup code (assuming initial null assignments) and crashes in what you describe as the "stupid" case. >> return; >> >> a
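The `if (!*ctx)` guard described in this snippet can be sketched as below. This is a minimal illustration only: `ExampleCtx`, `example_init` and `example_uninit` are hypothetical names and are not the avutil/tx API. The guard makes repeated cleanup calls harmless, while a NULL `ctx` argument itself is still dereferenced and would crash, as the message describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical context type, for illustration only. */
typedef struct ExampleCtx {
    int *table;
} ExampleCtx;

/* Allocate a context with an n-entry table; returns NULL on failure. */
static ExampleCtx *example_init(size_t n)
{
    assert(n > 0);
    ExampleCtx *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;
    c->table = calloc(n, sizeof(*c->table));
    if (!c->table) {
        free(c);
        return NULL;
    }
    return c;
}

/* Free *ctx and reset it to NULL so a second call is a no-op. */
static void example_uninit(ExampleCtx **ctx)
{
    /* Guards against multi-free; a NULL ctx itself is not checked. */
    if (!*ctx)
        return;
    free((*ctx)->table);
    free(*ctx);
    *ctx = NULL;
}
```

With the handle initialised to NULL, cleanup code can call `example_uninit(&c)` unconditionally, whether or not initialisation ever ran.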

[FFmpeg-devel] HEVC decoder for Raspberry Pi

2018-11-13 Thread John Cox
rdware frames but they aren't really. There must be a better way of auto-selecting the hevc_rpi decoder over the normal s/w hevc decoder, but I became confused by the existing h/w acceleration framework and what I wanted to do didn't seem to fit in neatly. Display should be a proper devic

Re: [FFmpeg-devel] HEVC decoder for Raspberry Pi

2018-11-14 Thread John Cox
Hi >Hi > >On Tue, Nov 13, 2018 at 03:52:18PM +0000, John Cox wrote: >> Hi >> >> I have been developing a hevc decoder for Raspberry Pi for some time >> now. As active development has now pretty much ceased and the code is >> believed stable it seems a good

Re: [FFmpeg-devel] HEVC decoder for Raspberry Pi

2018-11-15 Thread John Cox
Hi >On Wed, Nov 14, 2018 at 11:35:50AM +0000, John Cox wrote: >> Hi >> >> >Hi >> > >> >On Tue, Nov 13, 2018 at 03:52:18PM +0000, John Cox wrote: >> >> Hi >> >> >> >> I have been developing a hevc decoder for Raspberr

[FFmpeg-devel] How to do (HEVC) decoder fallback?

2017-11-13 Thread John Cox
esim/rpi-ffmpeg.git on branch test/wpp_1 - I do have a separated decoder version but I'd like to find out how I should integrate it before I commit it. Many thanks John Cox ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [RFC] swscale RGB24->YUV420P

2023-08-16 Thread John Cox
ere? I've tested by hand with libswscale/test/swscale but fate integration would be obviously better - I'm currently a bit lost in fate, where/how should I do this? Many thanks John Cox

Re: [FFmpeg-devel] [RFC] swscale RGB24->YUV420P

2023-08-17 Thread John Cox
On Wed, 16 Aug 2023 19:37:02 +0200, you wrote: >On Wed, Aug 16, 2023 at 05:15:23PM +0100, John Cox wrote: >> Hi >> >> The Pi has a use for a fast RGB24->YUV420P path for encoding camera >> video. There is an existing BGR24 converter but if I build a RGB24 >

[FFmpeg-devel] [PATCH v1 0/6] swscale: Add dedicated RGB->YUV unscaled functions & aarch64 asm

2023-08-20 Thread John Cox
with improved rounding or the previous template (I'm not quite sure what it does but it produces a different score out of tests/swscale to either method) so a simple results match isn't going to work. Regards John Cox John Cox (6): fate-filter-fps: Set swscale bitexact for tests that do

[FFmpeg-devel] [PATCH v1 1/6] fate-filter-fps: Set swscale bitexact for tests that do conversions

2023-08-20 Thread John Cox
-bitexact as a general flag doesn't affect swscale so add swscale option too to get correct CRCs in all circumstances. Signed-off-by: John Cox --- tests/fate/filter-video.mak | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/fate/filter-video.mak b/tests/fate/f

[FFmpeg-devel] [PATCH v1 2/6] swscale: Rename BGR24->YUV conversion functions as bgr...

2023-08-20 Thread John Cox
Rename swscale conversion functions for converting BGR24 frames to YUV as bgr24toyuv12 rather than rgb24toyuv12 as that is just confusing and would be even more confusing with the addition of RGB24 converters. Signed-off-by: John Cox --- libswscale/bayer_template.c | 2 +- libswscale

[FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion

2023-08-20 Thread John Cox
Add a rgb24->yuv420p conversion. Uses the same code as the existing bgr24->yuv converter but permutes the conversion array to swap R & B coefficients. Signed-off-by: John Cox --- libswscale/rgb2rgb.c | 5 + libswscale/rgb2rgb.h | 7 +++ libswscale/rgb2rgb_
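The idea of reusing one converter and only permuting the coefficient array can be sketched as follows. This is an assumption-laden illustration, not the libswscale code: `luma3`, `coef_bgr` and `coef_rgb` are hypothetical names, the weights are approximate BT.601 limited-range values scaled by 2^15, and only luma is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative BT.601 limited-range luma weights, scaled by 2^15 (assumed values). */
enum { COEF_R = 8414, COEF_G = 16519, COEF_B = 3208, SHIFT = 15 };

/* Coefficient tables indexed by byte position within a packed pixel. */
static const int32_t coef_bgr[3] = { COEF_B, COEF_G, COEF_R };
static const int32_t coef_rgb[3] = { COEF_R, COEF_G, COEF_B }; /* R and B swapped */

/* Luma of one packed 3-byte pixel; coef[i] weights byte i of the pixel. */
static uint8_t luma3(const uint8_t *px, const int32_t coef[3])
{
    assert(px && coef);
    int32_t y = coef[0] * px[0] + coef[1] * px[1] + coef[2] * px[2];
    /* Round to nearest, then add the limited-range offset of 16. */
    return (uint8_t)(((y + (1 << (SHIFT - 1))) >> SHIFT) + 16);
}
```

The same `luma3` body serves both byte orders; only the table passed in changes, which is the property the patch description relies on.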

[FFmpeg-devel] [PATCH v1 4/6] swscale: RGB24->YUV allow odd widths & improve C rounding

2023-08-20 Thread John Cox
dence isn't an issue there. Signed-off-by: John Cox --- libswscale/rgb2rgb_template.c | 42 ++- libswscale/swscale_unscaled.c | 5 ++-- libswscale/x86/rgb2rgb_template.c | 5 3 files changed, 32 insertions(+), 20 deletions(-) diff --git a/

[FFmpeg-devel] [PATCH v1 5/6] swscale: Add unscaled XRGB->YUV420P functions

2023-08-20 Thread John Cox
Add simple C functions for converting XRGB to YUV420P. Same logic as the RGB24 functions but dropping the A channel. Signed-off-by: John Cox --- libswscale/rgb2rgb.c | 20 +++ libswscale/rgb2rgb.h | 16 + libswscale/rgb2rgb_template.c | 106

[FFmpeg-devel] [PATCH v1 6/6] swscale: Add aarch64 functions for RGB24->YUV420P

2023-08-20 Thread John Cox
Neon RGB24->YUV420P and BGR24->YUV420P functions. Works on 16 pixel blocks and can do any width or height, though for widths less than 32 or so the C is likely faster. Signed-off-by: John Cox --- libswscale/aarch64/rgb2rgb.c | 8 + libswscale/aarch64/rgb2rgb_neon.S

Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion

2023-08-20 Thread John Cox
On Sun, 20 Aug 2023 19:16:14 +0200, you wrote: >On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote: >> Add a rgb24->yuv420p conversion. Uses the same code as the existing >> bgr24->yuv converter but permutes the conversion array to swap R & B >> coefficients.

Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion

2023-08-20 Thread John Cox
On Sun, 20 Aug 2023 19:45:11 +0200, you wrote: >On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote: >> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote: >> > Add a rgb24->yuv420p conversion. Uses the same code as the existing >> > bgr24-&

Re: [FFmpeg-devel] [PATCH v1 3/6] swscale: Add explicit rgb24->yv12 conversion

2023-08-22 Thread John Cox
On Mon, 21 Aug 2023 21:15:37 +0200, you wrote: >On Sun, Aug 20, 2023 at 07:28:40PM +0100, John Cox wrote: >> On Sun, 20 Aug 2023 19:45:11 +0200, you wrote: >> >> >On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote: >> >> On Sun, Aug 20, 2023

[FFmpeg-devel] Does rtspenc actually support AVFMT_GLOBALHEADER?

2024-08-19 Thread John Cox
LOBALHEADER from the flags in rtspenc.c fixes my problem and I'll very happily submit a patch to that effect, but first I'd like to know if that is in fact the root of my problem - my understanding of the RTSP code is very limited and I'd appreciate advice from someone who knows somethi

Re: [FFmpeg-devel] Does rtspenc actually support AVFMT_GLOBALHEADER?

2024-08-20 Thread John Cox
On Mon, 19 Aug 2024 at 19:32, Martin Storsjö wrote: > > On Mon, 19 Aug 2024, John Cox wrote: > > > Does rtspenc actually support AVFMT_GLOBALHEADER? It is specified in the > > FFOutputFormat flags but I can't see anywhere in the code where > > extradata is refer

Re: [FFmpeg-devel] [PATCH] libavdevice: Add KMS/DRM output device

2021-01-19 Thread John Cox
devices on the output of ffmpeg for testing purposes. Though I guess that if I want that then the device should be bundled with the application rather than in a library. John Cox

[FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
ous union to avoid changing other cabac code - I could believe this was a no-no and I'll have to change that. 3) Uses clz which doesn't seem to exist in the ffmpeg int libs (though ctz does) I'll happily accept suggestions as to what is considered better practice for thes

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
Hi >On Tue, Jan 19, 2016 at 7:46 AM, John Cox wrote: > >> Hi >> >> I've just done a fair bit of work on hevc_cabac decode for the Raspberry >> Pi2 and I think that the patch is generally applicable. Patch is >> attached but you may prefer to take

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
On Tue, 19 Jan 2016 15:59:39 +0000 (UTC), you wrote: >John Cox kynesim.co.uk> writes: > >> >> +#define UNCHECKED_BITSTREAM_READER 1 >> > >> >I don't think that's right, and is a security issue. >> >> I added that line as (nearly) eve

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
>John Cox kynesim.co.uk> writes: > >> On Tue, 19 Jan 2016 15:59:39 +0000 (UTC), you wrote: >> >> >John Cox kynesim.co.uk> writes: >> > >> >> >> +#define UNCHECKED_BITSTREAM_READER 1 >> >> > >> >> >I do

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
>On 1/19/2016 9:46 AM, John Cox wrote: >> +// Helper fns >> +#ifndef hevc_mem_bits32 >> +static av_always_inline uint32_t hevc_mem_bits32(const void * buf, const >> unsigned int offset) >> +{ >> +return AV_RB32((const uint8_t *)buf + (offset &

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
On Tue, 19 Jan 2016 14:09:22 -0300, you wrote: >On 1/19/2016 2:05 PM, John Cox wrote: >>> On 1/19/2016 9:46 AM, John Cox wrote: >>>> +// Helper fns >>>> +#ifndef hevc_mem_bits32 >>>> +static av_always_inline uint32_t hevc_mem_bits32(c

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-19 Thread John Cox
>On 1/19/2016 2:24 PM, John Cox wrote: >> On Tue, 19 Jan 2016 14:09:22 -0300, you wrote: >> >>> On 1/19/2016 2:05 PM, John Cox wrote: >>>>> On 1/19/2016 9:46 AM, John Cox wrote: >>>>>> +// Helper fns >>>>>> +#ifndef hev

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-20 Thread John Cox
On Wed, 20 Jan 2016 13:26:05 +0100, you wrote: >Hi, > >2016-01-19 13:46 GMT+01:00 John Cox : >> I've just done a fair bit of work on hevc_cabac decode for the Raspberry >> Pi2 and I think that the patch is generally applicable. Patch is >> attached but you may pre

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-20 Thread John Cox
On Wed, 20 Jan 2016 13:26:05 +0100, you wrote: >Hi, > >2016-01-19 13:46 GMT+01:00 John Cox : >> I've just done a fair bit of work on hevc_cabac decode for the Raspberry >> Pi2 and I think that the patch is generally applicable. Patch is >> attached but you may pre

[FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-01-21 Thread John Cox
Hi v2 of my hevc residual patch I've fixed the fate regression I've split it into more pieces Now uses ff_clz Some reformatting of function headers The patches can also be found on https://github.com/jc-kynesim/rpi-ffmpeg.git on branch test/ff_hevc_cabac_4 from tag ff_hevc_cabac_4_base Note that

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-01-22 Thread John Cox
>On Fri, Jan 22, 2016 at 01:41:11AM +0100, Michael Niedermayer wrote: >> On Thu, Jan 21, 2016 at 10:45:55AM +0000, John Cox wrote: >> > Hi >> > >> > v2 of my hevc residual patch >> > >> > I've fixed the fate regression >>

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-01-22 Thread John Cox
On Fri, 22 Jan 2016 01:57:58 +0100, you wrote: >On Fri, Jan 22, 2016 at 01:41:11AM +0100, Michael Niedermayer wrote: >> On Thu, Jan 21, 2016 at 10:45:55AM +0000, John Cox wrote: >> > Hi >> > >> > v2 of my hevc residual patch >> > >> > I

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-22 Thread John Cox
On Fri, 22 Jan 2016 12:18:29 +0100, you wrote: >Hi, > >2016-01-20 15:27 GMT+01:00 John Cox : >> The by22 code gained me an overall factor of two in the abs level decode >> - the gains do depend a lot on the quantity of residual - you gain a lot >> more on I-frames th

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-01-22 Thread John Cox
On Fri, 22 Jan 2016 14:42:27 +0100, you wrote: > [snip] >> >fate-hevc passes with patch 1-5, so the issue is likely in the last >> > >> >[...] >> >> Yup - bug in the arm update_rice (again - sorry). Now passes fate on >> ARM too (now I've learnt how to run fate on my Pi in a finite time). >> >>

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (especially ARM)

2016-01-22 Thread John Cox
Hi >Hi, > >2016-01-22 14:29 GMT+01:00 John Cox : >>>This is a big slowdown on Win64 and UHD-bluray like sequences, but >>>that can be switched off in that case. >> >> I'm a bit surprised that it generated a big slowdown - some cache must >> be

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-01-22 Thread John Cox
On Fri, 22 Jan 2016 18:52:23 +0100, you wrote: >Hi, > >2016-01-21 11:45 GMT+01:00 John Cox : >> Hi >> >> v2 of my hevc residual patch > >I'll review the bit not related to significant coeffs first, because I >think it is the most performance-sensitive. Al

[FFmpeg-devel] Allocating a single YUV buffer rather than 3?

2016-02-01 Thread John Cox
Hi In order to get a copy-free display on my target h/w I need to have my decode output YUV planes contiguous. The default allocator gets each plane separately (so they aren't or at least aren't always). Is there a simple preferred way of getting this to work? I've got slightly lost in the maze
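The layout being asked for, three YUV420P planes carved out of one allocation, can be sketched in plain C as below. This does not answer how to hook such a buffer into FFmpeg's allocation path, which is the open question in the message. `Yuv420Buf`, `yuv420_alloc_contiguous` and the `stride_align` parameter are hypothetical names, and 8-bit samples are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor for a contiguous 8-bit YUV420P buffer. */
typedef struct Yuv420Buf {
    uint8_t *base;      /* single allocation holding all three planes */
    uint8_t *data[3];   /* starts of the Y, U and V planes inside base */
    size_t   stride[3]; /* bytes per row of each plane */
    size_t   size;      /* total bytes allocated */
} Yuv420Buf;

static size_t align_up(size_t v, size_t a)
{
    return (v + a - 1) / a * a;
}

/* Returns 0 on success, -1 on allocation failure. */
static int yuv420_alloc_contiguous(Yuv420Buf *b, size_t w, size_t h,
                                   size_t stride_align)
{
    assert(b && w > 0 && h > 0 && stride_align > 0);
    size_t cw = (w + 1) / 2;  /* chroma width, rounded up */
    size_t ch = (h + 1) / 2;  /* chroma height, rounded up */

    b->stride[0] = align_up(w, stride_align);
    b->stride[1] = b->stride[2] = align_up(cw, stride_align);

    size_t ysize = b->stride[0] * h;
    size_t csize = b->stride[1] * ch;

    b->size = ysize + 2 * csize;
    b->base = malloc(b->size);
    if (!b->base)
        return -1;

    /* Planes follow each other directly: Y, then U, then V. */
    b->data[0] = b->base;
    b->data[1] = b->data[0] + ysize;
    b->data[2] = b->data[1] + csize;
    return 0;
}

static void yuv420_free(Yuv420Buf *b)
{
    free(b->base);
    b->base = NULL;
}
```

Real display hardware may impose further constraints, such as plane start alignment or padding rows, that this sketch ignores.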

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-02-02 Thread John Cox
the review/validation/commit. Thanks >2016-01-22 19:33 GMT+01:00 John Cox : >> Fair enough - though given that your slowdowns are almost certainly >> cache-related the whole may be quite different from the sum of the >> parts. > >True, they don't always translate

Re: [FFmpeg-devel] [PATCH]levc/hevc_cabac Optimise ff_hevc_hls_residual_coding (v2)

2016-02-03 Thread John Cox
On Tue, 2 Feb 2016 12:52:15 +0100, you wrote: >Hi, > >as a modus operandi for this review, I have no time for a proper one, >or at least not fitting with John's timeframe. I'll try to close as >many pending discussions, and would prefer if someone else completed >the review/validation/commit. Do

[FFmpeg-devel] [PATCH] configure fix arm inline defines

2018-05-30 Thread John Cox
adds quotes around the asm that is in the __asm__ statement Regards John Cox diff --git a/configure b/configure index 22eeca22a5..4dbee8d349 100755 --- a/configure +++ b/configure @@ -1040,7 +1040,7 @@ EOF check_insn(){ log check_insn "$@" -check_inline_asm

[FFmpeg-devel] [PATCH] use av_clip_uintp2_c where clip is variable

2018-05-31 Thread John Cox
Hi I enclose a patch that changes av_clip_uintp2 to av_clip_uintp2_c where the bit depth is variable. This fixes compilation issues if HAVE_ARMV6_INLINE is 1 and therefore allows arm inline detection to be fixed too. Regards John Cox variable_clip.patch Description: Binary data
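For readers unfamiliar with the operation, clipping to an unsigned p-bit range with a run-time `p` can be sketched in portable C as below. `clip_uintp2` is a hypothetical stand-in, not the libavutil implementation. My reading, which the snippet does not spell out, is that an inline-asm variant needing the bit width as a compile-time immediate cannot take a variable depth, hence the switch to the plain C version.

```c
#include <assert.h>

/* Clip a to the unsigned range [0, 2^p - 1]; p may be a run-time value. */
static unsigned clip_uintp2(int a, int p)
{
    assert(p >= 0 && p < 31);
    const int max = (1 << p) - 1;

    if (a < 0)
        return 0;
    if (a > max)
        return (unsigned)max;
    return (unsigned)a;
}
```

A typical use would be clamping a reconstructed sample to the stream's bit depth, for example `clip_uintp2(value, bit_depth)` where `bit_depth` is only known after parsing the headers.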

Re: [FFmpeg-devel] Patch: Replace quotes for inline asm detection.

2018-05-31 Thread John Cox
529:9: warning: ‘avcodec_decode_video2’ is >> deprecated (declared at src/libavcodec/avcodec.h:4756) >> [-Wdeprecated-declarations] >> src/libavfilter/src_movie.c:532:9: warning: ‘avcodec_decode_audio4’ is >> deprecated (declared at src/libavcodec/avcodec.h:4707) >> [-Wdepr

[FFmpeg-devel] [PATCH v2] configure fix arm inline defines

2018-06-04 Thread John Cox
--mfpu=neon on the command line too. I'm not sure how to get it there unless I pass it as extra flags. This patch adds quotes around the asm that is in the __asm__ statement Regards John Cox diff --git a/configure b/configure index 22eeca22a5..4dbee8d349 100755 --- a/configure +++ b/conf

Re: [FFmpeg-devel] [PATCH v2] configure fix arm inline defines

2018-06-06 Thread John Cox
E_INLINE 1 >#define HAVE_ARMV6_INLINE 1 >#define HAVE_ARMV6T2_INLINE 1 >#define HAVE_ARMV8_INLINE 0 >#define HAVE_NEON_INLINE 0 >#define HAVE_VFP_INLINE 1 >#define HAVE_VFPV3_INLINE 1 >#define HAVE_SETEND_INLINE 1 > >If I want to get Neon enabled as well then I need to have a --mfpu=ne

[FFmpeg-devel] [PATCH] configure: fix inline neon regression

2018-06-07 Thread John Cox
that probe_arm_arch ends up setting subarch to armv7-a when the other bits of the script expect armv7a (although gcc wants armv7-a in -march). Again I am confused by this but I'm not sure what the right answer is let alone the correct fix. Maybe whoever wrote this bit of configure could revis

[FFmpeg-devel] [PATCH] avfilter/vf_bwdif: Add capability to deinterlace NV12

2024-01-12 Thread John Cox
As bwdif takes no account of horizontally adjacent pixels the same code can be used on planes that have multiple components as is used on single component planes. Update the filtering code to cope with multi-component planes and add NV12 to the list of supported formats. Signed-off-by: John Cox
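The argument in this commit message can be illustrated with a stand-in filter. The code below is NOT bwdif; it is a trivial vertical average, and `vfilter_row` and `row_bytes` are hypothetical names. Because each output byte depends only on bytes in the same column of other rows, an interleaved UV row can be treated as a row twice as wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in vertical filter: rounded mean of the rows above and below. */
static void vfilter_row(uint8_t *dst, const uint8_t *above,
                        const uint8_t *below, size_t nbytes)
{
    assert(dst && above && below);
    for (size_t x = 0; x < nbytes; x++)
        dst[x] = (uint8_t)((above[x] + below[x] + 1) >> 1);
}

/* Bytes per row to filter for a plane interleaving `comps` 8-bit components. */
static size_t row_bytes(size_t width_px, size_t comps)
{
    return width_px * comps;
}
```

For an NV12 chroma plane one would pass `row_bytes(chroma_width, 2)`, and U only ever mixes with U, V with V, since byte x in every row holds the same component.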

[FFmpeg-devel] [PATCH 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-06-29 Thread John Cox
Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy John Cox (15): avfilter/vf_bwdif: Add outline for aarch neon functions avfilter/vf_bwdif: Add common macros and consts for aarch64 neon avfilter/vf_bwdif: Export C filter_intra

[FFmpeg-devel] [PATCH 01/15] avfilter/vf_bwdif: Add outline for aarch neon functions

2023-06-29 Thread John Cox
Outline but no actual functions. Signed-off-by: John Cox --- libavfilter/aarch64/Makefile| 2 ++ libavfilter/aarch64/vf_bwdif_init_aarch64.c | 39 + libavfilter/aarch64/vf_bwdif_neon.S | 25 + libavfilter/bwdif.h

[FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-06-29 Thread John Cox
Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to byte Add filter constants Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 46 + 1 file changed, 46 insertions(+) diff --gi

[FFmpeg-devel] [PATCH 03/15] avfilter/vf_bwdif: Export C filter_intra

2023-06-29 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 3 +++ libavfilter/vf_bwdif.c | 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index 6a0f70487a..ae6f6ce223 100644 --- a/libavfilter
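"Tail fixup" here presumably means running the vector code over whole blocks and handing the leftover pixels to the exported C function. A minimal sketch of that pattern follows. The block size of 16, the byte-inverting operation and all function names are assumptions for illustration, and `op_block16` simply calls the scalar code in place of real SIMD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 16 };

/* Scalar reference path: invert each byte. */
static void op_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)(255 - src[i]);
}

/* Stand-in for a SIMD kernel that only accepts whole 16-byte blocks. */
static void op_block16(uint8_t *dst, const uint8_t *src, size_t nblocks)
{
    op_c(dst, src, nblocks * BLOCK);
}

/* Any width: whole blocks go to the block kernel, the tail to the C path. */
static void op_any_width(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst && src);
    size_t body = width / BLOCK * BLOCK;

    if (body)
        op_block16(dst, src, body / BLOCK);
    if (width > body)
        op_c(dst + body, src + body, width - body);
}
```

Exporting the C function lets the architecture-specific wrapper call it directly for the tail instead of duplicating the scalar logic.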

[FFmpeg-devel] [PATCH 04/15] avfilter/vf_bwdif: Add neon for filter_intra

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++ libavfilter/aarch64/vf_bwdif_neon.S | 53 + 2 files changed, 70 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH 05/15] tests/checkasm: Add test for vf_bwdif filter_intra

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 37 + 1 file changed, 37 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 46224bb575..034bbabb4c 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH 06/15] avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 59 + 1 file changed, 59 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S index b863b3447d..6c5d1598f4 100644 --- a/libavfilter/aarch64

[FFmpeg-devel] [PATCH 13/15] avfilter/vf_bwdif: Add neon for filter_line3

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++ libavfilter/aarch64/vf_bwdif_neon.S | 278 2 files changed, 306 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH 07/15] avfilter/vf_bwdif: Export C filter_edge

2023-06-29 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 4 libavfilter/vf_bwdif.c | 8 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae6f6ce223..ae1616d366 100644 --- a/libavfilter

[FFmpeg-devel] [PATCH 14/15] tests/checkasm: Add test for vf_bwdif filter_line3

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 81 +++ 1 file changed, 81 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 5fdba09fdc..3399cacdf7 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 libavfilter/aarch64/vf_bwdif_neon.S | 104 2 files changed, 124 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH 15/15] avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

2023-06-29 Thread John Cox
create any noticeable thread load variation. Signed-off-by: John Cox --- libavfilter/vf_bwdif.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c index 52bc676cf8..6701208efe 100644 --- a/libavfilter/vf_bwdif.c +++ b
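One way to block per-thread slices into multiples of 4 lines is to round each slice boundary down to a multiple of 4 and let the last job run to the end of the frame. The sketch below shows that arithmetic. `SliceRange` and `slice_rows` are hypothetical names, and the actual patch may compute the boundaries differently.

```c
#include <assert.h>

typedef struct SliceRange {
    int start; /* first row of the slice */
    int end;   /* one past the last row of the slice */
} SliceRange;

/* Rows handled by job `jobnr` of `nb_jobs`, with boundaries on multiples of 4. */
static SliceRange slice_rows(int height, int jobnr, int nb_jobs)
{
    assert(height >= 0 && nb_jobs > 0 && jobnr >= 0 && jobnr < nb_jobs);
    SliceRange r;

    r.start = (height * jobnr / nb_jobs) & ~3;
    /* The last job takes everything that remains, whatever the height. */
    r.end = (jobnr == nb_jobs - 1)
          ? height
          : (height * (jobnr + 1) / nb_jobs) & ~3;
    return r;
}
```

Slices stay contiguous because each job's `end` is computed with the same expression as the next job's `start`; only the final slice can have a row count that is not a multiple of 4.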

[FFmpeg-devel] [PATCH 09/15] tests/checkasm: Add test for vf_bwdif filter_edge

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 54 +++ 1 file changed, 54 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 034bbabb4c..5fdba09fdc 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH 10/15] avfilter/vf_bwdif: Export C filter_line

2023-06-29 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 5 + libavfilter/vf_bwdif.c | 10 +- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae1616d366..cce99953f3 100644 --- a

[FFmpeg-devel] [PATCH 11/15] avfilter/vf_bwdif: Add neon for filter_line

2023-06-29 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 215 2 files changed, 236 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH 12/15] avfilter/vf_bwdif: Add a filter_line3 method for optimisation

2023-06-29 Thread John Cox
% better than two filter_lines and a memcpy. Signed-off-by: John Cox --- libavfilter/bwdif.h| 7 +++ libavfilter/vf_bwdif.c | 31 +++ 2 files changed, 38 insertions(+) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index cce99953f3..496cec72ef 100644

Re: [FFmpeg-devel] [PATCH 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-02 Thread John Cox
Hi >On Thu, 29 Jun 2023, John Cox wrote: > >> Also adds a filter_line3 method which on aarch64 neon yields approx 30% >> speedup over 2xfilter_line and a memcpy >> >> John Cox (15): >> avfilter/vf_bwdif: Add outline for aarch neon functions >> avfilter/

Re: [FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-02 Thread John Cox
On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote: >On Thu, 29 Jun 2023, John Cox wrote: > >> Add macros for dual scalar half->single multiply and accumulate >> Add macro for shift, saturate and shorten single to byte >> Add filter constants >> >> Signed-

Re: [FFmpeg-devel] [PATCH 04/15] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-02 Thread John Cox
On Sun, 2 Jul 2023 00:37:35 +0300 (EEST), you wrote: >On Thu, 29 Jun 2023, John Cox wrote: > >> Signed-off-by: John Cox >> --- >> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++ >> libavfilter/aarch64/vf_bwdif_neon.S | 53 + >

Re: [FFmpeg-devel] [PATCH 08/15] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-02 Thread John Cox
On Sun, 2 Jul 2023 00:40:09 +0300 (EEST), you wrote: >On Thu, 29 Jun 2023, John Cox wrote: > >> Signed-off-by: John Cox >> --- >> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 >> libavfilter/aarch64/vf_bwdif_neon.S | 104 >

Re: [FFmpeg-devel] [PATCH 11/15] avfilter/vf_bwdif: Add neon for filter_line

2023-07-02 Thread John Cox
On Sun, 2 Jul 2023 00:44:10 +0300 (EEST), you wrote: >On Thu, 29 Jun 2023, John Cox wrote: > >> Signed-off-by: John Cox >> --- >> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ >> libavfilter/aarch64/vf_bwdif_neon.S | 215 >

[FFmpeg-devel] [PATCH v2 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-02 Thread John Cox
Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy Differences from v1: .align 16 corrected to .balign 16 SXTW tolower Mac ABI (hopefully) fixed V register pop/push macroed & prettified John Cox (15): avfilter/vf_bwdif: Add out

[FFmpeg-devel] [PATCH v2 01/15] avfilter/vf_bwdif: Add outline for aarch neon functions

2023-07-02 Thread John Cox
Outline but no actual functions. Signed-off-by: John Cox --- libavfilter/aarch64/Makefile| 2 ++ libavfilter/aarch64/vf_bwdif_init_aarch64.c | 39 + libavfilter/aarch64/vf_bwdif_neon.S | 25 + libavfilter/bwdif.h

[FFmpeg-devel] [PATCH v2 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-02 Thread John Cox
Add macros for dual scalar half->single multiply and accumulate Add macro for shift, saturate and shorten single to byte Add filter constants Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 53 + 1 file changed, 53 insertions(+) diff --gi

[FFmpeg-devel] [PATCH v2 03/15] avfilter/vf_bwdif: Export C filter_intra

2023-07-02 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 3 +++ libavfilter/vf_bwdif.c | 6 +++--- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index 6a0f70487a..ae6f6ce223 100644 --- a/libavfilter

[FFmpeg-devel] [PATCH v2 04/15] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++ libavfilter/aarch64/vf_bwdif_neon.S | 53 + 2 files changed, 70 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH v2 08/15] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 libavfilter/aarch64/vf_bwdif_neon.S | 104 2 files changed, 124 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH v2 09/15] tests/checkasm: Add test for vf_bwdif filter_edge

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 54 +++ 1 file changed, 54 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 034bbabb4c..5fdba09fdc 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v2 05/15] tests/checkasm: Add test for vf_bwdif filter_intra

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 37 + 1 file changed, 37 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 46224bb575..034bbabb4c 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v2 06/15] avfilter/vf_bwdif: Add clip and spatial macros for aarch64 neon

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_neon.S | 73 + 1 file changed, 73 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_neon.S b/libavfilter/aarch64/vf_bwdif_neon.S index 6a614f8d6e..48dc7bcd9d 100644 --- a/libavfilter/aarch64

[FFmpeg-devel] [PATCH v2 10/15] avfilter/vf_bwdif: Export C filter_line

2023-07-02 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 5 + libavfilter/vf_bwdif.c | 10 +- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae1616d366..cce99953f3 100644 --- a

[FFmpeg-devel] [PATCH v2 07/15] avfilter/vf_bwdif: Export C filter_edge

2023-07-02 Thread John Cox
Needed for tail fixup of neon code Signed-off-by: John Cox --- libavfilter/bwdif.h| 4 libavfilter/vf_bwdif.c | 8 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index ae6f6ce223..ae1616d366 100644 --- a/libavfilter

[FFmpeg-devel] [PATCH v2 11/15] avfilter/vf_bwdif: Add neon for filter_line

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 208 2 files changed, 229 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH v2 12/15] avfilter/vf_bwdif: Add a filter_line3 method for optimisation

2023-07-02 Thread John Cox
% better than two filter_lines and a memcpy. Signed-off-by: John Cox --- libavfilter/bwdif.h| 7 +++ libavfilter/vf_bwdif.c | 31 +++ 2 files changed, 38 insertions(+) diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h index cce99953f3..496cec72ef 100644

[FFmpeg-devel] [PATCH v2 13/15] avfilter/vf_bwdif: Add neon for filter_line3

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++ libavfilter/aarch64/vf_bwdif_neon.S | 272 2 files changed, 300 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

[FFmpeg-devel] [PATCH v2 14/15] tests/checkasm: Add test for vf_bwdif filter_line3

2023-07-02 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 81 +++ 1 file changed, 81 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 5fdba09fdc..3399cacdf7 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v2 15/15] avfilter/vf_bwdif: Block filter slices into a multiple of 4 lines

2023-07-02 Thread John Cox
create any noticeable thread load variation. Signed-off-by: John Cox --- libavfilter/vf_bwdif.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c index 52bc676cf8..6701208efe 100644 --- a/libavfilter/vf_bwdif.c +++ b

Re: [FFmpeg-devel] [PATCH v2 12/15] avfilter/vf_bwdif: Add a filter_line3 method for optimisation

2023-07-03 Thread John Cox
On Mon, 3 Jul 2023 00:12:46 +0300 (EEST), you wrote: >On Sun, 2 Jul 2023, Thomas Mundt wrote: > >> Am So., 2. Juli 2023 um 14:34 Uhr schrieb John Cox : >> Add an optional filter_line3 to the available optimisations. >> >> filter_line3 is equivalent to fi

Re: [FFmpeg-devel] [PATCH 02/15] avfilter/vf_bwdif: Add common macros and consts for aarch64 neon

2023-07-03 Thread John Cox
On Mon, 3 Jul 2023 00:02:27 +0300 (EEST), you wrote: >On Sun, 2 Jul 2023, Martin Storsjö wrote: > >> On Sun, 2 Jul 2023, John Cox wrote: >> >>> On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote: >>> >>>> On Thu, 29 Jun 2023, John Cox wrote: >

Re: [FFmpeg-devel] [PATCH v2 00/15] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-03 Thread John Cox
On Mon, 3 Jul 2023 00:09:52 +0300 (EEST), you wrote: >On Sun, 2 Jul 2023, John Cox wrote: > >> Also adds a filter_line3 method which on aarch64 neon yields approx 30% >> speedup over 2xfilter_line and a memcpy >> >> Differences from v1: >> .align 16 corrected

[FFmpeg-devel] [PATCH v3 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-03 Thread John Cox
Also adds a filter_line3 method which on aarch64 neon yields approx 30% speedup over 2xfilter_line and a memcpy Differences from v2: coeffs moved into const segment number of patches reduced John Cox (7): tests/checkasm: Add test for vf_bwdif filter_intra avfilter/vf_bwdif: Add neon for

[FFmpeg-devel] [PATCH v3 1/7] tests/checkasm: Add test for vf_bwdif filter_intra

2023-07-03 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 37 + 1 file changed, 37 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 46224bb575..034bbabb4c 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v3 2/7] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-03 Thread John Cox
Adds an outline for aarch64 neon functions Adds common macros and consts for aarch64 neon Exports C filter_intra needed for tail fixup of neon code Adds neon for filter_intra Signed-off-by: John Cox --- libavfilter/aarch64/Makefile| 2 + libavfilter/aarch64/vf_bwdif_init_aarch64

[FFmpeg-devel] [PATCH v3 3/7] tests/checkasm: Add test for vf_bwdif filter_edge

2023-07-03 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 54 +++ 1 file changed, 54 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 034bbabb4c..5fdba09fdc 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v3 4/7] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-03 Thread John Cox
Adds clip and spatial macros for aarch64 neon Exports C filter_edge needed for tail fixup of neon code Adds neon for filter_edge Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 +++ libavfilter/aarch64/vf_bwdif_neon.S | 177 libavfilter

[FFmpeg-devel] [PATCH v3 6/7] avfilter/vf_bwdif: Add a filter_line3 method for optimisation

2023-07-03 Thread John Cox
may do up to 3 extra lines but filter_edge is faster than filter_line so it is unlikely to create any noticeable thread load variation. Signed-off-by: John Cox --- libavfilter/bwdif.h | 7 libavfilter/vf_bwdif.c| 44 +++-- tests/checkasm/vf_bwdif.c | 81

[FFmpeg-devel] [PATCH v3 5/7] avfilter/vf_bwdif: Add neon for filter_line Exports C filter_line needed for tail fixup of neon code

2023-07-03 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++ libavfilter/aarch64/vf_bwdif_neon.S | 208 libavfilter/bwdif.h | 5 + libavfilter/vf_bwdif.c | 10 +- 4 files changed, 239 insertions

[FFmpeg-devel] [PATCH v3 7/7] avfilter/vf_bwdif: Add neon for filter_line3

2023-07-03 Thread John Cox
Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++ libavfilter/aarch64/vf_bwdif_neon.S | 272 2 files changed, 300 insertions(+) diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c b/libavfilter/aarch64/vf_bwdif_init_aarch64.c

Re: [FFmpeg-devel] [PATCH v2 05/15] tests/checkasm: Add test for vf_bwdif filter_intra

2023-07-04 Thread John Cox
On Mon, 3 Jul 2023 00:14:16 +0300 (EEST), you wrote: >[snip] >It's a bit of a shame that this only tests things for 8 bit, not 10, but I >guess that's better than nothing. The way the current code is set up to >template both variants of the tests isn't very neat either... Is there actually >8-b

[FFmpeg-devel] [PATCH v4 0/7] avfilter/vf_bwdif: Add aarch64 neon functions

2023-07-04 Thread John Cox
I've applied all the requested changes and I didn't want this mistake in the final patchset. (The mistake was benign - it just wasted a few cycles.) John Cox (7): tests/checkasm: Add test for vf_bwdif filter_intra avfilter/vf_bwdif: Add neon for filter_intra tests/checkasm: Add test fo

[FFmpeg-devel] [PATCH v4 1/7] tests/checkasm: Add test for vf_bwdif filter_intra

2023-07-04 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 37 + 1 file changed, 37 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 46224bb575..034bbabb4c 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v4 2/7] avfilter/vf_bwdif: Add neon for filter_intra

2023-07-04 Thread John Cox
Adds an outline for aarch64 neon functions Adds common macros and consts for aarch64 neon Exports C filter_intra needed for tail fixup of neon code Adds neon for filter_intra Signed-off-by: John Cox --- libavfilter/aarch64/Makefile| 2 + libavfilter/aarch64/vf_bwdif_init_aarch64

[FFmpeg-devel] [PATCH v4 3/7] tests/checkasm: Add test for vf_bwdif filter_edge

2023-07-04 Thread John Cox
Signed-off-by: John Cox --- tests/checkasm/vf_bwdif.c | 54 +++ 1 file changed, 54 insertions(+) diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c index 034bbabb4c..5fdba09fdc 100644 --- a/tests/checkasm/vf_bwdif.c +++ b/tests/checkasm

[FFmpeg-devel] [PATCH v4 4/7] avfilter/vf_bwdif: Add neon for filter_edge

2023-07-04 Thread John Cox
Adds clip and spatial macros for aarch64 neon Exports C filter_edge needed for tail fixup of neon code Adds neon for filter_edge Signed-off-by: John Cox --- libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 +++ libavfilter/aarch64/vf_bwdif_neon.S | 177 libavfilter
