t you appear to want
is
if (!*ctx)
which protects against multi-free and is useful in that it can be called
unconditionally in cleanup code (assuming initial null assignments) and
crashes in what you describe as the "stupid" case.
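A minimal sketch of the free pattern described above, assuming a hypothetical DemoCtx type and demo_ctx_* names (none of them come from the patch under discussion). It reads the "stupid" case as a NULL pointer-to-pointer, which is an interpretation rather than something the thread states:

```c
#include <stdlib.h>

/* Hypothetical context type, used only for this illustration. */
typedef struct DemoCtx {
    int *buf;
} DemoCtx;

/* Allocate a context holding n ints; returns NULL on failure. */
static DemoCtx *demo_ctx_alloc(size_t n)
{
    DemoCtx *ctx = calloc(1, sizeof(*ctx));
    if (!ctx)
        return NULL;
    ctx->buf = calloc(n, sizeof(*ctx->buf));
    if (!ctx->buf) {
        free(ctx);
        return NULL;
    }
    return ctx;
}

/* Free *pctx and reset it to NULL.
 * pctx itself is dereferenced unconditionally, so a NULL pctx faults
 * here instead of being silently ignored. */
static void demo_ctx_free(DemoCtx **pctx)
{
    DemoCtx *ctx = *pctx;
    if (!ctx)
        return;          /* never allocated, or already freed */
    free(ctx->buf);
    free(ctx);
    *pctx = NULL;        /* a repeated call becomes a no-op */
}
```

With the pointer initialised to NULL at declaration, cleanup code can call demo_ctx_free(&ctx) on every exit path without checking whether allocation ever happened.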
>> return;
>>
>> a
rdware frames but they aren't really.
There must be a better way of auto-selecting the hevc_rpi decoder over
the normal s/w hevc decoder, but I became confused by the existing h/w
acceleration framework and what I wanted to do didn't seem to fit in
neatly.
Display should be a proper devic
Hi
>Hi
>
>On Tue, Nov 13, 2018 at 03:52:18PM +0000, John Cox wrote:
>> Hi
>>
>> I have been developing a hevc decoder for Raspberry Pi for some time
>> now. As active development has now pretty much ceased and the code is
>> believed stable it seems a good
Hi
>On Wed, Nov 14, 2018 at 11:35:50AM +0000, John Cox wrote:
>> Hi
>>
>> >Hi
>> >
>> >On Tue, Nov 13, 2018 at 03:52:18PM +0000, John Cox wrote:
>> >> Hi
>> >>
>> >> I have been developing a hevc decoder for Raspberr
esim/rpi-ffmpeg.git on
branch test/wpp_1 - I do have a separated decoder version but I'd like
to find out how I should integrate it before I commit it.
Many thanks
John Cox
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
ere?
I've tested by hand with libswscale/test/swscale but fate integration
would be obviously better - I'm currently a bit lost in fate, where/how
should I do this?
Many thanks
John Cox
On Wed, 16 Aug 2023 19:37:02 +0200, you wrote:
>On Wed, Aug 16, 2023 at 05:15:23PM +0100, John Cox wrote:
>> Hi
>>
>> The Pi has a use for a fast RGB24->YUV420P path for encoding camera
>> video. There is an existing BGR24 converter but if I build a RGB24
>
with improved rounding or the previous template (I'm not quite
sure what it does but it produces a different score out of tests/swscale
to either method) so a simple results match isn't going to work.
Regards
John Cox
John Cox (6):
fate-filter-fps: Set swscale bitexact for tests that do
-bitexact as a general flag doesn't affect swscale so add swscale option
too to get correct CRCs in all circumstances.
Signed-off-by: John Cox
---
tests/fate/filter-video.mak | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tests/fate/filter-video.mak b/tests/fate/f
Rename swscale conversion functions for converting BGR24 frames to YUV
as bgr24toyuv12 rather than rgb24toyuv12 as that is just confusing and
would be even more confusing with the addition of RGB24 converters.
Signed-off-by: John Cox
---
libswscale/bayer_template.c | 2 +-
libswscale
Add a rgb24->yuv420p conversion. Uses the same code as the existing
bgr24->yuv converter but permutes the conversion array to swap R & B
coefficients.
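The idea of reusing one converter with a permuted coefficient array can be sketched as follows. This is an illustration only, not the libswscale code: the table layout, the function names and the 8-bit BT.601 limited-range coefficients are all assumptions, and it converts a single pixel without the chroma subsampling that yuv420p requires.

```c
#include <stdint.h>

/* Rows are Y, U, V; columns follow the byte order of the source pixel.
 * Common 8-bit BT.601 limited-range approximation (assumed values). */
static const int32_t coeffs_bgr[9] = {
    /*  B     G     R  */
     25,  129,   66,   /* Y */
    112,  -74,  -38,   /* U */
    -18,  -94,  112,   /* V */
};

/* Build a table for the opposite byte order by swapping the first and
 * last coefficient in each row. */
static void swap_rb_coeffs(int32_t dst[9], const int32_t src[9])
{
    for (int row = 0; row < 3; row++) {
        dst[3 * row + 0] = src[3 * row + 2];
        dst[3 * row + 1] = src[3 * row + 1];
        dst[3 * row + 2] = src[3 * row + 0];
    }
}

/* Convert one 3-byte pixel to Y, U, V with the given table.  The same
 * code serves both byte orders; only the table differs. */
static void px_to_yuv(uint8_t yuv[3], const uint8_t px[3], const int32_t c[9])
{
    static const int32_t offset[3] = { 16, 128, 128 };
    for (int row = 0; row < 3; row++) {
        int32_t v = c[3 * row] * px[0] + c[3 * row + 1] * px[1] +
                    c[3 * row + 2] * px[2];
        /* Rounding term and offset are added before the shift so the
         * shifted value is never negative. */
        yuv[row] = (uint8_t)((v + 128 + offset[row] * 256) >> 8);
    }
}
```

Calling px_to_yuv on a BGR pixel with coeffs_bgr, or on the same colour stored as RGB with the swapped table, gives identical Y, U and V values.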
Signed-off-by: John Cox
---
libswscale/rgb2rgb.c | 5 +
libswscale/rgb2rgb.h | 7 +++
libswscale/rgb2rgb_
dence isn't an issue there.
Signed-off-by: John Cox
---
libswscale/rgb2rgb_template.c | 42 ++-
libswscale/swscale_unscaled.c | 5 ++--
libswscale/x86/rgb2rgb_template.c | 5
3 files changed, 32 insertions(+), 20 deletions(-)
diff --git a/
Add simple C functions for converting XRGB to YUV420P. Same logic as the
RGB24 functions but dropping the A channel.
Signed-off-by: John Cox
---
libswscale/rgb2rgb.c | 20 +++
libswscale/rgb2rgb.h | 16 +
libswscale/rgb2rgb_template.c | 106
Neon RGB24->YUV420P and BGR24->YUV420P functions. Works on 16 pixel
blocks and can do any width or height, though for widths less than 32 or
so the C is likely faster.
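One common way to combine a 16-pixel block kernel with arbitrary widths is to run the block kernel over the bulk of the row and hand the remainder to scalar C. The sketch below assumes that structure; it is not taken from the patch, the function names are hypothetical, and block16_to_y is a plain C stand-in for a SIMD routine. Only the luma calculation is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: RGB24 -> luma for n pixels (BT.601 approximation). */
static void row_to_y_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++, src += 3)
        dst[i] = (uint8_t)((66 * src[0] + 129 * src[1] + 25 * src[2] +
                            128 + 16 * 256) >> 8);
}

/* Stand-in for a SIMD kernel: handles exactly 16 pixels per call. */
static void block16_to_y(uint8_t *dst, const uint8_t *src)
{
    row_to_y_c(dst, src, 16);
}

/* Any width: whole 16-pixel blocks first, then a scalar tail. */
static void row_to_y(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t x = 0;
    for (; x + 16 <= width; x += 16)
        block16_to_y(dst + x, src + 3 * x);
    if (x < width)
        row_to_y_c(dst + x, src + 3 * x, width - x);
}
```

For narrow rows the per-call overhead of the block path dominates, which is consistent with the remark that plain C is likely faster below roughly 32 pixels.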
Signed-off-by: John Cox
---
libswscale/aarch64/rgb2rgb.c | 8 +
libswscale/aarch64/rgb2rgb_neon.S
On Sun, 20 Aug 2023 19:16:14 +0200, you wrote:
>On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
>> Add a rgb24->yuv420p conversion. Uses the same code as the existing
>> bgr24->yuv converter but permutes the conversion array to swap R & B
>> coefficients.
On Sun, 20 Aug 2023 19:45:11 +0200, you wrote:
>On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
>> On Sun, Aug 20, 2023 at 03:10:19PM +0000, John Cox wrote:
>> > Add a rgb24->yuv420p conversion. Uses the same code as the existing
>> > bgr24->
On Mon, 21 Aug 2023 21:15:37 +0200, you wrote:
>On Sun, Aug 20, 2023 at 07:28:40PM +0100, John Cox wrote:
>> On Sun, 20 Aug 2023 19:45:11 +0200, you wrote:
>>
>> >On Sun, Aug 20, 2023 at 07:16:14PM +0200, Michael Niedermayer wrote:
>> >> On Sun, Aug 20, 2023
LOBALHEADER from the flags in rtspenc.c fixes my
problem and I'll very happily submit a patch to that effect, but first
I'd like to know if that is in fact the root of my problem - my
understanding of the RTSP code is very limited and I'd appreciate advice
from someone who knows somethi
On Mon, 19 Aug 2024 at 19:32, Martin Storsjö wrote:
>
> On Mon, 19 Aug 2024, John Cox wrote:
>
> > Does rtspenc actually support AVFMT_GLOBALHEADER? It is specified in the
> > FFOutputFormat flags but I can't see anywhere in the code where
> > extradata is refer
devices on the output of ffmpeg for testing purposes.
Though I guess that if I want that then the device should be bundled
with the application rather than in a library.
John Cox
ous
union to avoid changing other cabac code - I could believe this was a
no-no and I'll have to change that.
3) Uses clz which doesn't seem to exist in the ffmpeg int libs (though
ctz does)
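For reference, a count-leading-zeros helper of the kind item 3 asks for could look like the sketch below. The name demo_clz32 is hypothetical; the compiler builtin is used where available and a portable fallback otherwise. Like the builtin, it is only meaningful for non-zero input.

```c
#include <stdint.h>

/* Count the leading zero bits of a non-zero 32-bit value. */
static inline int demo_clz32(uint32_t x)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_clz(x);   /* result is undefined for x == 0 */
#else
    /* Portable fallback: binary search for the highest set bit. */
    int n = 0;
    if (!(x & 0xFFFF0000u)) { n += 16; x <<= 16; }
    if (!(x & 0xFF000000u)) { n +=  8; x <<=  8; }
    if (!(x & 0xF0000000u)) { n +=  4; x <<=  4; }
    if (!(x & 0xC0000000u)) { n +=  2; x <<=  2; }
    if (!(x & 0x80000000u)) { n +=  1; }
    return n;
#endif
}
```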
I'll happily accept suggestions as to what is considered better practice
for thes
Hi
>On Tue, Jan 19, 2016 at 7:46 AM, John Cox wrote:
>
>> Hi
>>
>> I've just done a fair bit of work on hevc_cabac decode for the Raspberry
>> Pi2 and I think that the patch is generally applicable. Patch is
>> attached but you may prefer to take
On Tue, 19 Jan 2016 15:59:39 +0000 (UTC), you wrote:
>John Cox kynesim.co.uk> writes:
>
>> >> +#define UNCHECKED_BITSTREAM_READER 1
>> >
>> >I don't think that's right, and is a security issue.
>>
>> I added that line as (nearly) eve
>John Cox kynesim.co.uk> writes:
>
>> On Tue, 19 Jan 2016 15:59:39 +0000 (UTC), you wrote:
>>
>> >John Cox kynesim.co.uk> writes:
>> >
>> >> >> +#define UNCHECKED_BITSTREAM_READER 1
>> >> >
>> >> >I do
>On 1/19/2016 9:46 AM, John Cox wrote:
>> +// Helper fns
>> +#ifndef hevc_mem_bits32
>> +static av_always_inline uint32_t hevc_mem_bits32(const void * buf, const
>> unsigned int offset)
>> +{
>> +return AV_RB32((const uint8_t *)buf + (offset &
On Tue, 19 Jan 2016 14:09:22 -0300, you wrote:
>On 1/19/2016 2:05 PM, John Cox wrote:
>>> On 1/19/2016 9:46 AM, John Cox wrote:
>>>> +// Helper fns
>>>> +#ifndef hevc_mem_bits32
>>>> +static av_always_inline uint32_t hevc_mem_bits32(c
>On 1/19/2016 2:24 PM, John Cox wrote:
>> On Tue, 19 Jan 2016 14:09:22 -0300, you wrote:
>>
>>> On 1/19/2016 2:05 PM, John Cox wrote:
>>>>> On 1/19/2016 9:46 AM, John Cox wrote:
>>>>>> +// Helper fns
>>>>>> +#ifndef hev
On Wed, 20 Jan 2016 13:26:05 +0100, you wrote:
>Hi,
>
>2016-01-19 13:46 GMT+01:00 John Cox :
>> I've just done a fair bit of work on hevc_cabac decode for the Raspberry
>> Pi2 and I think that the patch is generally applicable. Patch is
>> attached but you may pre
On Wed, 20 Jan 2016 13:26:05 +0100, you wrote:
>Hi,
>
>2016-01-19 13:46 GMT+01:00 John Cox :
>> I've just done a fair bit of work on hevc_cabac decode for the Raspberry
>> Pi2 and I think that the patch is generally applicable. Patch is
>> attached but you may pre
Hi
v2 of my hevc residual patch
I've fixed the fate regression
I've split it into more pieces
Now uses ff_clz
Some reformatting of function headers
The patches can also be found on
https://github.com/jc-kynesim/rpi-ffmpeg.git on branch
test/ff_hevc_cabac_4 from tag ff_hevc_cabac_4_base
Note that
>On Fri, Jan 22, 2016 at 01:41:11AM +0100, Michael Niedermayer wrote:
>> On Thu, Jan 21, 2016 at 10:45:55AM +0000, John Cox wrote:
>> > Hi
>> >
>> > v2 of my hevc residual patch
>> >
>> > I've fixed the fate regression
>>
On Fri, 22 Jan 2016 01:57:58 +0100, you wrote:
>On Fri, Jan 22, 2016 at 01:41:11AM +0100, Michael Niedermayer wrote:
>> On Thu, Jan 21, 2016 at 10:45:55AM +0000, John Cox wrote:
>> > Hi
>> >
>> > v2 of my hevc residual patch
>> >
>> > I
On Fri, 22 Jan 2016 12:18:29 +0100, you wrote:
>Hi,
>
>2016-01-20 15:27 GMT+01:00 John Cox :
>> The by22 code gained me an overall factor of two in the abs level decode
>> - the gains do depend a lot on the quantity of residual - you gain a lot
>> more on I-frames th
On Fri, 22 Jan 2016 14:42:27 +0100, you wrote:
> [snip]
>> >fate-hevc passes with patch 1-5, so the issue is likely in the last
>> >
>> >[...]
>>
>> Yup - bug in the arm update_rice (again - sorry). Now passes fate on
>> ARM too (now I've learnt how to run fate on my Pi in a finite time).
>>
>>
Hi
>Hi,
>
>2016-01-22 14:29 GMT+01:00 John Cox :
>>>This is a big slowdown on Win64 and UHD-bluray like sequences, but
>>>that can be switched off in that case.
>>
>> I'm a bit surprised that it generated a big slowdown - some cache must
>> be
On Fri, 22 Jan 2016 18:52:23 +0100, you wrote:
>Hi,
>
>2016-01-21 11:45 GMT+01:00 John Cox :
>> Hi
>>
>> v2 of my hevc residual patch
>
>I'll review the bit not related to significant coeffs first, because I
>think it is the most performance-sensitive. Al
Hi
In order to get a copy-free display on my target h/w I need to have my
decode output YUV planes contiguous. The default allocator gets each
plane separately (so they aren't or at least aren't always). Is there a
simple preferred way of getting this to work? I've got slightly lost in
the maze
the review/validation/commit.
Thanks
>2016-01-22 19:33 GMT+01:00 John Cox :
>> Fair enough - though given that your slowdowns are almost certainly
>> cache-related the whole may be quite different from the sum of the
>> parts.
>
>True, they don't always translate
On Tue, 2 Feb 2016 12:52:15 +0100, you wrote:
>Hi,
>
>as a modus operandi for this review, I have no time for a proper one,
>or at least not fitting with John's timeframe. I'll try to close as
>many pending discussions, and would prefer if someone else completed
>the review/validation/commit.
Do
adds quotes around the asm that is in the __asm__ statement
Regards
John Cox
diff --git a/configure b/configure
index 22eeca22a5..4dbee8d349 100755
--- a/configure
+++ b/configure
@@ -1040,7 +1040,7 @@ EOF
check_insn(){
log check_insn "$@"
-check_inline_asm
Hi
I enclose a patch that changes av_clip_uintp2 to av_clip_uintp2_c where
the bit depth is variable. This fixes compilation issues if
HAVE_ARMV6_INLINE is 1 and therefore allows arm inline detection to be
fixed too.
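For readers unfamiliar with the operation, the clip in question limits a signed value to the unsigned range [0, 2^p - 1]. The sketch below shows that behaviour in plain C with p as a run-time argument; demo_clip_uintp2 is a hypothetical name and this is not the libavutil source. The likely reason a variable bit depth matters is that an inline-asm version needs the bit count as a compile-time constant, though that is an assumption and is not spelled out in the message.

```c
#include <assert.h>

/* Clip a signed value to [0, 2^p - 1]; p may vary at run time. */
static inline unsigned demo_clip_uintp2(int a, int p)
{
    assert(p >= 0 && p < 31);       /* keep 1 << p within int range */
    const int max = (1 << p) - 1;
    if (a < 0)
        return 0;
    if (a > max)
        return (unsigned)max;
    return (unsigned)a;
}
```

For example, with p = 8 the values -5, 100 and 300 clip to 0, 100 and 255 respectively.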
Regards
John Cox
variable_clip.patch
Description: Binary data
529:9: warning: ‘avcodec_decode_video2’ is
>> deprecated (declared at src/libavcodec/avcodec.h:4756)
>> [-Wdeprecated-declarations]
>> src/libavfilter/src_movie.c:532:9: warning: ‘avcodec_decode_audio4’ is
>> deprecated (declared at src/libavcodec/avcodec.h:4707)
>> [-Wdepr
--mfpu=neon
on the command line too. I'm not sure how to get it there unless I pass
it as extra flags.
This patch adds quotes around the asm that is in the __asm__ statement
Regards
John Cox
diff --git a/configure b/configure
index 22eeca22a5..4dbee8d349 100755
--- a/configure
+++ b/conf
E_INLINE 1
>#define HAVE_ARMV6_INLINE 1
>#define HAVE_ARMV6T2_INLINE 1
>#define HAVE_ARMV8_INLINE 0
>#define HAVE_NEON_INLINE 0
>#define HAVE_VFP_INLINE 1
>#define HAVE_VFPV3_INLINE 1
>#define HAVE_SETEND_INLINE 1
>
>If I want to get Neon enabled as well then I need to have a --mfpu=ne
that probe_arm_arch ends up setting subarch to armv7-a
when the other bits of the script expect armv7a (although gcc wants
armv7-a in -march). Again I am confused by this but I'm not sure what
the right answer is let alone the correct fix. Maybe whoever wrote this
bit of configure could revis
As bwdif takes no account of horizontally adjacent pixels the same
code can be used on planes that have multiple components as is used
on single component planes. Update the filtering code to cope with
multi-component planes and add NV12 to the list of supported formats.
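The reasoning can be illustrated with a stand-in filter. The sketch below is not the bwdif kernel: vfilter_line is a hypothetical vertical-only average, used to show that when each output byte depends only on bytes at the same horizontal position, an interleaved UV line holding chroma_w pairs can be processed as 2 * chroma_w independent bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in vertical-only filter: each output byte depends solely on
 * the bytes directly above and below it. */
static void vfilter_line(uint8_t *dst, const uint8_t *above,
                         const uint8_t *below, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        dst[i] = (uint8_t)((above[i] + below[i] + 1) >> 1);
}

/* One line of an NV12 UV plane with chroma_w U/V pairs: no
 * de-interleaving is needed, just twice as many bytes. */
static void vfilter_line_nv12_uv(uint8_t *dst, const uint8_t *above,
                                 const uint8_t *below, size_t chroma_w)
{
    vfilter_line(dst, above, below, 2 * chroma_w);
}
```

Filtering the U and V samples separately gives the same bytes as the single interleaved pass, which is why one code path can serve both single-component and multi-component planes.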
Signed-off-by: John Cox
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
John Cox (15):
avfilter/vf_bwdif: Add outline for aarch neon functions
avfilter/vf_bwdif: Add common macros and consts for aarch64 neon
avfilter/vf_bwdif: Export C filter_intra
Outline but no actual functions.
Signed-off-by: John Cox
---
libavfilter/aarch64/Makefile| 2 ++
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 39 +
libavfilter/aarch64/vf_bwdif_neon.S | 25 +
libavfilter/bwdif.h
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to byte
Add filter constants
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 46 +
1 file changed, 46 insertions(+)
diff --gi
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 3 +++
libavfilter/vf_bwdif.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index 6a0f70487a..ae6f6ce223 100644
--- a/libavfilter
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++
libavfilter/aarch64/vf_bwdif_neon.S | 53 +
2 files changed, 70 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 37 +
1 file changed, 37 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 59 +
1 file changed, 59 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S
b/libavfilter/aarch64/vf_bwdif_neon.S
index b863b3447d..6c5d1598f4 100644
--- a/libavfilter/aarch64
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++
libavfilter/aarch64/vf_bwdif_neon.S | 278
2 files changed, 306 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 4
libavfilter/vf_bwdif.c | 8
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index ae6f6ce223..ae1616d366 100644
--- a/libavfilter
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 81 +++
1 file changed, 81 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 5fdba09fdc..3399cacdf7 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20
libavfilter/aarch64/vf_bwdif_neon.S | 104
2 files changed, 124 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
create any noticeable thread load variation.
Signed-off-by: John Cox
---
libavfilter/vf_bwdif.c | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index 52bc676cf8..6701208efe 100644
--- a/libavfilter/vf_bwdif.c
+++ b
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 54 +++
1 file changed, 54 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 034bbabb4c..5fdba09fdc 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 5 +
libavfilter/vf_bwdif.c | 10 +-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index ae1616d366..cce99953f3 100644
--- a
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
libavfilter/aarch64/vf_bwdif_neon.S | 215
2 files changed, 236 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
% better than two filter_lines
and a memcpy.
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 7 +++
libavfilter/vf_bwdif.c | 31 +++
2 files changed, 38 insertions(+)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index cce99953f3..496cec72ef 100644
Hi
>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
>> speedup over 2xfilter_line and a memcpy
>>
>> John Cox (15):
>> avfilter/vf_bwdif: Add outline for aarch neon functions
>> avfilter/
On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote:
>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Add macros for dual scalar half->single multiply and accumulate
>> Add macro for shift, saturate and shorten single to byte
>> Add filter constants
>>
>> Signed-
On Sun, 2 Jul 2023 00:37:35 +0300 (EEST), you wrote:
>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Signed-off-by: John Cox
>> ---
>> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++
>> libavfilter/aarch64/vf_bwdif_neon.S | 53 +
>
On Sun, 2 Jul 2023 00:40:09 +0300 (EEST), you wrote:
>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Signed-off-by: John Cox
>> ---
>> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20
>> libavfilter/aarch64/vf_bwdif_neon.S | 104
>
On Sun, 2 Jul 2023 00:44:10 +0300 (EEST), you wrote:
>On Thu, 29 Jun 2023, John Cox wrote:
>
>> Signed-off-by: John Cox
>> ---
>> libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
>> libavfilter/aarch64/vf_bwdif_neon.S | 215
>
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
Differences from v1:
.align 16 corrected to .balign 16
SXTW tolower
Mac ABI (hopefully) fixed
V register pop/push macroed & prettified
John Cox (15):
avfilter/vf_bwdif: Add out
Outline but no actual functions.
Signed-off-by: John Cox
---
libavfilter/aarch64/Makefile| 2 ++
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 39 +
libavfilter/aarch64/vf_bwdif_neon.S | 25 +
libavfilter/bwdif.h
Add macros for dual scalar half->single multiply and accumulate
Add macro for shift, saturate and shorten single to byte
Add filter constants
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 53 +
1 file changed, 53 insertions(+)
diff --gi
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 3 +++
libavfilter/vf_bwdif.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index 6a0f70487a..ae6f6ce223 100644
--- a/libavfilter
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 17 +++
libavfilter/aarch64/vf_bwdif_neon.S | 53 +
2 files changed, 70 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20
libavfilter/aarch64/vf_bwdif_neon.S | 104
2 files changed, 124 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 54 +++
1 file changed, 54 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 034bbabb4c..5fdba09fdc 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 37 +
1 file changed, 37 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_neon.S | 73 +
1 file changed, 73 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_neon.S
b/libavfilter/aarch64/vf_bwdif_neon.S
index 6a614f8d6e..48dc7bcd9d 100644
--- a/libavfilter/aarch64
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 5 +
libavfilter/vf_bwdif.c | 10 +-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index ae1616d366..cce99953f3 100644
--- a
Needed for tail fixup of neon code
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 4
libavfilter/vf_bwdif.c | 8
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index ae6f6ce223..ae1616d366 100644
--- a/libavfilter
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
libavfilter/aarch64/vf_bwdif_neon.S | 208
2 files changed, 229 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
% better than two filter_lines
and a memcpy.
Signed-off-by: John Cox
---
libavfilter/bwdif.h| 7 +++
libavfilter/vf_bwdif.c | 31 +++
2 files changed, 38 insertions(+)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index cce99953f3..496cec72ef 100644
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++
libavfilter/aarch64/vf_bwdif_neon.S | 272
2 files changed, 300 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 81 +++
1 file changed, 81 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 5fdba09fdc..3399cacdf7 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
create any noticeable thread load variation.
Signed-off-by: John Cox
---
libavfilter/vf_bwdif.c | 13 ++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/libavfilter/vf_bwdif.c b/libavfilter/vf_bwdif.c
index 52bc676cf8..6701208efe 100644
--- a/libavfilter/vf_bwdif.c
+++ b
On Mon, 3 Jul 2023 00:12:46 +0300 (EEST), you wrote:
>On Sun, 2 Jul 2023, Thomas Mundt wrote:
>
>> Am So., 2. Juli 2023 um 14:34 Uhr schrieb John Cox :
>> Add an optional filter_line3 to the available optimisations.
>>
>> filter_line3 is equivalent to fi
On Mon, 3 Jul 2023 00:02:27 +0300 (EEST), you wrote:
>On Sun, 2 Jul 2023, Martin Storsjö wrote:
>
>> On Sun, 2 Jul 2023, John Cox wrote:
>>
>>> On Sun, 2 Jul 2023 00:35:14 +0300 (EEST), you wrote:
>>>
>>>> On Thu, 29 Jun 2023, John Cox wrote:
>
On Mon, 3 Jul 2023 00:09:52 +0300 (EEST), you wrote:
>On Sun, 2 Jul 2023, John Cox wrote:
>
>> Also adds a filter_line3 method which on aarch64 neon yields approx 30%
>> speedup over 2xfilter_line and a memcpy
>>
>> Differences from v1:
>> .align 16 corrected
Also adds a filter_line3 method which on aarch64 neon yields approx 30%
speedup over 2xfilter_line and a memcpy
Differences from v2:
coeffs moved into const segment
number of patches reduced
John Cox (7):
tests/checkasm: Add test for vf_bwdif filter_intra
avfilter/vf_bwdif: Add neon for
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 37 +
1 file changed, 37 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Adds an outline for aarch neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra
Signed-off-by: John Cox
---
libavfilter/aarch64/Makefile| 2 +
libavfilter/aarch64/vf_bwdif_init_aarch64
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 54 +++
1 file changed, 54 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 034bbabb4c..5fdba09fdc 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 +++
libavfilter/aarch64/vf_bwdif_neon.S | 177
libavfilter
may do up to 3 extra lines but filter_edge is faster than filter_line
so it is unlikely to create any noticeable thread load variation.
Signed-off-by: John Cox
---
libavfilter/bwdif.h | 7
libavfilter/vf_bwdif.c| 44 +++--
tests/checkasm/vf_bwdif.c | 81
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 21 ++
libavfilter/aarch64/vf_bwdif_neon.S | 208
libavfilter/bwdif.h | 5 +
libavfilter/vf_bwdif.c | 10 +-
4 files changed, 239 insertions
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 28 ++
libavfilter/aarch64/vf_bwdif_neon.S | 272
2 files changed, 300 insertions(+)
diff --git a/libavfilter/aarch64/vf_bwdif_init_aarch64.c
b/libavfilter/aarch64/vf_bwdif_init_aarch64.c
On Mon, 3 Jul 2023 00:14:16 +0300 (EEST), you wrote:
>[snip]
>It's a bit of a shame that this only tests things for 8 bit, not 10, but I
>guess that's better than nothing. The way the current code is set up to
>template both variants of the tests isn't very neat either...
Is there actually >8-b
I've
applied all the requested changes and I didn't want this mistake in the
final patchset. (The mistake was benign - it just wasted a few cycles.)
John Cox (7):
tests/checkasm: Add test for vf_bwdif filter_intra
avfilter/vf_bwdif: Add neon for filter_intra
tests/checkasm: Add test fo
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 37 +
1 file changed, 37 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 46224bb575..034bbabb4c 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Adds an outline for aarch neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra
Signed-off-by: John Cox
---
libavfilter/aarch64/Makefile| 2 +
libavfilter/aarch64/vf_bwdif_init_aarch64
Signed-off-by: John Cox
---
tests/checkasm/vf_bwdif.c | 54 +++
1 file changed, 54 insertions(+)
diff --git a/tests/checkasm/vf_bwdif.c b/tests/checkasm/vf_bwdif.c
index 034bbabb4c..5fdba09fdc 100644
--- a/tests/checkasm/vf_bwdif.c
+++ b/tests/checkasm
Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge
Signed-off-by: John Cox
---
libavfilter/aarch64/vf_bwdif_init_aarch64.c | 20 +++
libavfilter/aarch64/vf_bwdif_neon.S | 177
libavfilter