Signed-off-by: Dmitriy Kovalenko
---
libswscale/aarch64/input.S | 166 +
1 file changed, 112 insertions(+), 54 deletions(-)
diff --git a/libswscale/aarch64/input.S b/libswscale/aarch64/input.S
index c1c0adffc8..ee8eb24c14 100644
--- a/libswscale/aarch64/input.S
+
I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
subsampled conversion. In this patch stack I'll try to
improve the performance.
This particular set of changes is a small improvement to all the
existing functions and macros. The biggest performance gain is
coming from post loadi
This patch integrates so-called double buffering, where we load
2 batches of elements at a time and then process them in parallel. On
modern Arm processors, especially Apple Silicon, it gives a visible
benefit; for subsampled pixel processing it is especially nice because
it allows to read e
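
For readers following along, the two-batches-at-a-time idea can be sketched in plain C. This is only an illustration, not the NEON code from the patch: the coefficients, the batch size, and all function names below are hypothetical, and the "loads" are modelled with memcpy into local buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-point weights for illustration; not FFmpeg's table. */
enum { RU = -38, GU = -74, BU = 112, SHIFT = 8 };
/* Hypothetical batch size: output samples produced per batch. */
enum { BATCH = 8 };

/* One subsampled chroma value from two horizontally adjacent RGB24 pixels. */
static uint8_t u_from_pair(const uint8_t *p)
{
    int r = p[0] + p[3];
    int g = p[1] + p[4];
    int b = p[2] + p[5];
    int acc = RU * r + GU * g + BU * b;
    /* Bias to 128 and round; the extra +1 in the shift halves the pair sum. */
    acc += (128 << (SHIFT + 1)) + (1 << SHIFT);
    return (uint8_t)(acc >> (SHIFT + 1));
}

/* Straightforward reference: one output sample per iteration. */
static void u_half_ref(uint8_t *dst, const uint8_t *src, size_t width)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < width; i++)
        dst[i] = u_from_pair(src + 6 * i);
}

/* Same result, but each iteration loads two batches before computing. */
static void u_half_two_batches(uint8_t *dst, const uint8_t *src, size_t width)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
    for (; i + 2 * BATCH <= width; i += 2 * BATCH) {
        uint8_t a[6 * BATCH], b[6 * BATCH];
        /* Issue both loads up front ... */
        memcpy(a, src + 6 * i, sizeof a);
        memcpy(b, src + 6 * (i + BATCH), sizeof b);
        /* ... then do the arithmetic for both batches side by side. */
        for (size_t k = 0; k < BATCH; k++) {
            dst[i + k]         = u_from_pair(a + 6 * k);
            dst[i + BATCH + k] = u_from_pair(b + 6 * k);
        }
    }
    /* Tail: fewer than two full batches remain. */
    for (; i < width; i++)
        dst[i] = u_from_pair(src + 6 * i);
}
```

Both functions produce identical output; the second only changes the order of loads and arithmetic, which is the part that may help on cores that can overlap memory access with computation.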
Bumping on the review for this one
On 19/05/2025 21:50, Dmitriy Kovalenko wrote:
I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
subsampled conversion. In this patch stack I'll try to
improve the performance.
This particular set of changes is a small imp
Great. I'm sending another version with the asr register change
reverted. What is the correct process for replying to the inline comments then?
An inline email answer or the cover letter?
> On May 30, 2025, at 11:10, Martin Storsjö wrote:
>
> On Fri, 30 May 2025, Dmitriy Kovale
Correct. I meant dual issue
https://developer.arm.com/documentation/ddi0460/d/Cycle-Timings-and-Interlock-Behavior/Dual-issue
Best regards,
Dmitriy Kovalenko
On May 31, 2025, at 12:32, Kieran Kunhya wrote:
On Sat, 31 May 2025, 10:17 Dmitriy Kovalenko,
mailto:dmtr.kovale...@outlook.com
the quoted message.
You can still use “>” to make a partial quote (hope it works lol)
Best regards,
Dmitriy Kovalenko
> On May 31, 2025, at 12:43, Christopher Snowhill wrote:
>
> by
> not allowing one to insert text into the middle of
> On May 31, 2025, at 14:13, Martin Storsjö wrote:
>
> On Sat, 31 May 2025, Dmitriy Kovalenko wrote:
>
>> Correct. I meant dual issue
>> https://developer.arm.com/documentation/ddi0460/d/Cycle-Timings-and-Interlock-Behavior/Dual-issue
>
> D
which does detect such issues.
I managed to rewrite the function to avoid using any callee-saved
registers. The only register I keep using is v7, which is not callee-saved.
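
For context on the rule being discussed: under the AArch64 procedure call standard, a callee must preserve the low 64 bits of v8-v15, while v0-v7 and v16-v31 may be clobbered freely. A minimal C sketch of that classification (the helper name is hypothetical and exists only for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if SIMD register v<n> must be preserved by the callee
 * (only its low 64 bits, i.e. d8-d15) under the AArch64 calling convention. */
static bool simd_reg_is_callee_saved(int n)
{
    assert(n >= 0 && n <= 31);
    return n >= 8 && n <= 15;
}
```

With this rule, v7 is safe to use as scratch, whereas any use of v8-v15 needs a save and restore around it.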
Dmitriy Kovalenko (2):
swscale: rgb_to_yuv neon optimizations
swscale: Neon rgb_to_yuv_half process 32 pixe
I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
subsampled conversion. In this patch stack I'll try to
improve the performance.
This particular set of changes is a small improvement to all the
existing functions and macros. The biggest performance gain is
coming from post loadi
This patch integrates so-called double buffering, where we load
2 batches of elements at a time and then process them in parallel. On
modern Arm processors, especially Apple Silicon, it gives a visible
benefit; for subsampled pixel processing it is especially nice because
it allows to read
I appreciate the review for both the commits. I did fix all the unrelated
changes and iterated in the new version, would appreciate a re-review.
> On May 29, 2025, at 20:53, Martin Storsjö wrote:
>
> On Tue, 27 May 2025, Dmitriy Kovalenko wrote:
>
>> This particular s
I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
subsampled conversion. In this patch stack I'll try to
improve the performance.
This particular set of changes is a small improvement to all the
existing functions and macros. The biggest performance gain is
coming from post loadi
This patch integrates so-called double buffering, where we load
2 batches of elements at a time and then process them in parallel. On
modern Arm processors, especially Apple Silicon, it gives a visible
benefit; for subsampled pixel processing it is especially nice because
it allows to read
, Martin Storsjö wrote:
>
> On Thu, 29 May 2025, Dmitriy Kovalenko wrote:
>
>> I appreciate the review for both the commits. I did fix all the unrelated
>> changes and iterated in the new version, would appreciate a re-review.
>
> Don't top post.
>
> There a
=== Feedback response ===
> Also, with that fixed, this fails to properly back up and restore registers
> v8-v15; checkasm doesn't notice this on macOS, but on Linux and windows,
> checkasm has a call wrapper which does detect such issues.
I managed to rewrite the function to avoid using any ca
I'm sorry for the previous patch; it seems something went wrong and the
patch got corrupted at the Outlook step. I'll keep using send-email.
=== __every single__ inline comment response ===
> This is an unrelated change
Fixed and resolved
> The patch adds trailing whitespace here
macOS nor Linux Arm builds, so why not keep them?
Dmitriy Kovalenko (2):
swscale: rgb_to_yuv neon optimizations
swscale: Neon rgb_to_yuv_half process 32 pixels at a time
libswscale/aarch64/input.S | 212 +++--
1 file changed, 155 insertions(+), 57 deletions
I've found quite a few ways to optimize existing ffmpeg's rgb to yuv
subsampled conversion. In this patch stack I'll try to
improve the performance.
This particular set of changes is a small improvement to all the
existing functions and macros. The biggest performance gain is
coming from post loadi
This patch integrates so-called double buffering, where we load
2 batches of elements at a time and then process them in parallel. On
modern Arm processors, especially Apple Silicon, it gives a visible
benefit; for subsampled pixel processing it is especially nice because
it allows to read
Some versions of Apple Clang produce a ton of warnings about missing
nullability specifiers in the existing ffmpeg codebase, which
significantly slows down compilation because of the size of the
produced output (especially on CI as part of external build systems
because they us