From: Nelson Gomez
256 bits is just wide enough to fit all the operands needed to vectorize
the software implementation, but AVX2 is needed for some instructions
like 16-to-32 bit vector sign extension.
Output is bit-for-bit identical to C.
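
For readers following along, the per-pixel work being vectorized can be
sketched in scalar C roughly as below. This is a simplified model, not the
libswscale source: the names nv12_chroma_sketch and clip_u8 are hypothetical,
and the dither offsets and shift amounts are assumptions chosen for
illustration. The point is the (int32_t) cast: each 16-bit sample is widened
to 32 bits before the multiply-accumulate, which is the step a vector
version has to do with a 16-to-32 bit sign extension.

```c
#include <stdint.h>

/* Clamp a 32-bit intermediate to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int32_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Hypothetical scalar model of a vertical chroma filter with interleaved
 * (NV12-style) output. Constants (12-bit dither shift, 19-bit final shift)
 * are assumptions for this sketch.
 */
static void nv12_chroma_sketch(const uint8_t dither[8],
                               const int16_t *filter, int filter_size,
                               const int16_t **u_src, const int16_t **v_src,
                               uint8_t *dest, int width)
{
    for (int i = 0; i < width; i++) {
        /* Start each accumulator from a dither value. */
        int32_t u = dither[i & 7] << 12;
        int32_t v = dither[(i + 3) & 7] << 12;

        for (int j = 0; j < filter_size; j++) {
            /* Widen 16-bit samples to 32 bits before multiplying. */
            u += (int32_t)u_src[j][i] * filter[j];
            v += (int32_t)v_src[j][i] * filter[j];
        }

        /* Interleave: U in even bytes, V in odd bytes. */
        dest[2 * i]     = clip_u8(u >> 19);
        dest[2 * i + 1] = clip_u8(v >> 19);
    }
}
```

With a single filter tap of 4096 and a zero dither table, an input sample of
16384 comes out as 128, so the sketch behaves like a plain 7-bit downshift
in that case.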
Signed-off-by: Nelson Gomez
---
libswscale/x86
From: Nelson Gomez
Extracting information from SwsContext in assembly is difficult, and
rearranging SwsContext just for asm access didn't look good. These
functions only need a couple of fields from it anyway, so just make
them parameters in their own right.
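
The shape of that change can be sketched as follows. Everything here is
hypothetical: DemoContext is a stand-in and not the real SwsContext, and
interleave_demo, run_from_context, and the swap_chroma flag are names made
up for this illustration. The idea is only that the kernel receives the two
values it needs as plain arguments, so an assembly version never has to know
field offsets inside a large struct.

```c
#include <stdint.h>

/* Hypothetical stand-in for a large scaler context; NOT the real SwsContext. */
typedef struct DemoContext {
    char other_state[256];     /* unrelated fields that make offsets awkward in asm */
    int swap_chroma;           /* assumed flag: nonzero means V is stored before U */
    const uint8_t *chr_dither; /* assumed 8-entry dither table */
} DemoContext;

/* Clamp an intermediate to the 0..255 range of an 8-bit sample. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Asm-friendly shape: the needed fields arrive as ordinary parameters. */
static void interleave_demo(int swap_chroma, const uint8_t *chr_dither,
                            const int16_t *u, const int16_t *v,
                            uint8_t *dest, int width)
{
    for (int i = 0; i < width; i++) {
        uint8_t u8 = clip_u8((u[i] + chr_dither[i & 7]) >> 7);
        uint8_t v8 = clip_u8((v[i] + chr_dither[(i + 3) & 7]) >> 7);
        dest[2 * i]     = swap_chroma ? v8 : u8;
        dest[2 * i + 1] = swap_chroma ? u8 : v8;
    }
}

/* The C caller still owns the context and unpacks it once per call. */
static void run_from_context(const DemoContext *c,
                             const int16_t *u, const int16_t *v,
                             uint8_t *dest, int width)
{
    interleave_demo(c->swap_chroma, c->chr_dither, u, v, dest, width);
}
```

Only the caller touches the struct, so rearranging or growing the context
later does not affect the kernel's signature.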
Signed-off-by: Nelson
From: Nelson Gomez
Signed-off-by: Nelson Gomez
---
libswscale/output.c | 13 -
libswscale/swscale_internal.h | 3 ++-
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/libswscale/output.c b/libswscale/output.c
index 2e5d6076ab..bddfaf16af 100644
--- a
:32.29 bitrate=N/A
speed=15.8x
bench: utime=48.625s stime=0.459s rtime=21.058s
bench: maxrss=78500kB
---
Nelson Gomez (3):
swscale: make yuv2interleavedX more asm-friendly
swscale/x86/output: add AVX2 version of yuv2nv12cX
swscale: cosmetic fixes
libswscale/output.c
From: Nelson Gomez
256 bits is just wide enough to fit all the operands needed to vectorize
the software implementation, but AVX2 is needed for a couple of
instructions like cross-lane permutation.
Output is bit-for-bit identical to C.
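
As background on why a cross-lane permutation comes up at all: 256-bit AVX2
pack instructions work on each 128-bit lane separately, so packed results
come out lane-interleaved and a qword permute is needed to put them back in
order. The scalar C model below emulates that behaviour. It is a sketch for
illustration only; the helper names are hypothetical, and which instructions
the patch itself uses is not shown in this message.

```c
#include <stdint.h>
#include <string.h>

/* Saturate a signed 32-bit value to an unsigned 16-bit value. */
static uint16_t sat_u16(int32_t v)
{
    return v < 0 ? 0 : v > 65535 ? 65535 : (uint16_t)v;
}

/*
 * Model of a 256-bit in-lane pack of two 8 x int32 registers into
 * 16 x uint16. Each 128-bit lane is packed on its own, so the result is
 * a0..a3 b0..b3 | a4..a7 b4..b7 rather than a0..a7 b0..b7.
 */
static void pack_in_lane(const int32_t a[8], const int32_t b[8],
                         uint16_t out[16])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int k = 0; k < 4; k++) {
            out[lane * 8 + k]     = sat_u16(a[lane * 4 + k]);
            out[lane * 8 + 4 + k] = sat_u16(b[lane * 4 + k]);
        }
    }
}

/*
 * Model of a cross-lane 64-bit permute: output qword q is taken from
 * input qword order[q]. in and out must not overlap.
 */
static void permute_qwords(const uint16_t in[16], const int order[4],
                           uint16_t out[16])
{
    for (int q = 0; q < 4; q++)
        memcpy(&out[q * 4], &in[order[q] * 4], 4 * sizeof(uint16_t));
}
```

Calling permute_qwords with the order {0, 2, 1, 3} on the output of
pack_in_lane restores the natural a0..a7, b0..b7 ordering.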
Signed-off-by: Nelson Gomez
---
libswscale/x86
---
Nelson Gomez (3):
swscale: make yuv2interleavedX more asm-friendly
swscale/x86/output: add AVX2 version of yuv2nv12cX
swscale: cosmetic fixes
libswscale/output.c | 25 +++---
libswscale/swscale_internal.h |8 +-
libswscale/vscale.c |2
libswscale/x86
From: Nelson Gomez
v3:
- Fixed x86_32 compilation
v2: [2]
- Addressed the comments James left on v1
- Cleaned up how dither gets read to avoid using stack space
v1: [1]
[1] http://ffmpeg.org/pipermail/ffmpeg-devel/2020-April/261313.html
[2] http://ffmpeg.org/pipermail/ffmpeg-devel
Bumping this patchset (and apologies if Outlook mangles the threading)
-----Original Message-----
From: ffmpeg-devel On Behalf Of Nelson Gomez
Sent: Saturday, April 25, 2020 7:37 PM
To: ffmpeg-devel@ffmpeg.org
Subject: [FFmpeg-devel] [PATCH v3 0/3] swscale: add AVX2 version of yuv2nv12cX
From