[FFmpeg-devel] [PATCH 2/3] swscale/x86/output: add AVX2 version of yuv2nv12cX

2020-04-23 Thread Nelson Gomez
From: Nelson Gomez 256 bits is just wide enough to fit all the operands needed to vectorize the software implementation, but AVX2 is needed for some instructions like 16-to-32 bit vector sign extension. Output is bit-for-bit identical to C. Signed-off-by: Nelson Gomez --- libswscale/x86

[FFmpeg-devel] [PATCH 1/3] swscale: make yuv2interleavedX more asm-friendly

2020-04-23 Thread Nelson Gomez
From: Nelson Gomez Extracting information from SwsContext in assembly is difficult, and rearranging SwsContext just for asm access didn't look good. These functions only need a couple of fields from it anyway, so just make them parameters in their own right. Signed-off-by: Nelson
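
To illustrate the refactor described in this message, here is a minimal C sketch, not the patch itself. It assumes a simplified interleaved-chroma vertical filter loosely modeled on the scalar path. `ExampleContext`, `interleave_chroma`, `interleave_chroma_from_ctx`, and the field names are all hypothetical. The point is that the worker takes the two values it needs as plain arguments, so an assembly version would not have to know the layout of a large context struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a large scaler context; NOT the real SwsContext. */
typedef struct ExampleContext {
    int chroma_swapped;          /* assumption: nonzero if V precedes U in the output */
    const uint8_t *chr_dither;   /* assumption: 8-entry dither table */
    /* ...many unrelated fields would live here in practice... */
} ExampleContext;

/* Clamp an int to the 0..255 range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/*
 * Worker with explicit parameters: the swap flag and the dither table are
 * passed directly instead of being read from a context pointer.
 * Inputs are 15-bit intermediates; filter taps are assumed to sum to 1 << 12.
 */
void interleave_chroma(int swapped, const uint8_t *dither,
                       const int16_t *filter, int filter_size,
                       const int16_t **u_src, const int16_t **v_src,
                       uint8_t *dest, int width)
{
    assert(filter_size > 0);
    for (int i = 0; i < width; i++) {
        /* Seed the 32-bit accumulators with the dither value. */
        int u = dither[i & 7] << 12;
        int v = dither[(i + 3) & 7] << 12;
        /* 16-bit samples times 16-bit taps accumulate in 32 bits. */
        for (int j = 0; j < filter_size; j++) {
            u += u_src[j][i] * filter[j];
            v += v_src[j][i] * filter[j];
        }
        /* Write the U/V pair, swapping the byte order if requested. */
        dest[2 * i + (swapped != 0)] = clip_uint8(u >> 19);
        dest[2 * i + (swapped == 0)] = clip_uint8(v >> 19);
    }
}

/* Thin caller: only this C wrapper touches the context struct. */
void interleave_chroma_from_ctx(const ExampleContext *c,
                                const int16_t *filter, int filter_size,
                                const int16_t **u_src, const int16_t **v_src,
                                uint8_t *dest, int width)
{
    interleave_chroma(c->chroma_swapped, c->chr_dither, filter, filter_size,
                      u_src, v_src, dest, width);
}
```

With this split, a SIMD implementation would only need to match the signature of `interleave_chroma`; the wrapper that reads the context stays in C.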

[FFmpeg-devel] [PATCH 3/3] swscale: cosmetic fixes

2020-04-23 Thread Nelson Gomez
From: Nelson Gomez Signed-off-by: Nelson Gomez --- libswscale/output.c | 13 - libswscale/swscale_internal.h | 3 ++- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/libswscale/output.c b/libswscale/output.c index 2e5d6076ab..bddfaf16af 100644 --- a

[FFmpeg-devel] [PATCH 0/3] swscale: add AVX2 version of yuv2nv12cX

2020-04-23 Thread Nelson Gomez
:32.29 bitrate=N/A speed=15.8x bench: utime=48.625s stime=0.459s rtime=21.058s bench: maxrss=78500kB --- Nelson Gomez (3): swscale: make yuv2interleavedX more asm-friendly swscale/x86/output: add AVX2 version of yuv2nv12cX swscale: cosmetic fixes libswscale/output.c

[FFmpeg-devel] [PATCH v2 2/3] swscale/x86/output: add AVX2 version of yuv2nv12cX

2020-04-24 Thread Nelson Gomez
From: Nelson Gomez 256 bits is just wide enough to fit all the operands needed to vectorize the software implementation, but AVX2 is needed for a couple of instructions like cross-lane permutation. Output is bit-for-bit identical to C. Signed-off-by: Nelson Gomez --- libswscale/x86

[FFmpeg-devel] [PATCH v2 0/3] swscale: add AVX2 version of yuv2nv12cX

2020-04-24 Thread Nelson Gomez
--- Nelson Gomez (3): swscale: make yuv2interleavedX more asm-friendly swscale/x86/output: add AVX2 version of yuv2nv12cX swscale: cosmetic fixes libswscale/output.c | 25 +++--- libswscale/swscale_internal.h |8 +- libswscale/vscale.c |2 libswscale/x86

[FFmpeg-devel] [PATCH v2 1/3] swscale: make yuv2interleavedX more asm-friendly

2020-04-24 Thread Nelson Gomez
From: Nelson Gomez Extracting information from SwsContext in assembly is difficult, and rearranging SwsContext just for asm access didn't look good. These functions only need a couple of fields from it anyway, so just make them parameters in their own right. Signed-off-by: Nelson

[FFmpeg-devel] [PATCH v2 3/3] swscale: cosmetic fixes

2020-04-24 Thread Nelson Gomez
From: Nelson Gomez Signed-off-by: Nelson Gomez --- libswscale/output.c | 13 - libswscale/swscale_internal.h | 3 ++- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/libswscale/output.c b/libswscale/output.c index 2e5d6076ab..bddfaf16af 100644 --- a

[FFmpeg-devel] [PATCH v3 3/3] swscale: cosmetic fixes

2020-04-25 Thread Nelson Gomez
From: Nelson Gomez Signed-off-by: Nelson Gomez --- libswscale/output.c | 13 - libswscale/swscale_internal.h | 3 ++- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/libswscale/output.c b/libswscale/output.c index 2e5d6076ab..bddfaf16af 100644 --- a

[FFmpeg-devel] [PATCH v3 0/3] swscale: add AVX2 version of yuv2nv12cX

2020-04-25 Thread Nelson Gomez
From: Nelson Gomez v3: - Fixed x86_32 compilation v2: [2] - Addressing comments James left on iter. 1 - Cleaned up how dither gets read to avoid using stack space v1: [1] [1] http://ffmpeg.org/pipermail/ffmpeg-devel/2020-April/261313.html [2] http://ffmpeg.org/pipermail/ffmpeg-devel

[FFmpeg-devel] [PATCH v3 2/3] swscale/x86/output: add AVX2 version of yuv2nv12cX

2020-04-25 Thread Nelson Gomez
From: Nelson Gomez 256 bits is just wide enough to fit all the operands needed to vectorize the software implementation, but AVX2 is needed for a couple of instructions like cross-lane permutation. Output is bit-for-bit identical to C. Signed-off-by: Nelson Gomez --- libswscale/x86

[FFmpeg-devel] [PATCH v3 1/3] swscale: make yuv2interleavedX more asm-friendly

2020-04-25 Thread Nelson Gomez
From: Nelson Gomez Extracting information from SwsContext in assembly is difficult, and rearranging SwsContext just for asm access didn't look good. These functions only need a couple of fields from it anyway, so just make them parameters in their own right. Signed-off-by: Nelson

Re: [FFmpeg-devel] [PATCH v3 0/3] swscale: add AVX2 version of yuv2nv12cX

2020-06-03 Thread Nelson Gomez
Bumping this patchset (and apologies if Outlook mangles the threading) -Original Message- From: ffmpeg-devel On Behalf Of Nelson Gomez Sent: Saturday, April 25, 2020 7:37 PM To: ffmpeg-devel@ffmpeg.org Subject: [FFmpeg-devel] [PATCH v3 0/3] swscale: add AVX2 version of yuv2nv12cX From