On 3/11/23 17:14, Thomas Mundt wrote:
+%if mmsize == 32
+vpbroadcastd m12, DWORD clip_maxm
I get a green pattern at bit depths > 8.
Looks good with:
vpbroadcastw m12, WORD clip_maxm
+%else
movd m12, DWORD clip_maxm
SPLATW m12, m12, 0
+%endif
Of
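One way to see why the dword broadcast misbehaves on 16-bit data is a scalar model of the two instructions. The sketch below is only an illustration: `broadcast_dword_lane` and `broadcast_word_lane` are hypothetical helpers, not FFmpeg code, and it assumes the value loaded from `clip_maxm` has zero upper bits (for example 1023 for 10-bit video).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vpbroadcastd: the 32-bit value is repeated,
 * so an even 16-bit lane sees the low half and an odd lane the high half. */
static uint16_t broadcast_dword_lane(uint32_t value, int word_lane)
{
    assert(word_lane >= 0);
    return (word_lane & 1) ? (uint16_t)(value >> 16)
                           : (uint16_t)(value & 0xFFFF);
}

/* Hypothetical scalar model of vpbroadcastw: every 16-bit lane receives
 * the low word of the value. */
static uint16_t broadcast_word_lane(uint32_t value, int word_lane)
{
    assert(word_lane >= 0);
    return (uint16_t)(value & 0xFFFF);
}
```

Under that assumption every odd 16-bit lane of the dword broadcast holds 0 rather than the intended maximum, which may explain the corrupted output at bit depths above 8.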
On 3/11/23 17:18, Thomas Mundt wrote:
I'm not familiar with checkasm tests, but isn't this one limited to a bit
depth of 8?
Yes, that was the idea because I was only intending to modify the 8-bit
function, for now. The function pointer is the same for all depths so
you need to initializ
On 2/20/23 14:06, James Darnley wrote:
On 2/20/23 13:49, Nicolas George wrote:
James Darnley (12023-02-20):
snip
Moving scale before yadif is right, but format= is redundant with
-pix_fmt.
Regards,
So the patch should just be moving the scale filter first? Sure. Any
other comments
On 2/24/23 04:00, Felix LeClair wrote:
Fixes: Compilation of Sobel with AVX512ICL
Caused: Comment left without delimiter in AVX512ICL version of SOBEL
Testing: Confirmed working on AVX512 Alderlake (AKA SPR without AMX)
diff --git a/libavfilter/x86/vf_convolution.asm
b/libavfilter/x86/vf_co
2.24x faster (1925±1.3 vs. 859±2.2 decicycles) compared with ssse3
---
libavfilter/x86/vf_bwdif.asm| 29 -
libavfilter/x86/vf_bwdif_init.c | 12
2 files changed, 36 insertions(+), 5 deletions(-)
diff --git a/libavfilter/x86/vf_bwdif.asm b/libavfilter/x
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
tests/checkasm/checkasm.h | 1 +
tests/checkasm/vf_bwdif.c | 70 +++
tests/fate/checkasm.mak | 1 +
5 files changed, 76 insertions(+)
create mode 100644 tests/checkasm/vf_bwdif.c
diff
---
libavfilter/bwdif.h | 3 ++-
libavfilter/vf_bwdif.c | 13 +
libavfilter/x86/vf_bwdif_init.c | 4 +---
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/libavfilter/bwdif.h b/libavfilter/bwdif.h
index 889ff772ed..5749345f78 100644
--- a/libavfilt
On 2/20/23 13:49, Nicolas George wrote:
James Darnley (12023-02-20):
-fate-filter-yadif10: CMD = framecrc -flags bitexact -idct simple -i
$(TARGET_SAMPLES)/mpeg2/mpeg2_field_encoding.ts -flags bitexact -pix_fmt
yuv420p10le -frames:v 30 -vf yadif=0,scale
-fate-filter-yadif16: CMD = framecrc
On 2/10/23 14:06, James Darnley wrote:
snip
This patch set is broken. The checkasm test is incomplete. This avx2
function has some bug that only manifests when the strides (prefs mrefs)
are opposite signs (one positive and one negative). That situation is
what happens with real usage. I
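The stride situation described here can be sketched with a much simplified line filter. Everything below is hypothetical (`average_line` and `filter_middle_row` are illustration names, not the bwdif or yadif code); it only shows a caller passing `prefs` as the positive stride and `mrefs` as its negation, which is the case a test needs to cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified line filter: averages the pixel one row after
 * the current one (cur + prefs) and one row before it (cur + mrefs). */
static void average_line(uint8_t *dst, const uint8_t *cur, int w,
                         ptrdiff_t prefs, ptrdiff_t mrefs)
{
    assert(w >= 0);
    for (int x = 0; x < w; x++)
        dst[x] = (uint8_t)((cur[x + prefs] + cur[x + mrefs] + 1) >> 1);
}

/* Filters the middle row of a 3-row plane: prefs is +stride and mrefs is
 * -stride, so the two offsets have opposite signs. */
static void filter_middle_row(uint8_t *dst, const uint8_t *plane,
                              int w, ptrdiff_t stride)
{
    const uint8_t *cur = plane + stride;
    average_line(dst, cur, w, stride, -stride);
}
```

A test that passes the same sign for both offsets would never exercise the negative-offset path, so a bug in how the assembly handles a negative stride could go unnoticed.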
---
tests/fate/filter-video.mak | 4 +--
tests/ref/fate/filter-yadif10 | 60 +--
tests/ref/fate/filter-yadif16 | 60 +--
3 files changed, 62 insertions(+), 62 deletions(-)
diff --git a/tests/fate/filter-video.mak b/tests/fate/filt
---
libavfilter/vf_yadif.c | 13 +
libavfilter/x86/vf_yadif_init.c | 4 +---
libavfilter/yadif.h | 3 ++-
3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/libavfilter/vf_yadif.c b/libavfilter/vf_yadif.c
index afa4d1d53d..1f9434f961 100644
--- a/lib
Zen 2 (Ryzen 7 3700X):
1.73x faster (3603±586.3 vs. 2082±317.1 decicycles) compared with ssse3
Using an SD y4m file speed increases from ~ 3600 fps to ~4700.
---
libavfilter/x86/vf_yadif.asm| 83 +++--
libavfilter/x86/vf_yadif_init.c | 4 ++
2 files changed, 62 in
---
tests/checkasm/Makefile | 1 +
tests/checkasm/checkasm.c | 3 ++
tests/checkasm/checkasm.h | 1 +
tests/checkasm/vf_yadif.c | 62 +++
4 files changed, 67 insertions(+)
create mode 100644 tests/checkasm/vf_yadif.c
diff --git a/tests/checkasm/Makefile b
Ice Lake (Xeon Silver 4316): 2.01x faster (1147±36.8 vs. 571±38.2 decicycles)
compared with avx2
---
I think I can merge this with the existing macro without it being too ugly.
That might allow a plain avx512 version too but I can't say if that would be any
faster.
libavcodec/x86/v210-init.c |
---
libavcodec/x86/v210.asm | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/libavcodec/x86/v210.asm b/libavcodec/x86/v210.asm
index 3b9e0761df..600a4ddc5f 100644
--- a/libavcodec/x86/v210.asm
+++ b/libavcodec/x86/v210.asm
@@ -65,18 +65,18 @@ cglobal v210_planar_unp
On 12/7/22 17:08, James Darnley wrote:
---
configure | 5 +
1 file changed, 5 insertions(+)
diff --git a/configure b/configure
index f4eedfc207..eaa5ef6b20 100755
--- a/configure
+++ b/configure
@@ -4315,6 +4315,11 @@ case "$toolchain" in
add_cflags -fsaniti
---
configure | 5 +
1 file changed, 5 insertions(+)
diff --git a/configure b/configure
index f4eedfc207..eaa5ef6b20 100755
--- a/configure
+++ b/configure
@@ -4315,6 +4315,11 @@ case "$toolchain" in
add_cflags -fsanitize=address
add_ldflags -fsanitize=address
;;
+
---
libavcodec/x86/v210enc.asm | 1 -
1 file changed, 1 deletion(-)
diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index d3639cd440..daf5f2ab81 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -331,7 +331,6 @@ cglobal v210_planar_pack_8, 5, 5, 7+no
---
libavcodec/x86/v210enc.asm | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index 552164a8be..d3639cd440 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -314,7 +314,7 @@ cglobal v210_
avx512 on Skylake-X (Xeon D-2123IT):
1.19x faster (970±91.2 vs. 817±104.4 decicycles) compared with avx2
avx512icl on Ice Lake (Xeon Silver 4316):
2.52x faster (1350±5.3 vs. 535±9.5 decicycles) compared with avx2
---
libavcodec/x86/v210enc.asm| 99 +++
libavcod
---
libavcodec/x86/v210enc.asm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/x86/v210enc.asm b/libavcodec/x86/v210enc.asm
index afac238ede..c2ad3d72c0 100644
--- a/libavcodec/x86/v210enc.asm
+++ b/libavcodec/x86/v210enc.asm
@@ -62,7 +62,7 @@ SECTION .text
; v210
---
tests/checkasm/v210enc.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tests/checkasm/v210enc.c b/tests/checkasm/v210enc.c
index 9942e08137..9fb8321c25 100644
--- a/tests/checkasm/v210enc.c
+++ b/tests/checkasm/v210enc.c
@@ -72,8 +72,10 @@
randomize_buf
ARCH_X86_64 is always defined. So checks of this type need to check with #if.
Thanks. I forgot the ffmpeg convention there.
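The point about `#if` can be shown with a small sketch. `EXAMPLE_ARCH_FLAG` is a made-up macro standing in for a configuration flag such as ARCH_X86_64; the assumption is that the build always defines it, as 1 when the feature is present and 0 when it is not.

```c
/* Assumption for illustration: the macro is always defined, here as 0,
 * meaning "feature not available". */
#define EXAMPLE_ARCH_FLAG 0

/* Returns 1 if an #ifdef check treats the feature as enabled. */
static int enabled_by_ifdef(void)
{
#ifdef EXAMPLE_ARCH_FLAG
    return 1;   /* taken even though the value is 0 */
#else
    return 0;
#endif
}

/* Returns 1 if an #if check treats the feature as enabled. */
static int enabled_by_if(void)
{
#if EXAMPLE_ARCH_FLAG
    return 1;
#else
    return 0;   /* taken, because the value is 0 */
#endif
}
```

With a macro that is always defined, `#ifdef` is true on every build, so only the `#if` form distinguishes the two configurations.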
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit l
+%else
+pand m1, m6, m1
+pandn m0, m6, m0
+por m0, m0, m1
+%endif
Isn't that pattern a vpblendb or some such ?
I think Kieran already responded to this on IRC but I will too.
Unfortunately not. This blend is at the bit lev
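A scalar model may make the bit-level nature of this blend clearer. This is a sketch only: `bit_select` is a hypothetical helper, and the register names in the comments simply mirror the three quoted instructions.

```c
#include <stdint.h>

/* Scalar model of the pand/pandn/por sequence: each result bit comes from
 * `a` where the mask bit is 1 and from `b` where the mask bit is 0. */
static uint32_t bit_select(uint32_t mask, uint32_t a, uint32_t b)
{
    uint32_t from_a = mask & a;    /* pand  m1, m6, m1 */
    uint32_t from_b = ~mask & b;   /* pandn m0, m6, m0 */
    return from_a | from_b;        /* por   m0, m0, m1 */
}
```

The byte and word blend instructions choose whole elements from one source or the other, so they cannot express a mask whose boundaries fall inside a byte, which is what this sequence handles.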
Negligible speed difference for avx2 on Zen 2 (Ryzen 5700X) and
Broadwell (Xeon E5-2620 v4):
1690±4.3 decicycles vs. 1693±78.4
1439±31.1 decicycles vs 1429±16.7
Moderate speedup with avx512 on Skylake-X (Xeon D-2123IT):
1.22x faster (793±0.8 vs. 649±5.5 decicycles) compared with avx2
Bett
---
tests/checkasm/checkasm.c | 1 +
tests/checkasm/checkasm.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index 421bd096c5..c3d77cb6af 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -918,5 +918,6 @@ int che
---
libavutil/tests/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/libavutil/tests/cpu.c b/libavutil/tests/cpu.c
index 5bec742b2b..dadadb31dc 100644
--- a/libavutil/tests/cpu.c
+++ b/libavutil/tests/cpu.c
@@ -77,6 +77,7 @@ static const struct {
{ AV_CPU_FLAG_BMI2, "bmi2"
---
.mailmap | 1 -
1 file changed, 1 deletion(-)
diff --git a/.mailmap b/.mailmap
index ba072f38c8..af60290f77 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1,4 +1,3 @@
-
--
2.38.0
I guess it could also be scaled to ymm if you're a big Skylake fan :P
(in which case you'd probably want to reorder the shuffle indices so
While cherry-picking some stuff for avx512 I have noticed that ffmpeg
has a discrepancy in the comments for the two avx512 flags.
Let's start with the public header
libavutil/cpu.h
56│ #define AV_CPU_FLAG_AVX512 0x10 ///< AVX-512 functions: requires
OS support even if YMM/ZMM register
---
doc/filters.texi | 5 +
libavfilter/vf_subtitles.c | 23 ---
2 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/doc/filters.texi b/doc/filters.texi
index a161754233..cfbc807f16 100644
--- a/doc/filters.texi
+++ b/doc/filters.texi
@@ -21160,6 +211
On 2020-06-04 01:19, Michael Niedermayer wrote:
> Fixes: array end overread
> Fixes:
> 22395/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BITPACKED_fuzzer-5760940300828672
>
> Found-by: continuous fuzzing process
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-
On 2020-04-10 16:53, Anton Khirnov wrote:
> ffmpeg | branch: master | Anton Khirnov | Mon Jan 9
> 18:04:42 2017 +0100| [1f4cf92cfbd3accbae582ac63126ed5570ddfd37] | committer:
> Anton Khirnov
>
> pthread_frame: merge the functionality for normal decoder init and
> init_thread_copy
>
> The cur
On 2020-02-23 18:58, Michael Niedermayer wrote:
> On Sun, Feb 23, 2020 at 05:03:36PM +0100, Carl Eugen Hoyos wrote:
>> On Sun, 23 Feb 2020 at 13:30, Michael Niedermayer wrote:
>>>
>>> From: Parker Ernest <@>
>>>
>>> commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on
>>> x86_
On 2020-02-23 15:12, Jean-Baptiste Kempf wrote:
> Yo,
>
> On Sat, Feb 22, 2020, at 22:18, Josh de Kock wrote:
>> This allows for easy shortlog/log parsing, useful in determining
>> eligible members of the general assembly for the new FFmpeg voting
>> system.
>
> I think this is a good idea.
> But
On 2020-02-23 13:22, Michael Niedermayer wrote:
> From: Parker Ernest <@>
>
> commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on
> x86_64 CPUs which do not have SSSE3, e.g. AMD Phenom-II
>
> Signed-off-by: Michael Niedermayer
> ---
> libswscale/x86/yuv2rgb.c | 2 ++
> 1 file change
On 2020-02-22 13:25, Paul B Mahol wrote:
> On 2/22/20, James Darnley wrote:
>> On 2020-02-22 11:11, Thilo Borgmann wrote:
>>> Please someone put an IRC log from the meeting there, too. James Darnley?
>>> Also the audio was streamed, somebody might remember where to ex
On 2020-02-22 11:11, Thilo Borgmann wrote:
> Please someone put an IRC log from the meeting there, too. James Darnley?
> Also the audio was streamed, somebody might remember where to exactly.
> Michael?
I can post my log from the day, probably email attachment. Should I
remove any of
On 30/12/2019, Lauri Kasanen wrote:
> Hi,
>
> For the Libre RISC-V project, I'm going to research the popular codecs
> and design new instructions to help speed them up. With ffmpeg being
> home to lots of asm folks for many platforms, I also want to ask your
> opinion.
>
> What new instructions w
On 28/01/2020, Liu Steven wrote:
>
>
>> On 27 Jan 2020, at 3:29 PM, Jean-Baptiste Kempf wrote:
>> It will be joinable through some VideoConf tool.
> Can we join by IRC or other things on internet?
> Because these days are Spring Festival (Chinese New Year, Important
> festivals that have lasted for thousand
On 2019-12-04 15:43, Linjie Fu wrote:
> Previously, media driver provided planar format(like 420 8 bit),
> but for HEVC Range Extension (422/444 8/10 bit), the decoded image
> is produced in packed format because Windows expects it.
>
> Add some packed pixel formats for hardware decode support in
On 2019-11-25 13:52, Chandra Nakka wrote:
> Dear FFmpeg developers,
>
> I'm very happy to have found your details on FFmpeg website for requesting
> FFmpeg feature implementation.
>
> Currently I'm using FFmpeg command line tool on my linux servers to process
> media files into instant mp3 audio
On 2019-10-11 21:45, Paul B Mahol wrote:
> diff --git a/doc/utils.texi b/doc/utils.texi
> index d55dd315c3..4e2e713505 100644
> --- a/doc/utils.texi
> +++ b/doc/utils.texi
> @@ -920,6 +920,9 @@ corresponding input value will be returned.
> @item round(expr)
> Round the value of expression @var{e
From: Kieran Kunhya
---
libavcodec/h264_slice.c | 29 +++--
1 file changed, 23 insertions(+), 6 deletions(-)
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index 5ceee107a0..fe2aa01ceb 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@
---
libavcodec/h264dec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c
index 8d1bd16a8e..b9f304936c 100644
--- a/libavcodec/h264dec.c
+++ b/libavcodec/h264dec.c
@@ -1056,7 +1056,7 @@ AVCodec ff_h264_decoder = {
.init
-frames in chunked mode.
Needs more work.
James Darnley (1):
avcodec/h264: enable draw_horiz_band
Kieran Kunhya (1):
avcodec/h264: fix draw_horiz_band with slice threads
libavcodec/h264_slice.c | 29 +++--
libavcodec/h264dec.c| 2 +-
2 files changed, 24 insert
From: Henrik Gramner
Use register numbers instead of copying the full register names. This makes it
possible to change register widths in the middle of a function and keep the
mmreg permutations intact which can be useful for code that only needs larger
vectors for parts of the function in combin
From: Henrik Gramner
---
libavutil/x86/x86inc.asm | 30 +-
1 file changed, 17 insertions(+), 13 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index d1b4c982fc..8c8cc97e0c 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc
From: Henrik Gramner
Most VEX-encoded instructions require an additional byte to encode when src2
is a high register (e.g. x|ymm8..15). If the instruction is commutative we
can swap src1 and src2 when doing so reduces the instruction length, e.g.
vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8,
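The saving described above can be modelled roughly as follows. This is a simplified sketch, not the x86inc logic: it assumes register-only operands and an opcode in the 0F map, where the 2-byte VEX prefix is usable unless the src2 (r/m) register is 8..15, and both function names are hypothetical.

```c
#include <assert.h>

/* Simplified model: a src2 (r/m) register numbered 8..15 needs the VEX.B
 * bit, which exists only in the 3-byte prefix. src1 lives in VEX.vvvv and
 * fits either prefix form. */
static int vex_prefix_bytes(int src2_reg)
{
    assert(src2_reg >= 0 && src2_reg < 16);
    return src2_reg >= 8 ? 3 : 2;
}

/* Returns 1 when swapping src1 and src2 of a commutative instruction
 * shortens the encoding under the model above. */
static int swap_saves_byte(int src1_reg, int src2_reg)
{
    return vex_prefix_bytes(src1_reg) < vex_prefix_bytes(src2_reg);
}
```

For the quoted example, src1 is register 0 and src2 is register 8, so the swap moves the high register into the field that costs nothing extra.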
Here are a few easy-to-import patches from x264. These are all after x264
commit 4a158b00 "x86inc: Correctly set mmreg variables" which FFmpeg already
has (commit eb5f063e7c).
It does not include the following commits:
* 82721eae "x86inc: Add x86-32 PIC support macros"
* 101bd27d "x86inc: Support
From: Henrik Gramner
Warn when the following are used without the appropriate cpuflag:
* YMM and ZMM registers
* 'pextrw' with a memory operand
* GPR instruction set extensions
---
libavutil/x86/x86inc.asm | 120 +++
1 file changed, 83 insertions(+), 37 del
From: Henrik Gramner
---
libavutil/x86/x86inc.asm | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 04dbb6b785..af35fe1e4d 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -685,7 +685,7 @@ DECLARE_
From: Henrik Gramner
---
libavutil/x86/x86inc.asm | 4
1 file changed, 4 insertions(+)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 10b7711637..04dbb6b785 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -293,6 +293,10 @@ DECLARE_REG_TMP_SIZ
From: Henrik Gramner
There's an edge case that wasn't properly handled.
---
libavutil/x86/x86inc.asm | 5 +
1 file changed, 5 insertions(+)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 5044ee86f0..bc370a6186 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86
On 2019-08-02 15:55, Ramana Jajula wrote:
> Hi,
>
> I am trying to encode my ts file m3u8 using my customised ffmpeg of version
> 4.1. I used below command to do encoding.
>
> ffmpeg -re -threads 8 -i /videos/input.ts -vcodec libx264 -s 320x240 -b:v
> 512000 -maxrate 512000 -acodec libfdk_aac -b:
On 2019-06-28 03:03, Hendrik Leppkes wrote:
> On Fri, Jun 28, 2019 at 1:26 AM James Darnley wrote:
>>
>> On 2019-06-28 04:26, Linjie Fu wrote:
>>> Previously, media driver provided planar format(like 420 8 bit), but
>>> for HEVC Range Extension (422/44
On 2019-06-28 04:26, Linjie Fu wrote:
> Previously, media driver provided planar format(like 420 8 bit), but
> for HEVC Range Extension (422/444 8/10 bit), the decoded image is
> produced in packed format.
>
> Y210/AYUV/Y410 are packed formats which are needed in HEVC Rext decoding
> for both VAAP
On 2019-05-28 22:00, Derek Buitenhuis wrote:
> On 28/05/2019 20:58, James Almer wrote:
>> I think x26* and vpx/aom call it crf? It's not in option_tables.h in any
>> case.
>
> They do not. This is a constant quantizer mode, not constant rate factor.
IIRC either qp or cqp
On 2019-05-24 12:06, James Darnley wrote:
> On 2019-05-24 11:36, lance.lmw...@gmail.com wrote:
>> From: Limin Wang
>>
>> ...
>
> Why?
I see why: so you don't screw-up the macros you create later.
On 2019-05-24 11:36, lance.lmw...@gmail.com wrote:
> From: Limin Wang
>
> ...
Why? And these are "comments" not "commands".
On 2019-05-18 12:15, Michael Niedermayer wrote:
> On Sat, May 18, 2019 at 12:02:55PM +0200, James Darnley wrote:
>> I object to the commit message though because it isn't a "null pointer
>> dereference" but if that is the error as reported by the tool then keep
>
On 2019-05-18 09:39, Michael Niedermayer wrote:
> Fixes: "null pointer dereference"
> Fixes:
> 14551/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_V210_fuzzer-5088609952071680
>
> Found-by: continuous fuzzing process
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-o
On 2019-04-10 14:47, James Darnley wrote:
> I am resending my patches because I am not sure if I sent this version in
> the past. I split my changes into two patches because they do separate
> things.
>
> I also changed some tabs to spaces in Mike's AVX2 patch.
>
On 2019-04-10 14:47, James Darnley wrote:
> From: Michael Stoner
Screw you mailing list or git, whichever one of you managed to screw up
the author's address. I will correct that, if I can.
Prepare for checkasm test.
---
libavcodec/v210dec.c | 16 ++--
libavcodec/v210dec.h | 1 +
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..fd8a6b0d78 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210de
From: Michael Stoner
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck
AVX2 is 1.4x faster than AVX
---
Mike, is this still the patch you want applied? I had to make a small
amendment to it because you had some tabs as indentation.
libavcodec/v210dec.c | 10 +-
libavcodec/
I am resending my patches because I am not sure if I sent this version in
the past. I split my changes into two patches because they do separate things.
I also changed some tabs to spaces in Mike's AVX2 patch.
James Darnley (2):
avcodec/v210dec: move DSP function setting into dedi
sm_check_vf_hflip(void);
void checkasm_check_vf_threshold(void);
diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c
new file mode 100644
index 00..7dd50a8271
--- /dev/null
+++ b/tests/checkasm/v210dec.c
@@ -0,0 +1,77 @@
+/*
+ * Copyright (c) 2019 James Darnley
+ *
+ * This file is par
On 2019-03-26 21:22, Mike Stoner via ffmpeg-devel wrote:
> Hello,
> I’ve accounted for all feedback on this so far, I’m wondering if it is ready
> to be pushed upstream?
>
> Here are my results from ‘checkasm’ (lower is better):
>
> v210_unpack_c: 1636
> v210_unpack_ssse3: 611
> v210_unpack_avx:
Prepare for checkasm test.
---
libavcodec/v210dec.c | 16 ++--
libavcodec/v210dec.h | 1 +
2 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..6db662538e 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210de
On 2019-03-06 20:31, James Darnley wrote:
> ...
Wrong patch and wrong reference. Please ignore this.
sm_check_vf_hflip(void);
void checkasm_check_vf_threshold(void);
diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c
new file mode 100644
index 00..7320ed5e37
--- /dev/null
+++ b/tests/checkasm/v210dec.c
@@ -0,0 +1,76 @@
+/*
+ * Copyright (c) 2019 James Darnley
+ *
+ * This file is par
On 2019-03-06 10:11, Paul B Mahol wrote:
> On 3/6/19, Carl Eugen Hoyos wrote:
>> 2019-03-04 23:58 GMT+01:00, James Darnley :
>>> Prepare for checkasm test.
>>> ---
>>> libavcodec/v210dec.c | 13 +
>>> libavcodec/v210dec.h | 1 +
>>
Prepare for checkasm test.
---
libavcodec/v210dec.c | 13 +
libavcodec/v210dec.h | 1 +
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c
index ddc5dbe8be..28cf00d320 100644
--- a/libavcodec/v210dec.c
+++ b/libavcodec/v210dec.c
On 2019-03-01 18:41, Michael Stoner wrote:
> The AVX2 code leverages VPERMD to process 12 pixels/iteration. This is my
> first patch submission so any comments are greatly appreciated.
>
> -Mike
>
> Tested on Skylake (Win32 & Win64)
> 1920x1080 input frame
> =
> C code - 440
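For readers unfamiliar with the format, the scalar unpacking that such SIMD code accelerates can be sketched as below. `v210_unpack_word` is a hypothetical helper, not the FFmpeg function; it relies on the v210 layout in which each 32-bit little-endian word carries three 10-bit samples in bits 0-9, 10-19 and 20-29.

```c
#include <stdint.h>

/* Splits one v210 word into its three 10-bit samples, lowest bits first.
 * Which sample is luma or chroma depends on the word's position in the
 * group of four words and is not modelled here. */
static void v210_unpack_word(uint32_t word, uint16_t out[3])
{
    out[0] = (uint16_t)( word        & 0x3FF);
    out[1] = (uint16_t)((word >> 10) & 0x3FF);
    out[2] = (uint16_t)((word >> 20) & 0x3FF);
}
```

Four such words hold six 4:2:2 pixels, so a 32-byte register covers eight words, which matches the 12 pixels per iteration mentioned in the message.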
On 2019-03-03 15:44, Martin Vignali wrote:
> Hello,
>
> ...
>
> Not directly related to this patch, but it can be interesting for testing
> purpose to write a checkasm test for the v210 func decoding.
> So it's easier to check the perf for "each" cpu flag, and be sure the
> various width cas
On 2019-02-15 10:01, Kornel wrote:
> libavcodec/gif.c in ff_gif_encoder.pix_fmts seems to passively declare types
> of pixel formats it accepts.
If you want to experiment you can change that so it accepts rgb (also or
only). Then you can implement and test what you want, then you can ask
about s
On 2018-09-06 19:39, Sigríður Regína Sigurþórsdóttir wrote:
> +if (s->metadata_header_padding) {
> +if (s->metadata_header_padding == 1)
> +s->metadata_header_padding++;
> +put_ebml_void(pb, s->metadata_header_padding);
> +}
Unfortunately I was forced to make th
On 2018-09-05 22:52, Sigríður Regína Sigurþórsdóttir wrote:
> +{"reserve_free_space", "Reserve a given amount of space at the
> beginning of the file for unspecified purpose."
I added the "metadata_header_padding" global option many years ago. Can
you not reuse it for this purpose? Is it not
On 2018-09-03 15:29, James Almer wrote:
> pass 32 - 1 to both av_image_fill_pointers() calls directly?
Please do not add a magic number where nobody will find it. Use one of
the 3 already existing methods for knowing the alignment necessary for
assembly.
If this is unrelated, my apologies.
On 2018-07-27 15:05, Henrik Gramner wrote:
> On Fri, Jul 27, 2018 at 1:47 PM, James Darnley wrote:
>> On 2018-07-26 17:29, Rostislav Pehlivanov wrote:
>>>> +cglobal horizontal_compose_haar_10bit, 3, 6+ARCH_X86_64, 4, b, temp_, w,
>>>> x, b2
>>>> +
On 2018-07-26 17:29, Rostislav Pehlivanov wrote:
> On 26 July 2018 at 12:28, James Darnley wrote:
> +cglobal vertical_compose_haar_10bit, 3, 6, 4, b0, b1, w
>> +DECLARE_REG_TMP 4,5
>> +
>> +mova m2, [pd_1]
>> +mov r3d, wd
>> +and wd,
wavelet transform
+;* Copyright (c) 2018 James Darnley
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.
I will ask the same question as last time. Is the AVX worth it in Haar? Also I
am surprised that the AVX2 doesn't have a bigger difference on some of the
vertical transforms.
James Darnley (3):
diracdec: add 10-bit Haar SIMD functions
diracdec: add 10-bit Legall 5,3 (5_3) SIMD func
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C: 84fps
SSE2: 111fps
AVX2: 115fps
dd97 vertical hi
sse2: 2.77x faster (31773 vs. 11457 decicycles) compared with C
avx2: 3.83x faster (31773 vs. 8297 decicycles) compared with C
---
libavcodec/x
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C: 94fps
SSE2: 118fps
AVX2: 121fps
legall vertical hi
sse2: 3.86x faster (20201 vs. 5231 decicycles) compared with C
avx2: 6.70x faster (20201 vs. 3014 decicycles) compared with C
legall vertical l
On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
> Could you provide standard overall transform results using START/STOP_TIMER
> rather than overall decoding speed?
Ask and ye shall receive.
> haar horizontal compose
> sse2: 3.67x faster (45248±108.1 vs. 12328±21.1 decicycles) compared with
On 2018-07-19 17:26, Rostislav Pehlivanov wrote:
> On 19 July 2018 at 15:52, James Darnley wrote:
>
>> int32_t *b1, int32_t *b2, int
>> b1[i] = COMPOSE_DIRAC53iH0(b0[i], b1[i], b2[i]);
>> }
>>
>> +static void dd97_vertical_hi_sse2(i
On 2018-07-19 17:23, Rostislav Pehlivanov wrote:
>
> Could you provide standard overall transform results using START/STOP_TIMER
> rather than overall decoding speed?
> Coefficients sizes and therefore golomb unpacking speed changes with
> respect to the transform so potentially there could be som
---
libavcodec/x86/dirac_dwt_10bit.asm| 3 ++-
libavcodec/x86/dirac_dwt_init_10bit.c | 13 +
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/dirac_dwt_10bit.asm
b/libavcodec/x86/dirac_dwt_10bit.asm
index ae110d2945..2e039e11ea 100644
--- a/libavcodec
---
libavcodec/x86/dirac_dwt_10bit.asm| 4 +++-
libavcodec/x86/dirac_dwt_init_10bit.c | 22 ++
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/libavcodec/x86/dirac_dwt_10bit.asm
b/libavcodec/x86/dirac_dwt_10bit.asm
index 681de5e1df..ae110d2945 100644
--- a/
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C: 84fps
SSE2: 111fps
AVX2: 115fps
---
libavcodec/x86/dirac_dwt_10bit.asm| 38 +++
libavcodec/x86/dirac_dwt_init_10bit.c | 16 +++
2 files changed, 54 insertions(+)
dif
@@ -0,0 +1,113 @@
+;**
+;* x86 optimized discrete 10-bit wavelet transform
+;* Copyright (c) 2018 James Darnley
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify
Speed of ffmpeg when decoding a 720p yuv422p10 file encoded with the
relevant transform.
C: 94fps
SSE2: 118fps
AVX2: 121fps
---
libavcodec/x86/dirac_dwt_10bit.asm| 55 +++
libavcodec/x86/dirac_dwt_init_10bit.c | 23 +++
2 files changed, 78 insertions(+)
dif