Re: [FFmpeg-devel] [PATCH] libavcodec Adding ff_v210_planar_unpack AVX2

2019-03-27 Thread James Darnley
On 2019-03-26 21:22, Mike Stoner via ffmpeg-devel wrote: > Hello, > I’ve accounted for all feedback on this so far, I’m wondering if it is ready > to be pushed upstream? > > Here are my results from ‘checkasm’ (lower is better): > > v210_unpack_c: 1636 > v210_unpack_ssse3: 611 > v210_unpack_avx:

[FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-10 Thread James Darnley
I am resending my patches because I am not sure if I sent this version in the past. I split my changes into two patches because they do separate things. I also changed some tabs to spaces in Mike's AVX2 patch. James Darnley (2): avcodec/v210dec: move DSP function setting into dedi

[FFmpeg-devel] [PATCH 2/3] checkasm: add test for v210dec

2019-04-10 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

[FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
From: Michael Stoner Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck. AVX2 is 1.4x faster than AVX. --- Mike, is this still the patch you want applied? I had to make a small amendment to it because you had some tabs as indentation. libavcodec/v210dec.c | 10 +- libavcodec/

[FFmpeg-devel] [PATCH 1/3] avcodec/v210dec: move DSP function setting into dedicated function

2019-04-10 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

Re: [FFmpeg-devel] [PATCH 3/3] libavcodec Adding ff_v210_planar_unpack AVX2

2019-04-10 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > From: Michael Stoner Screw you mailing list or git, whichever one of you managed to screw up the author's address. I will correct that, if I can. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.o

Re: [FFmpeg-devel] [PATCH 0/3] v210dec checkasm test and avx2 function

2019-04-18 Thread James Darnley
On 2019-04-10 14:47, James Darnley wrote: > I am resending my patches because I am not sure if I sent this version in > the past. I split my changes into two patches because they do separate > things. > > I also changed some tabs to spaces in Mike's AVX2 patch.

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 09:39, Michael Niedermayer wrote: > Fixes: "null pointer dereference" > Fixes: > 14551/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_V210_fuzzer-5088609952071680 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-o

Re: [FFmpeg-devel] [PATCH] avcodec/v210dec: Fix alignment check for AVX2

2019-05-18 Thread James Darnley
On 2019-05-18 12:15, Michael Niedermayer wrote: > On Sat, May 18, 2019 at 12:02:55PM +0200, James Darnley wrote: >> I object to the commit message though because it isn't a "null pointer >> dereference" but if that is the error as reported by the tool then keep >

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: > From: Limin Wang > > ... Why? And these are "comments" not "commands".

Re: [FFmpeg-devel] [PATCH 1/7] libavfilter/vf_overlay.c: change the commands style for the macro defined function

2019-05-24 Thread James Darnley
On 2019-05-24 12:06, James Darnley wrote: > On 2019-05-24 11:36, lance.lmw...@gmail.com wrote: >> From: Limin Wang >> >> ... > > Why? I see why: so you don't screw-up the macros you create later.

Re: [FFmpeg-devel] [PATCH] avcodec: Add librav1e encoder

2019-05-28 Thread James Darnley
On 2019-05-28 22:00, Derek Buitenhuis wrote: > On 28/05/2019 20:58, James Almer wrote: >> I think x26* and vpx/aom call it crf? It's not in option_tables.h in any >> case. > > They do not. This is a constant quantizer mode, not constant rate factor. IIRC either qp or cqp

Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-27 Thread James Darnley
On 2019-06-28 04:26, Linjie Fu wrote: > Previously, media driver provided planar format(like 420 8 bit), but > for HEVC Range Extension (422/444 8/10 bit), the decoded image is > produced in packed format. > > Y210/AYUV/Y410 are packed formats which are needed in HEVC Rext decoding > for both VAAP

Re: [FFmpeg-devel] [PATCH 1/5] lavu/pixfmt: add Y210/AYUV/Y410 pixel formats

2019-06-28 Thread James Darnley
On 2019-06-28 03:03, Hendrik Leppkes wrote: > On Fri, Jun 28, 2019 at 1:26 AM James Darnley wrote: >> >> On 2019-06-28 04:26, Linjie Fu wrote: >>> Previously, media driver provided planar format(like 420 8 bit), but >>> for HEVC Range Extension (422/44

Re: [FFmpeg-devel] Issues while encoding a ts file to m3u8

2019-08-02 Thread James Darnley
On 2019-08-02 15:55, Ramana Jajula wrote: > Hi, > > I am trying to encode my ts file m3u8 using my customised ffmpeg of version > 4.1. I used below command to do encoding. > > ffmpeg -re -threads 8 -i /videos/input.ts -vcodec libx264 -s 320x240 -b:v > 512000 -maxrate 512000 -acodec libfdk_aac -b:

[FFmpeg-devel] [PATCH 1/7] x86inc: Fix VEX -> EVEX instruction conversion

2019-08-05 Thread James Darnley
From: Henrik Gramner There's an edge case that wasn't properly handled. --- libavutil/x86/x86inc.asm | 5 + 1 file changed, 5 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 5044ee86f0..bc370a6186 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86

[FFmpeg-devel] [PATCH 4/7] x86inc: Turn 'movsxd' into 'movifnidn' on x86-32

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 4 1 file changed, 4 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 10b7711637..04dbb6b785 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -293,6 +293,10 @@ DECLARE_REG_TMP_SIZ

[FFmpeg-devel] [PATCH 5/7] x86inc: Make 'non-adjacent' default in the TAIL_CALL macro

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 04dbb6b785..af35fe1e4d 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -685,7 +685,7 @@ DECLARE_

[FFmpeg-devel] [PATCH 6/7] x86inc: Improve warnings for use of unsupported instructions

2019-08-05 Thread James Darnley
From: Henrik Gramner Warn when the following are used without the appropriate cpuflag: * YMM and ZMM registers * 'pextrw' with a memory operand * GPR instruction set extensions --- libavutil/x86/x86inc.asm | 120 +++ 1 file changed, 83 insertions(+), 37 del

[FFmpeg-devel] [PATCH 0/7] Import some x264asm patches from x264

2019-08-05 Thread James Darnley
Here are a few easy-to-import patches from x264. These are all after x264 commit 4a158b00 "x86inc: Correctly set mmreg variables" which FFmpeg already has (commit eb5f063e7c). It does not include the following commits: * 82721eae "x86inc: Add x86-32 PIC support macros" * 101bd27d "x86inc: Support

[FFmpeg-devel] [PATCH 2/7] x86inc: Optimize VEX instruction encoding

2019-08-05 Thread James Darnley
From: Henrik Gramner Most VEX-encoded instructions require an additional byte to encode when src2 is a high register (e.g. x|ymm8..15). If the instruction is commutative we can swap src1 and src2 when doing so reduces the instruction length, e.g. vpaddw xmm0, xmm0, xmm8 -> vpaddw xmm0, xmm8,
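The operand swap described in this message can be illustrated with a toy model. This is only a sketch, not the x86inc macro: it assumes the general VEX rule that the 2-byte prefix has no bit to extend the ModRM.rm field, so a register numbered 8 or higher in src2 forces the 3-byte prefix, whereas src1 sits in the 4-bit vvvv field and costs nothing extra. It also assumes a register-only instruction from the 0F opcode map, such as vpaddw. The function names are hypothetical.

```python
def vex_prefix_len(src2: int) -> int:
    """Prefix length in bytes for a register-only, 0F-map VEX instruction.

    Simplified model: the 2-byte prefix cannot encode a high register
    (8..15) in ModRM.rm, so such an operand needs the 3-byte prefix.
    """
    return 3 if src2 >= 8 else 2


def order_operands(src1: int, src2: int, commutative: bool) -> tuple[int, int]:
    """Swap src1 and src2 when that shortens the encoding."""
    if commutative and src2 >= 8 and src1 < 8:
        return src2, src1
    return src1, src2


# vpaddw xmm0, xmm0, xmm8  ->  vpaddw xmm0, xmm8, xmm0
before = (0, 8)
after = order_operands(*before, commutative=True)
print(vex_prefix_len(before[1]), vex_prefix_len(after[1]))
```

In this model the swap only helps when exactly one source is a high register; if both are high, the 3-byte prefix is needed either way.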

[FFmpeg-devel] [PATCH 7/7] x86inc: Add support for GFNI instructions

2019-08-05 Thread James Darnley
From: Henrik Gramner --- libavutil/x86/x86inc.asm | 30 +- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index d1b4c982fc..8c8cc97e0c 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc

[FFmpeg-devel] [PATCH 3/7] x86inc: Improve SAVE/LOAD_MM_PERMUTATION macros

2019-08-05 Thread James Darnley
From: Henrik Gramner Use register numbers instead of copying the full register names. This makes it possible to change register widths in the middle of a function and keep the mmreg permutations intact which can be useful for code that only needs larger vectors for parts of the function in combin

Re: [FFmpeg-devel] [PATCH] frame: Simplify the video allocation

2018-09-03 Thread James Darnley
On 2018-09-03 15:29, James Almer wrote: > pass 32 - 1 to both av_image_fill_pointers() calls directly? Please do not add a magic number where nobody will find it. Use one of the 3 already existing methods for knowing the alignment necessary for assembly. If this is unrelated, my apologies.

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-05 Thread James Darnley
On 2018-09-05 22:52, Sigríður Regína Sigurþórsdóttir wrote: > +{"reserve_free_space", "Reserve a given amount of space at the > beginning og the file for unspecified purpose." I added the "metadata_header_padding" global option many years ago. Can you not reuse it for this purpose? Is it not

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: add reserve free space option

2018-09-06 Thread James Darnley
On 2018-09-06 19:39, Sigríður Regína Sigurþórsdóttir wrote: > +if (s->metadata_header_padding) { > +if (s->metadata_header_padding == 1) > +s->metadata_header_padding++; > +put_ebml_void(pb, s->metadata_header_padding); > +} Unfortunately I was forced to make th

Re: [FFmpeg-devel] [FFmpeg-cvslog] pthread_frame: merge the functionality for normal decoder init and init_thread_copy

2020-06-03 Thread James Darnley
On 2020-04-10 16:53, Anton Khirnov wrote: > ffmpeg | branch: master | Anton Khirnov | Mon Jan 9 > 18:04:42 2017 +0100| [1f4cf92cfbd3accbae582ac63126ed5570ddfd37] | committer: > Anton Khirnov > > pthread_frame: merge the functionality for normal decoder init and > init_thread_copy > > The cur

Re: [FFmpeg-devel] [PATCH 1/3] avcodec/bitpacked: ,

2020-06-03 Thread James Darnley
On 2020-06-04 01:19, Michael Niedermayer wrote: > Fixes: array end overread > Fixes: > 22395/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_BITPACKED_fuzzer-5760940300828672 > > Found-by: continuous fuzzing process > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg > Signed-off-

[FFmpeg-devel] [PATCH 0/2] WIP: h264, slice threads, draw_horiz_band

2019-09-02 Thread James Darnley
-frames in chunked mode. Needs more work. James Darnley (1): avcodec/h264: enable draw_horiz_band Kieran Kunhya (1): avcodec/h264: fix draw_horiz_band with slice threads libavcodec/h264_slice.c | 29 +++-- libavcodec/h264dec.c| 2 +- 2 files changed, 24 insert

[FFmpeg-devel] [PATCH 1/2] avcodec/h264: enable draw_horiz_band

2019-09-02 Thread James Darnley
--- libavcodec/h264dec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavcodec/h264dec.c b/libavcodec/h264dec.c index 8d1bd16a8e..b9f304936c 100644 --- a/libavcodec/h264dec.c +++ b/libavcodec/h264dec.c @@ -1056,7 +1056,7 @@ AVCodec ff_h264_decoder = { .init

[FFmpeg-devel] [PATCH 2/2] avcodec/h264: fix draw_horiz_band with slice threads

2019-09-02 Thread James Darnley
From: Kieran Kunhya --- libavcodec/h264_slice.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c index 5ceee107a0..fe2aa01ceb 100644 --- a/libavcodec/h264_slice.c +++ b/libavcodec/h264_slice.c @@

Re: [FFmpeg-devel] [PATCH] avutil/eval: add sgn()

2019-10-12 Thread James Darnley
On 2019-10-11 21:45, Paul B Mahol wrote: > diff --git a/doc/utils.texi b/doc/utils.texi > index d55dd315c3..4e2e713505 100644 > --- a/doc/utils.texi > +++ b/doc/utils.texi > @@ -920,6 +920,9 @@ corresponding input value will be returned. > @item round(expr) > Round the value of expression @var{e

Re: [FFmpeg-devel] [Contract Request] for FFmpeg libmp3lame multi-threaded feature implementation

2019-11-25 Thread James Darnley
On 2019-11-25 13:52, Chandra Nakka wrote: > Dear FFmpeg developers, > > I'm very happy to have found your details on FFmpeg website for requesting > FFmpeg feature implementation. > > Currently I'm using FFmpeg command line tool on my linux servers to process > media files into instant mp3 audio

Re: [FFmpeg-devel] [PATCH, v3, 1/7] lavu/pixfmt: add new pixel format 0yuv/y210/y410

2019-12-05 Thread James Darnley
On 2019-12-04 15:43, Linjie Fu wrote: > Previously, media driver provided planar format(like 420 8 bit), > but for HEVC Range Extension (422/444 8/10 bit), the decoded image > is produced in packed format because Windows expects it. > > Add some packed pixel formats for hardware decode support in

Re: [FFmpeg-devel] [IMPORTANT] FOSDEM meeting

2020-02-01 Thread James Darnley
On 28/01/2020, Liu Steven wrote: > > >> On 27 January 2020, at 3:29 PM, Jean-Baptiste Kempf wrote: >> It will be joinable through some VideoConf tool. > Can we join by IRC or other things on internet? > Because these days are Spring Festival (Chinese New Year, Important > festivals that have lasted for thousand

Re: [FFmpeg-devel] What new instructions would you like?

2020-02-01 Thread James Darnley
On 30/12/2019, Lauri Kasanen wrote: > Hi, > > For the Libre RISC-V project, I'm going to research the popular codecs > and design new instructions to help speed them up. With ffmpeg being > home to lots of asm folks for many platforms, I also want to ask your > opinion. > > What new instructions w

Re: [FFmpeg-devel] Followup: FOSDEM meeting

2020-02-22 Thread James Darnley
On 2020-02-22 11:11, Thilo Borgmann wrote: > Please someone put an IRC log from the meeting there, too. James Darnley? > Also the audio was streamed, somebody might remember where to exactly. > Michael? I can post my log from the day, probably email attachment. Should I remove any of

Re: [FFmpeg-devel] Followup: FOSDEM meeting

2020-02-22 Thread James Darnley
On 2020-02-22 13:25, Paul B Mahol wrote: > On 2/22/20, James Darnley wrote: >> On 2020-02-22 11:11, Thilo Borgmann wrote: >>> Please someone put an IRC log from the meeting there, too. James Darnley? >>> Also the audio was streamed, somebody might remember where too ex

Re: [FFmpeg-devel] [PATCH] swscale/x86/yuv2rgb: Fix build without SSSE3

2020-02-23 Thread James Darnley
On 2020-02-23 13:22, Michael Niedermayer wrote: > From: Parker Ernest <@> > > commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on > x86_64 CPUs which do not have SSSE3, e.g. AMD Phenom-II > > Signed-off-by: Michael Niedermayer > --- > libswscale/x86/yuv2rgb.c | 2 ++ > 1 file change

Re: [FFmpeg-devel] [PATCH] Add .mailmap

2020-02-23 Thread James Darnley
On 2020-02-23 15:12, Jean-Baptiste Kempf wrote: > Yo, > > On Sat, Feb 22, 2020, at 22:18, Josh de Kock wrote: >> This allows for easy shortlog/log parsing, useful in determining >> eligible members of the general assembly for the new FFmpeg voting >> system. > > I think this is a good idea. > But

Re: [FFmpeg-devel] [PATCH] swscale/x86/yuv2rgb: Fix build without SSSE3

2020-02-23 Thread James Darnley
On 2020-02-23 18:58, Michael Niedermayer wrote: > On Sun, Feb 23, 2020 at 05:03:36PM +0100, Carl Eugen Hoyos wrote: >> On Sun, 23 Feb 2020 at 13:30, Michael Niedermayer wrote: >>> >>> From: Parker Ernest <@> >>> >>> commit fc6a5883d6af8cae0e96af84dda0ad74b360a084 breaks build on >>> x86_

Re: [FFmpeg-devel] Lossy GIF encoding

2019-02-15 Thread James Darnley
On 2019-02-15 10:01, Kornel wrote: > libavcodec/gif.c in ff_gif_encoder.pix_fmts seems to passively declare types > of pixel formats it accepts. If you want to experiment you can change that so it accepts rgb (also or only). Then you can implement and test what you want, then you can ask about s

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-03 15:44, Martin Vignali wrote: > Hello, > > ... > > Not directly related to this patch, but it can be interesting for testing > purpose to write a checkasm test for the v210 func decoding. > So it's more easy to check the perf for "each" cpu flags, and be sure, the > various width cas

Re: [FFmpeg-devel] [PATCH] Added ff_v210_planar_unpack_aligned_avx2

2019-03-04 Thread James Darnley
On 2019-03-01 18:41, Michael Stoner wrote: > The AVX2 code leverages VPERMD to process 12 pixels/iteration. This is my > first patch submission so any comments are greatly appreciated. > > -Mike > > Tested on Skylake (Win32 & Win64) > 1920x1080 input frame > = > C code - 440

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-04 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 13 + libavcodec/v210dec.h | 1 + 2 files changed, 10 insertions(+), 4 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..28cf00d320 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210dec.c

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-04 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
On 2019-03-06 10:11, Paul B Mahol wrote: > On 3/6/19, Carl Eugen Hoyos wrote: >> 2019-03-04 23:58 GMT+01:00, James Darnley : >>> Prepare for checkasm test. >>> --- >>> libavcodec/v210dec.c | 13 + >>> libavcodec/v210dec.h | 1 +

[FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7320ed5e37 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

Re: [FFmpeg-devel] [PATCH] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
On 2019-03-06 20:31, James Darnley wrote: > ... Wrong patch and wrong reference. Please ignore this.

[FFmpeg-devel] [PATCH] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..6db662538e 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

[FFmpeg-devel] [PATCH 2/2] checkasm: add test for v210dec

2019-03-06 Thread James Darnley
sm_check_vf_hflip(void); void checkasm_check_vf_threshold(void); diff --git a/tests/checkasm/v210dec.c b/tests/checkasm/v210dec.c new file mode 100644 index 00..7dd50a8271 --- /dev/null +++ b/tests/checkasm/v210dec.c @@ -0,0 +1,77 @@ +/* + * Copyright (c) 2019 James Darnley + * + * This file is par

[FFmpeg-devel] [PATCH 1/2] avcodec/v210dec: move DSP function setting into dedicated function

2019-03-06 Thread James Darnley
Prepare for checkasm test. --- libavcodec/v210dec.c | 16 ++-- libavcodec/v210dec.h | 1 + 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/libavcodec/v210dec.c b/libavcodec/v210dec.c index ddc5dbe8be..fd8a6b0d78 100644 --- a/libavcodec/v210dec.c +++ b/libavcodec/v210de

[FFmpeg-devel] [PATCH 0/5 v2] AVX functions for 8-bit H.264 IDCT

2017-04-04 Thread James Darnley
After better testing I have decided to only submit these two functions. The others did not provide a speedup better than the deviation in testing. Those patches remain in the list archive should someone wish to try them. James Darnley (5): avcodec/h264: change RETs into REP_RETs where

[FFmpeg-devel] [PATCH 5/5] avcodec/h264: add avx 8-bit h264_idct_dc_add

2017-04-04 Thread James Darnley
Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext --- libavcodec/x86/h264_idct.asm | 20 libavcodec/x86/h264dsp_init.c | 2 ++ 2 files changed, 22 insertions(+) di

[FFmpeg-devel] [PATCH 1/5] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-04 Thread James Darnley
--- libavcodec/x86/h264_idct.asm | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index c36fea5..878ff02 100644 --- a/libavcodec/x86/h264_idct.asm +++ b/libavcodec/x86/h264_idct.asm @@ -695,7 +695,7 @@ cglo

[FFmpeg-devel] [PATCH 3/5] avcodec/h264: use some 3 operand forms

2017-04-04 Thread James Darnley
--- libavcodec/x86/h264_idct.asm | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index dde40e9..bc4dce4 100644 --- a/libavcodec/x86/h264_idct.asm +++ b/libavcodec/x86/h264_idct.asm @@ -87,10 +87,

[FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-04 Thread James Darnley
Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext --- libavcodec/x86/h264_idct.asm | 33 - libavcodec/x86/h264dsp_init.c | 3 +++ 2 files changed, 35 ins
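For readers checking the figures: the speedup factor is the mmxext timing divided by the avx timing, and a decicycle is a tenth of a cycle. A minimal sketch using only the numbers quoted in the message (the helper names are made up for illustration):

```python
def speedup(old_decicycles: float, new_decicycles: float) -> float:
    """How many times faster the new function is than the old one."""
    return old_decicycles / new_decicycles


def to_cycles(decicycles: float) -> float:
    """checkasm reports decicycles; ten of them make one cycle."""
    return decicycles / 10


# Haswell: mmxext 522 decicycles, avx 469 decicycles
print(round(speedup(522, 469), 2))
# Skylake-U: mmxext 671 decicycles, avx 555 decicycles
print(round(speedup(671, 555), 2))
# 522 decicycles expressed in cycles
print(to_cycles(522))
```

The ± values in the message are the measurement deviation and are left out of this sketch.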

[FFmpeg-devel] [PATCH 2/5] avcodec/h264: change some labels to be macro-local

2017-04-04 Thread James Darnley
The labels get stripped leading to (slightly) nicer disassembly from objdump. --- libavcodec/x86/h264_idct.asm | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index 878ff02..dde40e9 100644 --

Re: [FFmpeg-devel] [PATCH 1/5] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-05 Thread James Darnley
On 2017-04-05 05:33, James Almer wrote: > On 4/4/2017 10:53 PM, James Darnley wrote: >> --- >> libavcodec/x86/h264_idct.asm | 12 ++-- >> 1 file changed, 6 insertions(+), 6 deletions(-) >> >> diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x

Re: [FFmpeg-devel] [PATCH 2/5] avcodec/h264: change some labels to be macro-local

2017-04-06 Thread James Darnley
On 2017-04-05 13:41, Ronald S. Bultje wrote: > Hi, > > On Tue, Apr 4, 2017 at 9:53 PM, James Darnley wrote: > >> The labels get stripped leading to (slightly) nicer disassembly from >> objdump. >> > [..] > >> -jz .cycle%1end >> +jz %%

Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-06 Thread James Darnley
On 2017-04-05 05:44, James Almer wrote: > On 4/4/2017 10:53 PM, James Darnley wrote: >> Haswell: >> - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext >> >> Skylake-U: >> - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext

Re: [FFmpeg-devel] [PATCH 5/5] avcodec/h264: add avx 8-bit h264_idct_dc_add

2017-04-06 Thread James Darnley
On 2017-04-05 06:05, James Almer wrote: > On 4/4/2017 10:53 PM, James Darnley wrote: >> Haswell: >> - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext >> >> Skylake-U: >> - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with

Re: [FFmpeg-devel] [PATCH 1/5] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-14 Thread James Darnley
On 2017-04-05 22:26, Henrik Gramner wrote: > On Wed, Apr 5, 2017 at 3:53 AM, James Darnley wrote: >> call h264_idct_add8_mmx_plane >> -RET >> +RET ; TODO: check rep ret after a function call > > call followed by RET should be replaced by the TAIL

Re: [FFmpeg-devel] [PATCH 4/5] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-14 Thread James Darnley
On 2017-04-06 18:06, James Almer wrote: > Your numbers are really confusing. Could you post the actual numbers for > each function instead of doing comparisons? These figures are the actual numbers! Using the figures from Haswell above: > ff_h264_idct_add_8_mmx = 52 cycles > ff_h264_idct_add_8_s

[FFmpeg-devel] [PATCH 0/6 v3] AVX functions for 8-bit H.264 IDCT

2017-04-14 Thread James Darnley
Changes: - Added sse2 functions - Fixed an incorrect xmm register count I did not make the change suggested by Gramner about TAIL_CALL and I did leave the TODOs there. If there are no further objections I will push by Monday at the latest. I want to get this out the door. James Darnley (6

[FFmpeg-devel] [PATCH 1/6] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-14 Thread James Darnley
--- libavcodec/x86/h264_idct.asm | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index c36fea5..878ff02 100644 --- a/libavcodec/x86/h264_idct.asm +++ b/libavcodec/x86/h264_idct.asm @@ -695,7 +695,7 @@ cglo

[FFmpeg-devel] [PATCH 3/6] avcodec/h264: use some 3 operand forms

2017-04-14 Thread James Darnley
--- libavcodec/x86/h264_idct.asm | 21 + 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index dde40e9..bc4dce4 100644 --- a/libavcodec/x86/h264_idct.asm +++ b/libavcodec/x86/h264_idct.asm @@ -87,10 +87,

[FFmpeg-devel] [PATCH 4/6] avcodec/h264: add avx 8-bit h264_idct_add

2017-04-14 Thread James Darnley
Haswell: - 1.11x faster (522±0.4 vs. 469±1.8 decicycles) compared with mmxext Skylake-U: - 1.21x faster (671±5.5 vs. 555±1.4 decicycles) compared with mmxext --- libavcodec/x86/h264_idct.asm | 33 - libavcodec/x86/h264dsp_init.c | 3 +++ 2 files changed, 35 ins

[FFmpeg-devel] [PATCH 6/6] avcodec/h264: add sse2 versions of previous idct functions

2017-04-14 Thread James Darnley
Kaby Lake Pentium: - ff_h264_idct_add_8_sse2: ~1.18x faster than mmxext - ff_h264_idct_dc_add_8_sse2: ~1.07x faster than mmxext --- libavcodec/x86/h264_idct.asm | 11 +-- libavcodec/x86/h264dsp_init.c | 5 + 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/libavco

[FFmpeg-devel] [PATCH 2/6] avcodec/h264: change some labels to be macro-local

2017-04-14 Thread James Darnley
The labels get stripped leading to (slightly) nicer disassembly from objdump. --- libavcodec/x86/h264_idct.asm | 24 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/libavcodec/x86/h264_idct.asm b/libavcodec/x86/h264_idct.asm index 878ff02..dde40e9 100644 --

[FFmpeg-devel] [PATCH 5/6] avcodec/h264: add avx 8-bit h264_idct_dc_add

2017-04-14 Thread James Darnley
Haswell: - 1.02x faster (405±0.7 vs. 397±0.8 decicycles) compared with mmxext Skylake-U: - 1.06x faster (498±1.8 vs. 470±1.3 decicycles) compared with mmxext --- libavcodec/x86/h264_idct.asm | 20 libavcodec/x86/h264dsp_init.c | 2 ++ 2 files changed, 22 insertions(+) di

[FFmpeg-devel] [PATCH] add Falcom Xanadu demuxer

2017-04-15 Thread James Darnley
100644 index 00..4c5f32a1b6 --- /dev/null +++ b/libavformat/falcom_xa.c @@ -0,0 +1,98 @@ +/* + * Falcom Xanadu demuxer + * Copyright (c) 2016 James Darnley + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the

Re: [FFmpeg-devel] [PATCH 2/6] avcodec/h264: change some labels to be macro-local

2017-04-15 Thread James Darnley
On 2017-04-15 14:29, Ronald S. Bultje wrote: > Hi, > > On Fri, Apr 14, 2017 at 9:46 PM, James Darnley wrote: > >> The labels get stripped leading to (slightly) nicer disassembly from >> objdump. >> --- >> libavcodec/x86/h264_idct.asm | 24 +++

Re: [FFmpeg-devel] [PATCH] add Falcom Xanadu demuxer

2017-04-15 Thread James Darnley
On 2017-04-15 15:36, James Darnley wrote: > add Falcom Xanadu demuxer I mean Xanadu Next, not the original one.

Re: [FFmpeg-devel] [PATCH] add Falcom Xanadu demuxer

2017-04-15 Thread James Darnley
On 2017-04-15 17:56, James Almer wrote: > On 4/15/2017 10:36 AM, James Darnley wrote: >> --- >> libavformat/Makefile | 1 + >> libavformat/allformats.c | 1 + >> libavformat/falcom_xa.c | 98 >>

[FFmpeg-devel] Why do we use `strip -wN` instead of the more general `strip -x`?

2017-05-11 Thread James Darnley
I want to discuss why we use this and argue that we should be using `strip -x` all the time anyway. The man page for strip says that -x removes all non-global symbols. -wN is a combination of -w for wildcard matching and -N to remove a given symbol. -wN gets ..@* as an argument. Together they r
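To make the difference between the two invocations concrete, here is a toy model. It does not run strip; it only mimics the selection rules as the man page describes them (-x drops every non-global symbol, -wN drops symbols whose name matches the wildcard). The symbol table below is invented for illustration.

```python
from fnmatch import fnmatchcase

# (symbol name, is_global) pairs; names are hypothetical examples
symbols = [
    ("ff_simple_idct_mmx", True),        # exported function
    ("ff_simple_idct_mmx.loop", False),  # ordinary local label
    ("..@42.loop", False),               # macro-local label
]


def strip_x(table):
    """Model of `strip -x`: keep only global symbols."""
    return [name for name, is_global in table if is_global]


def strip_wN(table, pattern):
    """Model of `strip -wN PATTERN`: drop symbols matching the wildcard."""
    return [name for name, _ in table if not fnmatchcase(name, pattern)]


print(strip_x(symbols))
print(strip_wN(symbols, "..@*"))
```

In this model the -wN form leaves the ordinary local label in place, while -x removes it along with the macro-local one.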

[FFmpeg-devel] [PATCH/WIP] avcodec/x86: move simple_idct to external assembly

2017-05-12 Thread James Darnley
--- For initial review and comments. I plan to drop the '2' from the filename before pushing. I haven't done it yet because I am still working on the file. I didn't make any changes with speedup in mind so I haven't done any benchmarking yet. libavcodec/x86/Makefile | 4 +- libavcod

Re: [FFmpeg-devel] [PATCH] configure: use -x instead of -wN ..@ to strip assembly files

2017-05-16 Thread James Darnley
On 2017-05-16 13:08, Rostislav Pehlivanov wrote: > Reduces the amount of debugging information of external asm from > uselessly verbose to informative enough. > --- > configure | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/configure b/configure > index e4862f6a35..df8

Re: [FFmpeg-devel] [PATCH/WIP] avcodec/x86: move simple_idct to external assembly

2017-05-23 Thread James Darnley
On 2017-05-18 19:13, Ronald S. Bultje wrote: > - do you think a checkasm test makes sense? That would also make > performance measuring easier. The (I)DCT code seems to have its own test program in the fate-idct8x8 test. That is built from libavcodec/tests/dct.c. It even includes its own benchma

[FFmpeg-devel] [PATCH 2/2] reindent

2017-05-29 Thread James Darnley
--- libavcodec/x86/idctdsp_init.c | 38 +++--- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/libavcodec/x86/idctdsp_init.c b/libavcodec/x86/idctdsp_init.c index 1f308cc079..f1c915aa00 100644 --- a/libavcodec/x86/idctdsp_init.c +++ b/libavcodec/x86/

[FFmpeg-devel] [PATCH 1/2] avcodec/x86: move simple_idct to external assembly

2017-05-29 Thread James Darnley
le IDCT MMX +; +; Copyright (c) 2001, 2002 Michael Niedermayer +; +; Conversion from gcc syntax to x264asm syntax with minimal modifications +; by James Darnley . +; +; This file is part of FFmpeg. +; +; FFmpeg is free software; you can redistribute it and/or +; modify it under the terms of the GNU

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/x86: move simple_idct to external assembly

2017-05-29 Thread James Darnley
On 2017-05-29 16:51, James Darnley wrote: > --- > Changes: > - Changed type of d4 constant to dwords because it gets used as dwords. > - Changed or removed HAVE_MMX_INLINE preprocessor guards. > - Added note about conversion from inline. > - New file no lon

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/x86: move simple_idct to external assembly

2017-05-30 Thread James Darnley
On 2017-05-29 23:26, Michael Niedermayer wrote: > On Mon, May 29, 2017 at 09:40:49PM +0200, James Darnley wrote: >> On 2017-05-29 16:51, James Darnley wrote: >>> --- >>> Changes: >>> - Changed type of d4 constant to dwords because it gets used

Re: [FFmpeg-devel] [PATCH 2/2] reindent

2017-05-30 Thread James Darnley
On 2017-05-29 16:51, James Darnley wrote: > Commit message: reindent Is this acceptable? Should I be more verbose?

[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-02 Thread James Darnley
her errors. James Darnley (6): initial alignment corrections for xmm registers change explicit mmx register use to x264asm style add and fix xmm version of simple_idct avcodec/x86: cleanup simple_idct10 add x86_64 8-bit simple_idct function change coeffs libavcodec/tests

[FFmpeg-devel] [PATCH 2/6] change explicit mmx register use to x264asm style

2017-06-02 Thread James Darnley
--- libavcodec/x86/simple_idct.asm | 1172 1 file changed, 586 insertions(+), 586 deletions(-) Picture s/mm([0-7])/m\1/g here for 1229 lines and 64695 bytes.
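For readers who do not read sed expressions, here is a rough sketch in C of what the substitution s/mm([0-7])/m\1/g does to a line of assembly. The helper name mmx_to_x264asm is hypothetical and is not part of the patch; the patch itself was presumably produced with an editor or sed rather than with C code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Apply the equivalent of s/mm([0-7])/m\1/g to src, writing to dst.
 * Hypothetical helper for illustration only.
 * Returns the number of replacements made. */
static int mmx_to_x264asm(const char *src, char *dst, size_t dst_size)
{
    int count = 0;
    size_t o = 0;

    /* The output is never longer than the input. */
    assert(dst_size >= strlen(src) + 1);

    while (*src) {
        if (src[0] == 'm' && src[1] == 'm' && src[2] >= '0' && src[2] <= '7') {
            dst[o++] = 'm';    /* "mm" collapses to "m" ...             */
            dst[o++] = src[2]; /* ... and the register digit is kept   */
            src += 3;
            count++;
        } else {
            dst[o++] = *src++; /* everything else is copied unchanged  */
        }
    }
    dst[o] = '\0';
    return count;
}
```

Given the input "movq mm0, [r0+8]" this writes "movq m0, [r0+8]". Note that, taken literally, the expression would also rewrite "xmm0" to "xm0", so it only makes sense on a file that still contains nothing but MMX register names.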

[FFmpeg-devel] [PATCH 1/6] initial alignment corrections for xmm registers

2017-06-02 Thread James Darnley
--- libavcodec/x86/simple_idct.asm | 47 ++ 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/libavcodec/x86/simple_idct.asm b/libavcodec/x86/simple_idct.asm index 6fedbb5784..b5d05ca653 100644 --- a/libavcodec/x86/simple_idct.asm +++ b/libavco

[FFmpeg-devel] [PATCH 3/6] add and fix xmm version of simple_idct

2017-06-02 Thread James Darnley
--- libavcodec/tests/x86/dct.c | 3 +++ libavcodec/x86/idctdsp_init.c | 1 + libavcodec/x86/simple_idct.asm | 45 ++ libavcodec/x86/simple_idct.h | 1 + 4 files changed, 50 insertions(+) diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x

[FFmpeg-devel] [PATCH 6/6] change coeffs

2017-06-02 Thread James Darnley
--- libavcodec/x86/simple_idct10.asm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/libavcodec/x86/simple_idct10.asm b/libavcodec/x86/simple_idct10.asm index b4b47afcee..ae848b7faf 100644 --- a/libavcodec/x86/simple_idct10.asm +++ b/libavcodec/x86/simple_idct10.asm @@ -46,

[FFmpeg-devel] [PATCH 4/6] avcodec/x86: cleanup simple_idct10

2017-06-02 Thread James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register. --- libavcodec/x86/proresdsp.asm | 2 +- libavcodec/x86/simple_idct10.asm | 8 ++-- libavcodec/x86/simple_i
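Some background on the sign extension remark, as a sketch rather than a description of the patch. On x86-64 a 32-bit int argument arrives in the low half of a 64-bit register and the upper half is not guaranteed to be sign extended, so assembly has to widen it before using it in an address. A ptrdiff_t argument is already pointer sized. The C function below is a hypothetical illustration (sum_first_column is not an FFmpeg function) of a signed, pointer-sized stride, including the negative case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the first pixel of each row.
 * linesize may be negative (bottom-up images), which is why a signed,
 * pointer-sized type is the natural choice for it.
 * Hypothetical helper for illustration only. */
static int sum_first_column(const uint8_t *src, ptrdiff_t linesize, int rows)
{
    int sum = 0;

    assert(rows >= 0);

    for (int y = 0; y < rows; y++) {
        /* Offset computed at full pointer width, no widening needed. */
        sum += src[(ptrdiff_t)y * linesize];
    }
    return sum;
}
```

With a 3-row image of stride 4, calling it on the first row with linesize 4 and on the last row with linesize -4 visits the same three pixels.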

[FFmpeg-devel] [PATCH 5/6] add x86_64 8-bit simple_idct function

2017-06-02 Thread James Darnley
--- libavcodec/tests/x86/dct.c | 2 ++ libavcodec/x86/idctdsp_init.c| 10 ++ libavcodec/x86/simple_idct.h | 3 +++ libavcodec/x86/simple_idct10.asm | 6 ++ 4 files changed, 21 insertions(+) diff --git a/libavcodec/tests/x86/dct.c b/libavcodec/tests/x86/dct.c index 971

Re: [FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-05 Thread James Darnley
To answer the couple of questions that were asked over the weekend. Rostislav, about the performance. I can see how to force a particular IDCT implementation for real world decoding (the -idct option) but the MPEG2 HD sample I've been working with mostly uses the "idct add" function which doesn't

[FFmpeg-devel] [PATCH 0/5] x264asm: take some patches from upstream

2017-06-08 Thread James Darnley
Incorporate some of the recent changes committed to x264. This is an initial set with no controversial changes: no nasm requirement, no avx512. I do want your comments on where I should put the aesni define in the last patch. I will make a note on that one too. I will attempt to upstream that d

[FFmpeg-devel] [PATCH 2/5] x86inc: Make REP_RET identical to RET in SSSE3+ functions

2017-06-08 Thread James Darnley
From: Henrik Gramner There's no point in emitting a rep prefix before ret on modern CPUs. --- libavutil/x86/x86inc.asm | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index f2a6a3f1db..44069741cc 100644 --- a/libavutil/x86/x

[FFmpeg-devel] [PATCH 5/5] x86: Add some additional cpuflag relations

2017-06-08 Thread James Darnley
From: Henrik Gramner Simplifies writing assembly code that depends on available instructions. LZCNT implies SSE2; BMI1 implies AVX+LZCNT; AVX2 implies BMI2. --- This is the patch I was talking about. Where should I put the aesni define? x264 doesn't have it but I will try to get it upstreamed. l
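To illustrate what a "cpuflag relation" means here, a minimal sketch in C of the three implications named in the commit message. The flag names, bit values and the function expand_cpuflags are all hypothetical; x86inc.asm expresses these relations with its own macros, and it defines further relations that are not modelled here.

```c
#include <assert.h>

/* Hypothetical bit assignments, not the values used by x86inc.asm. */
enum {
    FLAG_SSE2  = 1 << 0,
    FLAG_AVX   = 1 << 1,
    FLAG_LZCNT = 1 << 2,
    FLAG_BMI1  = 1 << 3,
    FLAG_BMI2  = 1 << 4,
    FLAG_AVX2  = 1 << 5,
    FLAG_ALL   = (1 << 6) - 1
};

/* Add every flag implied by the ones already set, using only the three
 * relations from the commit message. Repeats until nothing changes so
 * that chains such as BMI1 -> LZCNT -> SSE2 are followed. */
static unsigned expand_cpuflags(unsigned flags)
{
    unsigned prev;

    assert((flags & ~(unsigned)FLAG_ALL) == 0);

    do {
        prev = flags;
        if (flags & FLAG_AVX2)  flags |= FLAG_BMI2;             /* AVX2  implies BMI2        */
        if (flags & FLAG_BMI1)  flags |= FLAG_AVX | FLAG_LZCNT; /* BMI1  implies AVX + LZCNT */
        if (flags & FLAG_LZCNT) flags |= FLAG_SSE2;             /* LZCNT implies SSE2        */
    } while (flags != prev);

    return flags;
}
```

The practical effect is that a function declared for BMI1 can use AVX, LZCNT and SSE2 instructions without listing each flag separately.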

[FFmpeg-devel] [PATCH 3/5] x86inc: Prefer r14/r15 over r12/r13 on x86-64

2017-06-08 Thread James Darnley
From: Henrik Gramner Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them. --- libavutil/x86/x86inc.asm | 16 1 file change
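The peculiarity can be summarised in a few lines of C. This is a sketch of the encoding rule, not code from x86inc.asm, and base_reg_extra_bytes is a hypothetical name. Only the low three bits of the register number go into the ModR/M r/m field, so r12 shares the rsp encoding (100, which selects a SIB byte) and r13 shares the rbp encoding (101, which with mod=00 means a 32-bit displacement with no base register).

```c
#include <assert.h>
#include <stdbool.h>

/* Extra encoding bytes needed when x86-64 register number `reg` (0-15)
 * is the base of a memory operand with no index register.
 * has_disp: whether the operand already carries a displacement.
 * Hypothetical helper for illustration only. */
static int base_reg_extra_bytes(int reg, bool has_disp)
{
    assert(reg >= 0 && reg <= 15);

    int low3 = reg & 7; /* the part that lands in ModR/M.rm */

    if (low3 == 4)
        return 1;       /* rsp/r12: r/m=100 means "SIB byte follows" */
    if (low3 == 5 && !has_disp)
        return 1;       /* rbp/r13: plain [reg] needs a zero disp8   */
    return 0;           /* r14 (110) and r15 (111) have no penalty   */
}
```

So [r12] and [r12+8] each cost one byte more than the r14 equivalents, and [r13] costs one byte more than [r14], while [r13+8] does not.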

[FFmpeg-devel] [PATCH 1/5] x86inc: Fix call with memory operands

2017-06-08 Thread James Darnley
From: Henrik Gramner We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier. --- libavutil/x86/x86inc.asm | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-)

[FFmpeg-devel] [PATCH 4/5] x86inc: Remove argument from WIN64_RESTORE_XMM

2017-06-08 Thread James Darnley
From: Anton Mitrofanov The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0. --- libavutil/x86/x86inc.asm | 19 ++- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm

Re: [FFmpeg-devel] [PATCH 5/5] x86: Add some additional cpuflag relations

2017-06-09 Thread James Darnley
On 2017-06-09 10:08, Henrik Gramner wrote: > On Fri, Jun 9, 2017 at 1:05 AM, James Darnley wrote: >> Where should I put the aesni define? > > Between sse42 and avx. Thank you. I will change this and the first patch to bump the date. I'll give other people about an hour to

[FFmpeg-devel] [PATCH 1/1] configure: require NASM version 2.11 or newer for external x86 assembly

2017-06-09 Thread James Darnley
--- configure | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/configure b/configure index e3941f9dfd..69bbf25bf5 100755 --- a/configure +++ b/configure @@ -3258,7 +3258,7 @@ pkg_config_default=pkg-config ranlib_default="ranlib" strip_default="strip" versio
