from:"Ivan Kalvachev"

Re: [FFmpeg-devel] [PATCH 1/2] opus_pvq: add resynth support and band encoding cost function

2017-04-14 Thread Ivan Kalvachev

On 4/15/17, wm4  wrote:
> On Sat, 15 Apr 2017 00:28:19 +0200
> Carl Eugen Hoyos  wrote:
>
>> 2017-04-14 23:27 GMT+02:00 Clément Bœsch :
>> > On Fri, Apr 14, 2017 at 02:30:28PM +0200, Carl Eugen Hoyos wrote:
>> > [...]
>> >> We know that the avconv developers are violating the
>> >> copyright of many people, we fix that in the FFmpeg code as
>> >> soon as we are aware of it, there are many examples in gitlog.
>> >
>> > This is a serious accusation, what are you referring to?
>>
>> To the same (many) commits in the avconv repository that
>> violate the copyright of FFmpeg developers as many times
>> before - sorry, I don't understand: Are you implying you don't
>> remember such a commit?
>
> Of course you keep repeating your defamation without providing proof,
> ever.

http://git.videolan.org/?p=ffmpeg.git&a=search&h=HEAD&st=commit&s=attribution

I found the above link in the "Reintroducing FFmpeg to Debian" thread (2014).
It still works :D

I do remember that in 2014 Ronald was pressuring Libav (e.g. Vittorio)
about proper attribution to the changes done by multiple Libav developers when
merging ffvp9. (AKA who exactly changed what).
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] New API usage example (encode_raw_audio_file_to_aac)

2017-05-06 Thread Ivan Kalvachev

On 3/31/17, Paolo Prete  wrote:
> ---
>  configure   |   2 +
>  doc/Makefile|  41 ++--
>  doc/examples/.gitignore |   1 +
>  doc/examples/Makefile   |   1 +
>  doc/examples/encode_raw_audio_file_to_aac.c | 338
> 
>  5 files changed, 363 insertions(+), 20 deletions(-)
>  create mode 100644 doc/examples/encode_raw_audio_file_to_aac.c

What is the status of this patch?
It's been quite a while and there have been no objections.

Maybe it should be committed.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-08 Thread Ivan Kalvachev

1 reciprocal op that is a lot faster
than real division, but gives half precision.
"2" uses 1 multiplication and 1 reciprocal square root op, that is
literally 1 cycle, but again gives half precision.

PRESEARCH_ROUNDING
This control the rounding of the gain used for guess.
"0" means using truncf() that makes sure that the pulses would never
be more than K.
It gives results identical to the original celt_* functions
"1" means using lrintf(), this is basically the improvement of the
current C code over the celt_ one.


ALL_FLOAT_PRESEARCH
The presearch filling of outY[] could be done entirely with float ops
(using SSE4.2 'roundps' instead of two cvt*).  It is mostly useful if
you want to try YMM on AVX1 (AVX1 lacks 256 integer ops).
For some reason enabling this makes the whole function 4 times slower
on my CPU. ^_^

I've left some commented out code. I'll remove it for sure in the final version.

I just hope I haven't done some lame mistake in the last minute...
From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 1/3] SIMD opus pvq_search implementation.

---
 libavcodec/opus_pvq.c  |   9 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   1 +
 libavcodec/x86/opus_dsp_init.c |  47 +++
 libavcodec/x86/opus_pvq_search.asm | 597 +
 5 files changed, 657 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..172223e241 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -418,7 +418,13 @@ static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
 int *y = pvq->qcoeff;
 
 celt_exp_rotation(X, N, blocks, K, spread, 1);
+
+{
+START_TIMER
 gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+STOP_TIMER("pvq_search");
+}
+
 celt_encode_pulses(rc, y,  N, K);
 celt_normalize_residual(y, X, N, gain);
 celt_exp_rotation(X, N, blocks, K, spread, 0);
@@ -947,6 +953,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index 6691494838..9246337360 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -33,8 +33,8 @@
float *lowband_scratch, int fill)
 
 struct CeltPVQ {
-DECLARE_ALIGNED(32, int,   qcoeff  )[176];
-DECLARE_ALIGNED(32, float, hadamard_tmp)[176];
+DECLARE_ALIGNED(32, int,   qcoeff  )[256];
+DECLARE_ALIGNED(32, float, hadamard_tmp)[256];
 
 float (*pvq_search)(float *X, int *y, int K, int N);
 
@@ -45,6 +45,7 @@ struct CeltPVQ {
 };
 
 int  ff_celt_pvq_init  (struct CeltPVQ **pvq);
+void ff_opus_dsp_init_x86(struct CeltPVQ *s);
 void ff_celt_pvq_uninit(struct CeltPVQ **pvq);
 
 #endif /* AVCODEC_OPUS_PVQ_H */
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 710e48b15f..f50a48dc13 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -51,6 +51,7 @@ OBJS-$(CONFIG_APNG_DECODER)+= x86/pngdsp_init.o
 OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
+OBJS-$(CONFIG_OPUS_ENCODER)+= x86/opus_dsp_init.o x86/opus_pvq_search.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
 OBJS-$(CONFIG_MLP_DECODER) += x86/mlpdsp_init.o
diff --git a/libavcodec/x86/opus_dsp_init.c b/libavcodec/x86/opus_dsp_init.c
new file mode 100644
index 00..ddccf6fe9e
--- /dev/null
+++ b/libavcodec/x86/opus_dsp_init.c
@@ -0,0 +1,47 @@
+/*
+ * Opus encoder assembly optimizations
+ * Copyright (C) 2017 Ivan Kalvachev 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA

Re: [FFmpeg-devel] [PATCH] avcodec/fft_template: Fix multiple runtime error: signed integer overflow: -1943918714 - 1935113003 cannot be represented in type 'int'

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, Michael Niedermayer  wrote:
> On Thu, Jun 08, 2017 at 06:35:07PM -0400, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Thu, Jun 8, 2017 at 6:10 PM, Michael Niedermayer
>> 
>> wrote:
>>
>> > Signed value in
>> > Unsigned
>> > INTeger type
>>
>> [..]
>> > Both SUINT and unsigned should produce identical binaries
>>
>> This seems to go against the rule that code should be as simple as
>> possible.
>>
>> Unsigned is simpler than SUINT if the outcome is the same.
>
> You can simply add the part of my mail here as awnser that you snipped
> away:
>
> "But it makes the code hard to understand and maintain because these
>  values are not positive integers but signed integers. Which for
>  C standard compliance need to be stored in a unsigned type."
>
> A type that avoids the undefinedness of signed but is semantically
> signed is correct, unsigned is not.
>
> If understandable code and maintainable code has no value to you,
> you would favour using single letter variables exclusivly and would
> never use typedef.
> But you do not do that.
>
> I fail to understand why you insist on using unsigned in place of a
> more specific type, it is not the correct nor clean thing to do.

If I understand correctly, the root of the problem is the undefined behavior
of the compiler and the C standard.
Is there any chance that FFmpeg project could initiate a change in the
next C standard, so this thing would be defined?
(and I guess, also define a few other similar things, like signed
right shift, etc...)

It is a matter of fact, that a lot of compiler-defined-things have
been implemented in
certain ways on most compilers and that there are plenty of programs
depend on these
specific implementations and break when a new compiler does things differently
(clang had nice article about it).

About the typedef, is SUINT a standard defined type for workarounding
this specific problem?
I do suspect that some of our fellow developers simply find its name ugly.

Maybe they would be more welcoming if
"suint32_t" like typedef is used?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avcodec/vp9: ipred_dr_16x16_16 avx2 implementation

2017-06-09 Thread Ivan Kalvachev

On 6/8/17, Ilia Valiakhmetov  wrote:
> vp9_diag_downright_16x16_12bpp_c: 149.0
> vp9_diag_downright_16x16_12bpp_sse2: 67.8
> vp9_diag_downright_16x16_12bpp_ssse3: 45.6
> vp9_diag_downright_16x16_12bpp_avx: 36.6
> vp9_diag_downright_16x16_12bpp_avx2: 25.5
>
> ~30% faster than avx
>
> Signed-off-by: Ilia Valiakhmetov 
> ---
>  libavcodec/x86/vp9dsp_init_16bpp.c|  2 ++
>  libavcodec/x86/vp9intrapred_16bpp.asm | 56
> +++
>  2 files changed, 58 insertions(+)
>
> diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c
> b/libavcodec/x86/vp9dsp_init_16bpp.c
> index d1b8fcd..8d1aa13 100644
> --- a/libavcodec/x86/vp9dsp_init_16bpp.c
> +++ b/libavcodec/x86/vp9dsp_init_16bpp.c
> @@ -52,6 +52,7 @@ decl_ipred_fns(dc,  16, mmxext, sse2);
>  decl_ipred_fns(dc_top,  16, mmxext, sse2);
>  decl_ipred_fns(dc_left, 16, mmxext, sse2);
>  decl_ipred_fn(dl,   16, 16, avx2);
> +decl_ipred_fn(dr,   16, 16, avx2);
>  decl_ipred_fn(dl,   32, 16, avx2);
>
>  #define decl_ipred_dir_funcs(type) \
> @@ -136,6 +137,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext
> *dsp)
>  init_fpel_func(1, 1,  64, avg, _16, avx2);
>  init_fpel_func(0, 1, 128, avg, _16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
> +init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
>  }
>
> diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm
> b/libavcodec/x86/vp9intrapred_16bpp.asm
> index 92333bc..67b98b1 100644
> --- a/libavcodec/x86/vp9intrapred_16bpp.asm
> +++ b/libavcodec/x86/vp9intrapred_16bpp.asm
> @@ -1170,6 +1170,62 @@ DR_FUNCS 2
>  INIT_XMM avx
>  DR_FUNCS 2
>
> +%if HAVE_AVX2_EXTERNAL
> +INIT_YMM avx2
> +cglobal vp9_ipred_dr_16x16_16, 4, 6, 7, dst, stride, l, a
> +movam0, [lq]   ; klmnopqrstuvwxyz
> +movum1, [aq-2] ; *abcdefghijklmno
> +movam2, [aq]   ; abcdefghijklmnop

I know unaligned loads are not as slow as they used to be,
but could m1 be produced by m2 and palignr?

From the comment I assume you don't use the extra two bytes
that you get from the load, as you mark them as "*"
generic undefined values

> +vperm2i128  m4, m2, m2, q2001  ; ijklmnop
> +vpalignrm5, m4, m2, 2  ; bcdefghijklmnop.
> +vperm2i128  m3, m0, m1, q0201  ; stuvwxyz*abcdefg
> +LOWPASS  1,  2,  5 ; ABCDEFGHIJKLMNO.
> +vpalignrm4, m3, m0, 2  ; lmnopqrstuvwxyz*
> +vpalignrm5, m3, m0, 4  ; mnopqrstuvwxyz*a
> +LOWPASS  0,  4,  5 ; LMNOPQRSTUVWXYZ#
> +vperm2i128  m5, m0, m1, q0201  ; TUVWXYZ#ABCDEFGH
> +DEFINE_ARGS dst, stride, stride3, stride5, dst3, cnt

"cnt" doesn't seem to be used.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, Michael Niedermayer  wrote:
> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
>>  opus_pvq.c  |9
>>  opus_pvq.h  |5
>>  x86/Makefile|1
>>  x86/opus_dsp_init.c |   47 +++
>>  x86/opus_pvq_search.asm |  597
>> 
>>  5 files changed, 657 insertions(+), 2 deletions(-)
>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
>> 0001-SIMD-opus-pvq_search-implementation.patch
>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
>> From: Ivan Kalvachev 
>> Date: Thu, 8 Jun 2017 22:24:33 +0300
>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
>
> seems this breaks build with mingw64, didnt investigate but it
> fails with these errors:
>
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
> collect2: error: ld returned 1 exit status
> collect2: error: ld returned 1 exit status
> make: *** [ffmpeg_g.exe] Error 1
> make: *** Waiting for unfinished jobs
> make: *** [ffprobe_g.exe] Error 1


const_*_edge is used on only one place is the code.
Would you check if this patch fixes the issue.

--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
 add Nq,   r4q   ; Nq = align(Nq, mmsize)
 sub rsp,  Nq; allocate tmpX[Nq]

-movups  m3,   [const_align_abs_edge-mmsize+r4q] ; this is
the bit mask for the padded read at the end of the input
+movups  m3,   [const_align_abs_mask+32-mmsize+r4q] ; this
is the bit mask for the padded read at the end of the input

 lea r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned
up) to mmsize, so r4q can't become negative here, unless N=0.
 movups  m2,   [inXq + r4q]
===
I expected that the addresses would be pre-calculated
by n/yasm as one value and indexed
relative to the section start.
Instead it seems that each entry is represented with
its own address and offset from it.
Since the offset is negative it uses all 64 bits and
it makes difference if it is truncated to 32 bits.

Same issue could happen with clang tools.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, Ivan Kalvachev  wrote:
> On 6/9/17, Michael Niedermayer  wrote:
>> On Fri, Jun 09, 2017 at 01:36:07AM +0300, Ivan Kalvachev wrote:
>>>  opus_pvq.c  |9
>>>  opus_pvq.h  |5
>>>  x86/Makefile|1
>>>  x86/opus_dsp_init.c |   47 +++
>>>  x86/opus_pvq_search.asm |  597
>>> 
>>>  5 files changed, 657 insertions(+), 2 deletions(-)
>>> 3b9648bea3f01dad2cf159382f0ffc2d992c84b2
>>> 0001-SIMD-opus-pvq_search-implementation.patch
>>> From 06dc798c302e90aa5b45bec5d8fbcd64ba4af076 Mon Sep 17 00:00:00 2001
>>> From: Ivan Kalvachev 
>>> Date: Thu, 8 Jun 2017 22:24:33 +0300
>>> Subject: [PATCH 1/3] SIMD opus pvq_search implementation.
>>
>> seems this breaks build with mingw64, didnt investigate but it
>> fails with these errors:
>>
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>> collect2: error: ld returned 1 exit status
>> collect2: error: ld returned 1 exit status
>> make: *** [ffmpeg_g.exe] Error 1
>> make: *** Waiting for unfinished jobs
>> make: *** [ffprobe_g.exe] Error 1
>
>
> const_*_edge is used on only one place is the code.
> Would you check if this patch fixes the issue.
>
Sorry, the patch was not tested and the variable name was not correct.
This one should be fine... I hope

--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -419,7 +419,7 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
 add Nq,   r4q   ; Nq = align(Nq, mmsize)
 sub rsp,  Nq; allocate tmpX[Nq]

-movups  m3,   [const_align_abs_edge-mmsize+r4q] ; this is
the bit mask for the padded read at the end of the input
+movups  m3,   [const_float_abs_mask+32-mmsize+r4q] ; this
is the bit mask for the padded read at the end of the input

 lea r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned
up) to mmsize, so r4q can't become negative here, unless N=0.
 movups  m2,   [inXq + r4q]
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC] x86 external assembler requirements

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, Hendrik Leppkes  wrote:
> On Fri, Jun 9, 2017 at 2:51 PM, James Darnley  wrote:
>> I propose that we drop support for all assemblers older than NASM version
>> 2.11.
>> That was released on 2013-12-31 with several point releases over the
>> following
>> year with 2.11.08 being released on 2015-02-21.
>>
>> The following patch does just that.  Please do not be concerned about the
>> ZMM
>> register use.  This patch does not address the use of AVX-512
>> instructions.  In
>> a future patch I will add a different check for enabling/disabling
>> AVX-512.
>>
>> The NASM changelog can be found here:
>> http://www.nasm.us/doc/nasmdocc.html
>>
>> Releases and their dates can be seen here:
>> http://www.nasm.us/pub/nasm/releasebuilds/?C=M;O=D
>>
>> Other than the discussion about the oldest version to support we should
>> discuss
>> the command line options for configure and the internal variable names.
>> At
>> present they all use "yasm" somewhere.  If we are to drop support for that
>> then
>> we should change the option names to prevent confusion.
>>
>> We could just change them to "nasm" and be done.  We could provide
>> compatability
>> options.  We could adopt Libav's generic "x86asm".

It's better to use generic name.

I do not like "x86asm", as 'gnu as' also fits that description,
but it is better than changing the name every time some
compiler takes the lead. E.g. zasm might be the next
champion of AVX512 assembly. ;)



>> James Darnley (1):
>>   configure: require NASM version 2.11 or newer for external x86
>> assembly
>>
>>  configure | 17 -
>>  1 file changed, 4 insertions(+), 13 deletions(-)
>>
>> Brought to you by my inability to configure FFmpeg with NASM 2.13.
>>
>
> Maybe you should just pick the patches from Libav, that still
> implement an automatic fallback to YASM if NASM is not found or too
> old? We would eventually get to them in the merges anyway.
> Many people building FFmpeg for years with YASM may not have an
> appropriate NASM installed, and forcing a switch here may not be
> ideal.

+1
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avcodec/vp9: ipred_dr_16x16_16 avx2 implementation

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, Ilia Valiakhmetov  wrote:
> Signed-off-by: Ilia Valiakhmetov 
> ---
>  libavcodec/x86/vp9dsp_init_16bpp.c|  2 ++
>  libavcodec/x86/vp9intrapred_16bpp.asm | 56
> +++
>  2 files changed, 58 insertions(+)
>
> diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c
> b/libavcodec/x86/vp9dsp_init_16bpp.c
> index d1b8fcd..8d1aa13 100644
> --- a/libavcodec/x86/vp9dsp_init_16bpp.c
> +++ b/libavcodec/x86/vp9dsp_init_16bpp.c
> @@ -52,6 +52,7 @@ decl_ipred_fns(dc,  16, mmxext, sse2);
>  decl_ipred_fns(dc_top,  16, mmxext, sse2);
>  decl_ipred_fns(dc_left, 16, mmxext, sse2);
>  decl_ipred_fn(dl,   16, 16, avx2);
> +decl_ipred_fn(dr,   16, 16, avx2);
>  decl_ipred_fn(dl,   32, 16, avx2);
>
>  #define decl_ipred_dir_funcs(type) \
> @@ -136,6 +137,7 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext
> *dsp)
>  init_fpel_func(1, 1,  64, avg, _16, avx2);
>  init_fpel_func(0, 1, 128, avg, _16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
> +init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
>  init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
>  }
>
> diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm
> b/libavcodec/x86/vp9intrapred_16bpp.asm
> index 92333bc..7230de2 100644
> --- a/libavcodec/x86/vp9intrapred_16bpp.asm
> +++ b/libavcodec/x86/vp9intrapred_16bpp.asm
> @@ -1170,6 +1170,62 @@ DR_FUNCS 2
>  INIT_XMM avx
>  DR_FUNCS 2
>
> +%if HAVE_AVX2_EXTERNAL
> +INIT_YMM avx2
> +cglobal vp9_ipred_dr_16x16_16, 4, 4, 6, dst, stride, l, a
[...]
> +DEFINE_ARGS dst, stride, stride3, stride5, dst3

You removed one variable, so now the number of
re-define-args gprs should be 5.
However the cglobal above have 4 reserved registers.

It used to be  4, 6, 6
Now it is 4, 4, 6
I think it should be 4, 5, 6

Do I miss something?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avcodec/fft_template: Fix multiple runtime error: signed integer overflow: -1943918714 - 1935113003 cannot be represented in type 'int'

2017-06-09 Thread Ivan Kalvachev

On 6/9/17, wm4  wrote:
> On Fri, 9 Jun 2017 00:10:49 +0200
> Michael Niedermayer  wrote:
>
>> On Sat, May 27, 2017 at 12:24:16PM +0200, wm4 wrote:
>> > On Sat, 27 May 2017 03:56:42 +0200
>> > Michael Niedermayer  wrote:
>> >
>> > > On Fri, May 26, 2017 at 07:06:44PM -0400, Ronald S. Bultje wrote:
>> > > > Hi,
>> > > >
>> > > > On Fri, May 26, 2017 at 6:34 PM, Michael Niedermayer
>> > > > > > > > > wrote:
>> > > >
>> > > > > On Fri, May 26, 2017 at 06:07:35PM -0400, Ronald S. Bultje wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > On Fri, May 26, 2017 at 5:50 PM, Michael Niedermayer
>> > > > > > > > > > > > wrote:
>> > > > > >
>> > > > > > > On Fri, May 26, 2017 at 11:18:12PM +0200, Hendrik Leppkes
>> > > > > > > wrote:
>> > > > > > > > On Fri, May 26, 2017 at 11:11 PM, Michael Niedermayer
>> > > > > > > >  wrote:
>> > > > > > > > > On Fri, May 26, 2017 at 03:20:14PM +0100, Rostislav
>> > > > > > > > > Pehlivanov
>> > > > > wrote:
>> > > > > > > > >> On 26 May 2017 at 12:21, wm4 
>> > > > > > > > >> wrote:
>> > > > > > > > >>
>> > > > > > > > >> > On Thu, 25 May 2017 16:10:49 +0200
>> > > > > > > > >> > Michael Niedermayer  wrote:
>> > > > > > > > >> >
>> > > > > > > > >> > > Fixes:
>> > > > > > > > >> > > 1735/clusterfuzz-testcase-minimized-5350472347025408
>> > > > > > > > >> > >
>> > > > > > > > >> > > Found-by: continuous fuzzing process
>> > > > > > > https://github.com/google/oss-
>> > > > > > > > >> > fuzz/tree/master/projects/ffmpeg
>> > > > > > > > >> > > Signed-off-by: Michael Niedermayer
>> > > > > > > > >> > > 
>> > > > > > > > >> > > ---
>> > > > > > > > >> > >  libavcodec/fft_template.c | 50
>> > > > > > > > >> > > +++---
>> > > > > > > > >> > -
>> > > > > > > > >> > >  1 file changed, 25 insertions(+), 25 deletions(-)
>> > > > > > > > >> > >
>> > > > > > > > >> > > diff --git a/libavcodec/fft_template.c
>> > > > > b/libavcodec/fft_template.c
>> > > > > > > > >> > > index 480557f49f..e3a37e5d69 100644
>> > > > > > > > >> > > --- a/libavcodec/fft_template.c
>> > > > > > > > >> > > +++ b/libavcodec/fft_template.c
>> > > > > > > > >> > > @@ -249,7 +249,7 @@ static void fft_calc_c(FFTContext
>> > > > > > > > >> > > *s,
>> > > > > > > FFTComplex *z)
>> > > > > > > > >> > {
>> > > > > > > > >> > >
>> > > > > > > > >> > >  int nbits, i, n, num_transforms, offset, step;
>> > > > > > > > >> > >  int n4, n2, n34;
>> > > > > > > > >> > > -FFTSample tmp1, tmp2, tmp3, tmp4, tmp5, tmp6,
>> > > > > > > > >> > > tmp7, tmp8;
>> > > > > > > > >> > > +SUINT tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7,
>> > > > > > > > >> > > tmp8;
>> > > > > > > > >> >
>> > > > > > > > >> > I want this SUINT thing gone, not have more of it.
>> > > > > > > > >> > ___
>> > > > > > > > >> > ffmpeg-devel mailing list
>> > > > > > > > >> > ffmpeg-devel@ffmpeg.org
>> > > > > > > > >> > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>> > > > > > > > >> >
>> > > > > > > > >>
>> > > > > > > > >> I agree, especially here.
>> > > > > > > > >
>> > > > > > > > >> Overflows should be left to happen in transforms if the
>> > > > > > > > >> input is
>> > > > > > > corrupt.
>> > > > > > > > >
>> > > > > > > > > signed int overflow is not allowed in C, if that is what
>> > > > > > > > > you meant.
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Its "undefined", which means what the result will be is not
>> > > > > > > > defined -
>> > > > > > > > which in such a DSP function is irrelevant, if all it causes
>> > > > > > > > is
>> > > > > > > > corrupt output ... from corrupt input.
>> > > > > > >
>> > > > > > > no, this is not correct.
>> > > > > > > undefined behavior does not mean the effect will be limited to
>> > > > > > > the output.
>> > > > > > > It could cause the process to hard lockup, trigger an
>> > > > > > > exception or
>> > > > > > > set a flag causing errors in later unrelated code.
>> > > > > >
>> > > > > >
>> > > > >
>> > > > > > We've discussed this before, if you believe this to be
>> > > > > > exploitable, why
>> > > > > > allow it in our repository at all? I know of no other project
>> > > > > > that even
>> > > > > > remotely allows such ridiculous things. Please limit exploitable
>> > > > > > code to
>> > > > > > your personal tree, ffmpeg is not a rootkit.
>> > > > >
>> > > > > please calm down, you make all kinds of statments and implications
>> > > > > in
>> > > > > the text above which are not true.
>> > > > >
>> > > > > This specific code in git triggers undefined behavior, the patch
>> > > > > fixes
>> > > > > this undefined behavior.
>> > > > > If that is exploitable (which i neither claim it is nor that it
>> > > > > isnt)
>> > > > > its a issue that exists before the patch but not afterwards.
>> > > >
>> > > >
>> > >
>> > > > *unsigned* removes the exploit. SUINT keeps it, and is therefore
>> > > > part of a
>> > > > rootkit.
>> > >
>> > > SUINT is defined to unsigned, if unsigned re

Re: [FFmpeg-devel] [PATCH] avcodec/fft_template: Fix multiple runtime error: signed integer overflow: -1943918714 - 1935113003 cannot be represented in type 'int'

2017-06-10 Thread Ivan Kalvachev

On 6/10/17, Ronald S. Bultje  wrote:
> Hi,
>
> On Thu, Jun 8, 2017 at 8:57 PM, Michael Niedermayer 
> wrote:
>
>> On Thu, Jun 08, 2017 at 06:35:07PM -0400, Ronald S. Bultje wrote:
>> > Hi,
>> >
>> > On Thu, Jun 8, 2017 at 6:10 PM, Michael Niedermayer
>> 
>> > wrote:
>> >
>> > > Signed value in
>> > > Unsigned
>> > > INTeger type
>> >
>> > [..]
>> > > Both SUINT and unsigned should produce identical binaries
>> >
>> > This seems to go against the rule that code should be as simple as
>> possible.
>> >
>> > Unsigned is simpler than SUINT if the outcome is the same.
>>
>> You can simply add the part of my mail here as awnser that you snipped
>> away:
>>
>> "But it makes the code hard to understand and maintain because these
>>  values are not positive integers but signed integers. Which for
>>  C standard compliance need to be stored in a unsigned type."
>>
>> A type that avoids the undefinedness of signed but is semantically
>> signed is correct, unsigned is not.
>>
>> If understandable code and maintainable code has no value to you,
>> you would favour using single letter variables exclusivly and would
>> never use typedef.
>> But you do not do that.
>>
>> I fail to understand why you insist on using unsigned in place of a
>> more specific type, it is not the correct nor clean thing to do.
>
>
> It's not just me, it appears to be most of us. Can't you just step back at
> some point and be like "ok, I'll let the majority have their way"?

There was actually a technical discussion undergoing,
until your regular "majority" group came with strong words,
opinions, and comments that clearly indicate that you don't
understand the issue at hand.

I'll try to explain the issue one more time.

---

You all know that singed overflow is undefined.
In gcc there are two options to define/control the behavior of it:
-fwrapv - defines signed overflow as wrapping around, just like unsigned.
-ftrapv - causes a trap exception, could lead to termination of the program.

Now, as the article
  http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
explains in detail, undefined behavior allows certain
optimizations, like loop optimizations, pointer arithmetic etc..
Using -fwrapv globally might lead to a significant slow down.

On the other side, some of these optimization might lead to security
exploits e.g. if they optimize-out checks in the code, that are there to
prevent overflows.

So there is a possible scenario, where some entity that want to secure the
video playback of their browser, would like to enable a bunch of
runtime checking functionality, like -fstack-protector and -fwrapv.

Now, we get to the FFT.

As Rostislav (atomnuker) said, it is not big issue if
overflow happens in fft calculation and produces wrong result.

However if -ftrapv is enabled, it may crash the whole program (!!)
(instead of producing a short noise).

To prevent that Michael changes the code so a signed variable is
converted to unsigned for the operations that could overflow
and converted back to signed for operations that need sign extension.

He is using a new typedef,
so the developers(!) could know that this type contains signed value,
while the type for the compiler(!) is actually unsigned.

Now...
I would be happy if you all could think of a better way to get the same result,
that doesn't involve changing types back and forth.

All I can think of is:
1. Moving the functions to a special file and compiling it with
-fwrapv -fno-trapv.
In the fft case this might be extra hard, as the file is actually a template...
2. Asking gcc for attribute that defines the behavior, for a single
function or code block.
3. Asking gcc for attribute that defines the behavior, for a variable or type.
4. Defining that behavior by the standard committee in next C
standard. Maybe with new standard type. Or making "int32_t" wrap,
while keeping "int" undefined.

Michael solution may look ugly, make the code
harder to understand, but it:
1. Works Now.
2. Does work on all compilers.
3. Doesn't make the code slower.
4. Doesn't complicate the build system.

Of course, as FFmpeg developer, it is your right to initiate a vote
that would prevent Michael from trying to make FFmpeg more secure.
He has always complied with official decisions.

However this might turn into publicity nightmare,
as security community is known to overreact
on certain topics.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-11 Thread Ivan Kalvachev

On 6/10/17, James Darnley  wrote:
> On 2017-06-09 13:41, Ivan Kalvachev wrote:
>> On 6/9/17, Michael Niedermayer  wrote:
>>> seems this breaks build with mingw64, didnt investigate but it
>>> fails with these errors:
>>>
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>> collect2: error: ld returned 1 exit status
>>> collect2: error: ld returned 1 exit status
>>> make: *** [ffmpeg_g.exe] Error 1
>>> make: *** Waiting for unfinished jobs
>>> make: *** [ffprobe_g.exe] Error 1
>>
>>
>> const_*_edge is used on only one place is the code.
>> Would you check if this patch fixes the issue.
>>
>> I expected that the addresses would be pre-calculated
>> by n/yasm as one value and indexed
>> relative to the section start.
>> Instead it seems that each entry is represented with
>> its own address and offset from it.
>> Since the offset is negative it uses all 64 bits and
>> it makes difference if it is truncated to 32 bits.
>>
>> Same issue could happen with clang tools.
>
> The problem is with the relative addressing.  You need to load the real
> address first before you can offset with another register at runtime. So
> something like:
>
>> mov  reg1,  [read_only_const]
lea ?
>> mova mmreg, [reg1 + reg2]

.
OK, Getting mingw for my distro is problem
and compiling one myself would take a bit more effort/time.
So I'm posting a patch that "should" work.
==
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -406,7 +406,7 @@ align 16
 ; uint32 N  - Number of vector elements. Must be 0 < N < 8192
 ;
 %macro PVQ_FAST_SEARCH 0
-cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
+cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N
 %define tmpX rsp

 ;movsxdifnidn Nq,  Nd
@@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
 add Nq,   r4q   ; Nq = align(Nq, mmsize)
 sub rsp,  Nq; allocate tmpX[Nq]

+%ifdef PIC
+lea r5q,  [const_align_abs_edge]; rip+const
+movups  m3,   [r5q+r4q-mmsize]  ; this is
the bit mask for the padded read at the end of the input
+%else
 movups  m3,   [const_align_abs_edge-mmsize+r4q] ; this is
the bit mask for the padded read at the end of the input
+%endif

 lea r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned
up) to mmsize, so r4q can't become negative here, unless N=0.
 movups  m2,   [inXq + r4q]
==

What I find surprising is that PIC is enabled only on Windows and does not seem
to depend on CONFIG_PIC, so textrels are used all over assembly code.
Do I miss something?

Are there option(s) to signal/error when texrel is been used in code
that should be pic ?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH] Opus Piramid Vector Quantization Search in x86 SIMD asm

2017-06-11 Thread Ivan Kalvachev

On 6/11/17, Hendrik Leppkes  wrote:
> On Sun, Jun 11, 2017 at 11:34 AM, Ivan Kalvachev 
> wrote:
>> On 6/10/17, James Darnley  wrote:
>>> On 2017-06-09 13:41, Ivan Kalvachev wrote:
>>>> On 6/9/17, Michael Niedermayer  wrote:
>>>>> seems this breaks build with mingw64, didnt investigate but it
>>>>> fails with these errors:
>>>>>
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x2d):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x3fd):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0x7a1):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> libavcodec/libavcodec.a(opus_pvq_search.o):src/libavcodec/x86/opus_pvq_search.asm:(.text+0xb48):
>>>>> relocation truncated to fit: R_X86_64_32 against `const_align_abs_edge'
>>>>> collect2: error: ld returned 1 exit status
>>>>> collect2: error: ld returned 1 exit status
>>>>> make: *** [ffmpeg_g.exe] Error 1
>>>>> make: *** Waiting for unfinished jobs
>>>>> make: *** [ffprobe_g.exe] Error 1
>>>>
>>>>
>>>> const_*_edge is used on only one place is the code.
>>>> Would you check if this patch fixes the issue.
>>>>
>>>> I expected that the addresses would be pre-calculated
>>>> by n/yasm as one value and indexed
>>>> relative to the section start.
>>>> Instead it seems that each entry is represented with
>>>> its own address and offset from it.
>>>> Since the offset is negative it uses all 64 bits and
>>>> it makes difference if it is truncated to 32 bits.
>>>>
>>>> Same issue could happen with clang tools.
>>>
>>> The problem is with the relative addressing.  You need to load the real
>>> address first before you can offset with another register at runtime. So
>>> something like:
>>>
>>>> mov  reg1,  [read_only_const]
>> lea ?
>>>> mova mmreg, [reg1 + reg2]
>>
>> .
>> OK, Getting mingw for my distro is problem
>> and compiling one myself would take a bit more effort/time.
>> So I'm posting a patch that "should" work.
>> ==
>> --- a/libavcodec/x86/opus_pvq_search.asm
>> +++ b/libavcodec/x86/opus_pvq_search.asm
>> @@ -406,7 +406,7 @@ align 16
>>  ; uint32 N  - Number of vector elements. Must be 0 < N < 8192
>>  ;
>>  %macro PVQ_FAST_SEARCH 0
>> -cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
>> +cglobal pvq_search,4,6,8, mmsize, inX, outY, K, N
>>  %define tmpX rsp
>>
>>  ;movsxdifnidn Nq,  Nd
>> @@ -419,7 +419,12 @@ cglobal pvq_search,4,5,8, mmsize, inX, outY, K, N
>>  add Nq,   r4q   ; Nq = align(Nq, mmsize)
>>  sub rsp,  Nq; allocate tmpX[Nq]
>>
>> +%ifdef PIC
>> +lea r5q,  [const_align_abs_edge]; rip+const
>> +movups  m3,   [r5q+r4q-mmsize]  ; this is
>> the bit mask for the padded read at the end of the input
>> +%else
>>  movups  m3,   [const_align_abs_edge-mmsize+r4q] ; this is
>> the bit mask for the padded read at the end of the input
>> +%endif
>>
>>  lea r4q,  [Nq-mmsize]   ; Nq is rounded up (aligned
>> up) to mmsize, so r4q can't become negative here, unless N=0.
>>  movups  m2,   [inXq + r4q]
>> ==
>>
>> What I find surprising is that PIC is enabled only on Windows and does not
>> seem
>> to depend on CONFIG_PIC, so textrels are used all over assembly code.
>> Do I miss something?
>>
>
> Win32 code is always PIC, independent of what ffmpeg configure thinks
> it should be. :)
> Using the PIC define seems to be the appropriate thing, thats what the
> other asm code does.
>

Yes, but my query is more about
why CONFIG_PIC does not enable asm PIC too?
After all gcc -fpic is supposed to generate code without texrel, isn't it?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Bump major versions of all libraries

2017-09-10 Thread Ivan Kalvachev

On 9/2/17, James Almer  wrote:
[...]
> Notes:
> I have no way to test what effect the removal of XVMC truly has.
> The decoders are removed but unlike libav we have hwaccels that are not
> removed by this. Similarly, the pixfmt is also not removed in our case.
> Commit dcc39ee10e82833ce24aa57926c00ffeb1948198 does a thorough removal
> of the remnants of this functionality, but given the above i don't know
> if that applies to us the same way.
> I assume the hwaccels are meant to stay and work after this, so someone
> that knows this code and functionality and has a system where it can be
> tested should ideally look at this.

I assume that the above commit is from libav?
If so yes, it is the wrong thing to do.

The code has been reworked so player
uses pix_fmt to select acceleration,
thus no special decoders are needed.
It's like other hwaccel formats.

If you want I can send a patch that removes the disabled
XvMC functionality and leaves the enabled one.
If you do it instead, be careful, there are both #ifdef and #ifdef !not's,
afair.

MPlayer has been changed to use the new API years ago,
so it should work (with recompilation, but that's given at API bump).

I cannot test it ATM, because of HW issues.
I'll try to do this shortly.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Bump major versions of all libraries

2017-10-08 Thread Ivan Kalvachev

On 9/10/17, James Almer  wrote:
> On 9/10/2017 2:55 PM, Ivan Kalvachev wrote:
>> On 9/2/17, James Almer  wrote:
>> [...]
>>> Notes:
>>> I have no way to test what effect the removal of XVMC truly has.
>>> The decoders are removed but unlike libav we have hwaccels that are not
>>> removed by this. Similarly, the pixfmt is also not removed in our case.
>>> Commit dcc39ee10e82833ce24aa57926c00ffeb1948198 does a thorough removal
>>> of the remnants of this functionality, but given the above i don't know
>>> if that applies to us the same way.
>>> I assume the hwaccels are meant to stay and work after this, so someone
>>> that knows this code and functionality and has a system where it can be
>>> tested should ideally look at this.
>>
>> I assume that the above commit is from libav?
>> If so yes, it is the wrong thing to do.
>>
>> The code has been reworked so player
>> uses pix_fmt to select acceleration,
>> thus no special decoders are needed.
>> It's like other hwaccel formats.
>>
>> If you want I can send a patch that removes the disabled
>> XvMC functionality and leaves the enabled one.
>> If you do it instead, be careful, there are both #ifdef and #ifdef !not's,
>> afair.
>
> I'd very much prefer if you can do it. Adapt the codebase in a way that
> simply flipping the FF_API_XVMC define from 1 to 0 disables the
> deprecated code (and potentially enables new code in necessary, like the
> stuff in pixfmt.h).
> I have no way to test this functionality so anything i could do would be
> just a wild guess.

I just built it with FF_API_XVMC 0 and it works just as well as before.

The hwaccel changes were done after libav deprecation merges,
so it has always been OK.

There is however unrelated bug that affects both new and old api.
It seems related to idct permutations. I'll try to find a fix asap.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] Fix visual glitch with XvMC, caused by wrong idct permutation.

2017-10-08 Thread Ivan Kalvachev

In the past XvMC forced simple_idct since
it was using FF_IDCT_PERM_NONE.
However now we have SIMD variants of simple_idct that
are using FF_IDCT_PERM_TRANSPOSE and if they are selected
XvMC would get coefficients in the wrong order.

The patch creates new FF_IDCT_NONE that
is used only for this kind of hardware decoding
and that fallbacks to the old C only simple idct.

BTW,
If you have idea for a better name, tell me, I might change it.
I thought of FF_IDCT_HWACCEL_PASSTHROUGHT but it is too huge and ugly.
For some reason mpeg12 vdpau also uses the same idct, so using XVMC in
the name doesn't seem right. (I'm not sure why vdpau needs it...)

I also was thinking of using "-1" number for the define, but ...
I didn't want to risk with it.

Best Regards
   Ivan Kalvachev.
From 88a5f15f8ea04a5fb4eb135e1e773f92bb56a6e0 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 9 Oct 2017 01:25:00 +0300
Subject: [PATCH] Fix visual glitch with XvMC, caused by wrong idct
 permutation.

In the past XvMC forced simple_idct since
it was using FF_IDCT_PERM_NONE.
However now we have SIMD variants of simple_idct that
are using FF_IDCT_PERM_TRANSPOSE and if they are selected
XvMC would get coefficients in the wrong order.

The patch creates new FF_IDCT_NONE that
is used only for this kind of hardware decoding
and that fallbacks to the old C only simple idct.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/avcodec.h   | 1 +
 libavcodec/idctdsp.c   | 1 +
 libavcodec/mpeg12dec.c | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 52cc5b0ca..ca0cac501 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -3147,6 +3147,7 @@ typedef struct AVCodecContext {
 #define FF_IDCT_SIMPLEALPHA   23
 #endif
 #define FF_IDCT_SIMPLEAUTO128
+#define FF_IDCT_NONE  129 /* Used by XvMC to extract IDCT coefficients with FF_IDCT_PERM_NONE */
 
 /**
  * bits per sample/pixel from the demuxer (needed for huffyuv).
diff --git a/libavcodec/idctdsp.c b/libavcodec/idctdsp.c
index d596aed1a..45b29d6b7 100644
--- a/libavcodec/idctdsp.c
+++ b/libavcodec/idctdsp.c
@@ -279,6 +279,7 @@ av_cold void ff_idctdsp_init(IDCTDSPContext *c, AVCodecContext *avctx)
 c->perm_type = FF_IDCT_PERM_NONE;
 #endif /* CONFIG_FAANIDCT */
 } else { // accurate/default
+/* Be sure FF_IDCT_NONE will select this one, since it uses FF_IDCT_PERM_NONE */
 c->idct_put  = ff_simple_idct_put_8;
 c->idct_add  = ff_simple_idct_add_8;
 c->idct  = ff_simple_idct_8;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 22c29c150..4e68be27f 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -1217,7 +1217,7 @@ static void setup_hwaccel_for_pixfmt(AVCodecContext *avctx)
 #endif
 )
 if (avctx->idct_algo == FF_IDCT_AUTO)
-avctx->idct_algo = FF_IDCT_SIMPLE;
+avctx->idct_algo = FF_IDCT_NONE;
 
 if (avctx->hwaccel && avctx->pix_fmt == AV_PIX_FMT_XVMC) {
 Mpeg1Context *s1 = avctx->priv_data;
-- 
2.14.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] Fix crash if av_vdpau_bind_context() is not used.

2017-10-08 Thread Ivan Kalvachev

The public functions av_alloc_vdpaucontext() and
av_vdpau_alloc_context() are allocating AVVDPAUContext
structure that is supposed to be placed in avctx->hwaccel_context.

However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
as struct VDPAUHWContext, that is bigger and does contain
AVVDPAUContext as first member.

The usage includes write to the new variables in the bigger stuct,
without checking for block size.

Fix by always allocating the bigger structure.

BTW,
I have no idea why the new fields haven't simply been added to the
existing struct...
It seems that the programmer who wrote this has been aware of the problem,
because av_vdpau_bind_context reallocates the structure.

It might be good idea to check the other usages of this reallocation function.

Best Regards
   Ivan Kalvachev
From c9dafbf5402ebf8c68bf8648ecea7a74282113a8 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 9 Oct 2017 02:40:26 +0300
Subject: [PATCH] Fix crash if av_vdpau_bind_context() is not used.

The public functions av_alloc_vdpaucontext() and
av_vdpau_alloc_context() are allocating AVVDPAUContext
structure that is supposed to be placed in avctx->hwaccel_context.

However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
as struct VDPAUHWContext, that is bigger and does contain
AVVDPAUContext as first member.

The usage includes write to the new variables in the bigger stuct,
without checking for block size.

Fix by always allocating the bigger structure.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/vdpau.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/vdpau.c b/libavcodec/vdpau.c
index 42ebddbee..4cc51cb79 100644
--- a/libavcodec/vdpau.c
+++ b/libavcodec/vdpau.c
@@ -816,7 +816,7 @@ do {   \
 
 AVVDPAUContext *av_vdpau_alloc_context(void)
 {
-return av_mallocz(sizeof(AVVDPAUContext));
+return av_mallocz(sizeof(VDPAUHWContext));
 }
 
 int av_vdpau_bind_context(AVCodecContext *avctx, VdpDevice device,
-- 
2.14.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Fix visual glitch with XvMC, caused by wrong idct permutation.

2017-10-09 Thread Ivan Kalvachev

On 10/9/17, Ronald S. Bultje  wrote:
> Hi,
>
> On Sun, Oct 8, 2017 at 6:52 PM, Ivan Kalvachev  wrote:
> [..]
>
> Indentation is off in the second hunk, can you fix that?

You want it 4 spaces to the right
or to start from the first position?

BTW, I think it would be better to use "127" number.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH v2] Fix visual glitch with XvMC, caused by wrong idct permutation.

2017-10-09 Thread Ivan Kalvachev

On 10/9/17, Michael Niedermayer  wrote:
> On Mon, Oct 09, 2017 at 09:02:38AM -0400, Ronald S. Bultje wrote:
>> Hi,
>>
>> On Mon, Oct 9, 2017 at 6:46 AM, Ivan Kalvachev 
>> wrote:
>>
>> > On 10/9/17, Ronald S. Bultje  wrote:
>> > > On Sun, Oct 8, 2017 at 6:52 PM, Ivan Kalvachev 
>> > wrote:
>> > > [..]
>> > >
>> > > Indentation is off in the second hunk, can you fix that?
>> >
>> > You want it 4 spaces to the right

Done.

>>
>> Yes, please.
>>
>> BTW, I think it would be better to use "127" number.
>> >
>>
>> I don't really mind either way. The number 128 suggests it may have been
>> intended as a bitmask. Michael is probably better positioned to comment
>> on
>> this.
>
> I don't really remember but i think 128 was chosen for ABI
> compatibility with additions to it from libav. So it should no longer
> matter what values are used on additions

Then I'm using the next free number "24".

Please, commit when you think it is appropriate.

Best Regards
   Ivan Kalvachev
From 8842a69091b5eb5cf9b704b3ff504d21db4aad9b Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 9 Oct 2017 01:25:00 +0300
Subject: [PATCH] Fix visual glitch with XvMC, caused by wrong idct
 permutation.

In the past XvMC forced simple_idct since
it was using FF_IDCT_PERM_NONE.
However now we have SIMD variants of simple_idct that
are using FF_IDCT_PERM_TRANSPOSE and if they are selected
XvMC would get coefficients in the wrong order.

The patch creates new FF_IDCT_NONE that
is used only for this kind of hardware decoding
and that fallbacks to the old C only simple idct.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/avcodec.h   | 1 +
 libavcodec/idctdsp.c   | 1 +
 libavcodec/mpeg12dec.c | 2 +-
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index 52cc5b0ca..18c3e3ea1 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -3146,6 +3146,7 @@ typedef struct AVCodecContext {
 #if FF_API_ARCH_ALPHA
 #define FF_IDCT_SIMPLEALPHA   23
 #endif
+#define FF_IDCT_NONE  24 /* Used by XvMC to extract IDCT coefficients with FF_IDCT_PERM_NONE */
 #define FF_IDCT_SIMPLEAUTO128
 
 /**
diff --git a/libavcodec/idctdsp.c b/libavcodec/idctdsp.c
index d596aed1a..0122d29ef 100644
--- a/libavcodec/idctdsp.c
+++ b/libavcodec/idctdsp.c
@@ -279,6 +279,7 @@ av_cold void ff_idctdsp_init(IDCTDSPContext *c, AVCodecContext *avctx)
 c->perm_type = FF_IDCT_PERM_NONE;
 #endif /* CONFIG_FAANIDCT */
 } else { // accurate/default
+/* Be sure FF_IDCT_NONE will select this one, since it uses FF_IDCT_PERM_NONE */
 c->idct_put  = ff_simple_idct_put_8;
 c->idct_add  = ff_simple_idct_add_8;
 c->idct  = ff_simple_idct_8;
diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index 22c29c150..4e68be27f 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -1217,7 +1217,7 @@ static void setup_hwaccel_for_pixfmt(AVCodecContext *avctx)
 #endif
 )
 if (avctx->idct_algo == FF_IDCT_AUTO)
-avctx->idct_algo = FF_IDCT_SIMPLE;
+avctx->idct_algo = FF_IDCT_NONE;
 
 if (avctx->hwaccel && avctx->pix_fmt == AV_PIX_FMT_XVMC) {
 Mpeg1Context *s1 = avctx->priv_data;
-- 
2.14.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Fix crash if av_vdpau_bind_context() is not used.

2017-10-09 Thread Ivan Kalvachev

On 10/9/17, wm4  wrote:
> On Mon, 9 Oct 2017 03:04:53 +0300
> Ivan Kalvachev  wrote:
>
>> The public functions av_alloc_vdpaucontext() and
>> av_vdpau_alloc_context() are allocating AVVDPAUContext
>> structure that is supposed to be placed in avctx->hwaccel_context.
>>
>> However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
>> as struct VDPAUHWContext, that is bigger and does contain
>> AVVDPAUContext as first member.
>>
>> The usage includes write to the new variables in the bigger stuct,
>> without checking for block size.
>>
>> Fix by always allocating the bigger structure.
>>
>> BTW,
>> I have no idea why the new fields haven't simply been added to the
>> existing struct...
>> It seems that the programmer who wrote this has been aware of the problem,
>> because av_vdpau_bind_context reallocates the structure.
>>
>> It might be good idea to check the other usages of this reallocation
>> function.
>>
>> Best Regards
>>Ivan Kalvachev
>
> IMO not really worth fixing at this point, because this is the old-old
> vdpau API. Even av_vdpau_bind_context() (which does not require using
> av_alloc_vdpaucontext()) is deprecated. Or rather should be - I just
> haven't bothered deprecating it because the deprecation dance is too
> messy. In any case, you shouldn't use any of those APIs - use the
> generic hwaccel API instead (setting hw_frames_ctx or hw_device_ctx).

Every bug must be fixed, even if the code is going to be removed next.

Since you "didn't bother" to deprecate it, this code will remain even after
the API bump. And it is still (mis)used by at least one program that
crashed on me.

So it MUST be fixed.

Feel free at any time to mark it as deprecated
and set a deprecation target.


Best Regards
   Ivan Kalvachev
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] libavcodec/proresdec : add qmat dsp with SSE2, AVX2 simd

2017-10-09 Thread Ivan Kalvachev

On 10/9/17, Martin Vignali  wrote:
> 2017-10-07 18:16 GMT+02:00 Ronald S. Bultje :
>
>> Hi Martin,
>>
>> On Sat, Oct 7, 2017 at 11:49 AM, Martin Vignali 
>> wrote:
>>
>> > 2017-10-07 17:30 GMT+02:00 Ronald S. Bultje :
>> > > On Sat, Oct 7, 2017 at 10:22 AM, Martin Vignali <
>> > martin.vign...@gmail.com>
>> > > wrote:
>> > > > Patch in attach add a new dsp
>> > > > for manipulation of qmat
>> > > >
>> > > > for now, i move this code inside
>> > > >
>> > > > for (i = 0; i < 64; i++) {
>> > > > qmat_luma_scaled  [i] = ctx->qmat_luma  [i] * qscale;
>> > > > qmat_chroma_scaled[i] = ctx->qmat_chroma[i] * qscale;
>> > > > }
>> > > >
>> > > > i add a special case for qscale == 1
>> > > > and SSE2, AVX2 optimization
>> > >
>> > > This loop only executes once per slice. We typically do not
>> SIMD-optimize
>> > > at that level, because it won't give significant speed gains...
>> >
>> > Ok didn't know that.
>> > I mostly follow, what there are already done, like in
>> blockdsp.clear_block
>> >
>>
>> Right, so consider that blockdsp is done per block (16x16 pixels), not per
>> slice.
>>
> Ok on principle (only improve, a func which is called quite often)

It's more of:  We can't refuse code that makes a measurable improvement.

Also have in mind that compilers are getting smarter and this code is
good target for auto-vectorization. Of course FFmpeg disables is,
because of long history of compiler bugs related to it.

>> You could remove this entirely from the slice processing code by simply
>> pre-calculating the values in the init function once for the whole stream,
>> there's only 224 qscale values so it's 224*64*2 multiplications, which is
>> (in the context of prores) virtually negligible.
>>
>
> Not sure, we can do that for prores decoder
> the qmat seems to be set on the decode frame header func
> (based on the header of the frame).

You can at least check if the qscale has changed and avoid recalculation.
I think that the lgpl decoder does that.

Best Regards
   Ivan Kalvachev
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Fix crash if av_vdpau_bind_context() is not used.

2017-10-11 Thread Ivan Kalvachev

On 10/10/17, wm4  wrote:
> On Tue, 10 Oct 2017 03:24:56 +0300
> Ivan Kalvachev  wrote:
>
>> On 10/9/17, wm4  wrote:
>> > On Mon, 9 Oct 2017 03:04:53 +0300
>> > Ivan Kalvachev  wrote:
>> >
>> >> The public functions av_alloc_vdpaucontext() and
>> >> av_vdpau_alloc_context() are allocating AVVDPAUContext
>> >> structure that is supposed to be placed in avctx->hwaccel_context.
>> >>
>> >> However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
>> >> as struct VDPAUHWContext, that is bigger and does contain
>> >> AVVDPAUContext as first member.
>> >>
>> >> The usage includes write to the new variables in the bigger stuct,
>> >> without checking for block size.
>> >>
>> >> Fix by always allocating the bigger structure.
>> >>
>> >> BTW,
>> >> I have no idea why the new fields haven't simply been added to the
>> >> existing struct...
>> >> It seems that the programmer who wrote this has been aware of the
>> >> problem,
>> >> because av_vdpau_bind_context reallocates the structure.
>> >>
>> >> It might be good idea to check the other usages of this reallocation
>> >> function.
>> >>
>> >> Best Regards
>> >>Ivan Kalvachev
>> >
>> > IMO not really worth fixing at this point, because this is the old-old
>> > vdpau API. Even av_vdpau_bind_context() (which does not require using
>> > av_alloc_vdpaucontext()) is deprecated. Or rather should be - I just
>> > haven't bothered deprecating it because the deprecation dance is too
>> > messy. In any case, you shouldn't use any of those APIs - use the
>> > generic hwaccel API instead (setting hw_frames_ctx or hw_device_ctx).
>>
>> Every bug must be fixed, even if the code is going to be removed next.
>>
>> Since you "didn't bother" to deprecate it, this code will remain even
>> after
>> the API bump. And it is still (mis)used by at least one program that
>> crashed on me.
>>
>> So it MUST be fixed.
>
> Well, if you insist, feel free to attempt to maintain it, but I won't
> care, even if I make changes to the code under the newer API (about
> which I care). I tried to remove some vdpau code earlier which had
> been deprecated for years, but this was rejected, so why care about
> deprecation or whether an ancient API is actually broken? Not me.

You are not making any sense.

Anyway. Do you have any objection for this patch to be committed?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Fix crash if av_vdpau_bind_context() is not used.

2017-10-12 Thread Ivan Kalvachev

On 10/11/17, Ivan Kalvachev  wrote:
> On 10/10/17, wm4  wrote:
>> On Tue, 10 Oct 2017 03:24:56 +0300
>> Ivan Kalvachev  wrote:
>>
>>> On 10/9/17, wm4  wrote:
>>> > On Mon, 9 Oct 2017 03:04:53 +0300
>>> > Ivan Kalvachev  wrote:
>>> >
>>> >> The public functions av_alloc_vdpaucontext() and
>>> >> av_vdpau_alloc_context() are allocating AVVDPAUContext
>>> >> structure that is supposed to be placed in avctx->hwaccel_context.
>>> >>
>>> >> However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
>>> >> as struct VDPAUHWContext, that is bigger and does contain
>>> >> AVVDPAUContext as first member.
>>> >>
>>> >> The usage includes write to the new variables in the bigger stuct,
>>> >> without checking for block size.
>>> >>
>>> >> Fix by always allocating the bigger structure.
>>> >>
>>> >> BTW,
>>> >> I have no idea why the new fields haven't simply been added to the
>>> >> existing struct...
>>> >> It seems that the programmer who wrote this has been aware of the
>>> >> problem,
>>> >> because av_vdpau_bind_context reallocates the structure.
>>> >>
>>> >> It might be good idea to check the other usages of this reallocation
>>> >> function.
>>> >>
>>> >> Best Regards
>>> >>Ivan Kalvachev
>>> >
>>> > IMO not really worth fixing at this point, because this is the old-old
>>> > vdpau API. Even av_vdpau_bind_context() (which does not require using
>>> > av_alloc_vdpaucontext()) is deprecated. Or rather should be - I just
>>> > haven't bothered deprecating it because the deprecation dance is too
>>> > messy. In any case, you shouldn't use any of those APIs - use the
>>> > generic hwaccel API instead (setting hw_frames_ctx or hw_device_ctx).
>>>
>>> Every bug must be fixed, even if the code is going to be removed next.
>>>
>>> Since you "didn't bother" to deprecate it, this code will remain even
>>> after
>>> the API bump. And it is still (mis)used by at least one program that
>>> crashed on me.
>>>
>>> So it MUST be fixed.
>>
>> Well, if you insist, feel free to attempt to maintain it, but I won't
>> care, even if I make changes to the code under the newer API (about
>> which I care). I tried to remove some vdpau code earlier which had
>> been deprecated for years, but this was rejected, so why care about
>> deprecation or whether an ancient API is actually broken? Not me.
>
> You are not making any sense.
>
> Anyway. Do you have any objection for this patch to be committed?

I'd like this fix to be committed before the 3.4 release.
Anybody?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Fix crash if av_vdpau_bind_context() is not used.

2017-10-12 Thread Ivan Kalvachev

On 10/13/17, Carl Eugen Hoyos  wrote:
> 2017-10-09 2:04 GMT+02:00 Ivan Kalvachev :
>> The public functions av_alloc_vdpaucontext() and
>> av_vdpau_alloc_context() are allocating AVVDPAUContext
>> structure that is supposed to be placed in avctx->hwaccel_context.
>>
>> However the rest of libavcodec/vdpau.c uses avctx->hwaccel_context
>> as struct VDPAUHWContext, that is bigger and does contain
>> AVVDPAUContext as first member.
>>
>> The usage includes write to the new variables in the bigger stuct,
>> without checking for block size.
>>
>> Fix by always allocating the bigger structure.
>
> Patch applied and backported.
>
> Thank you, Carl Eugen

Thank you.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] Do not remove HWACCEL flag with FF_API_XVMC

2016-02-15 Thread Ivan Kalvachev

Do not remove HWACCEL flag with FF_API_XVMC since the
flag is supposed to be used as generic one.

---
 libavcodec/avcodec.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index d849765..d2a34d8 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -1027,7 +1027,6 @@ typedef struct RcOverride{
  */
 #define CODEC_CAP_DR1 AV_CODEC_CAP_DR1
 #define CODEC_CAP_TRUNCATED   AV_CODEC_CAP_TRUNCATED
-#if FF_API_XVMC
 /* Codec can export data for HW decoding. This flag indicates that
  * the codec would call get_format() with list that might contain HW
accelerated
  * pixel formats (XvMC, VDPAU, VAAPI, etc). The application can pick
any of them
@@ -1036,7 +1035,6 @@ typedef struct RcOverride{
  * chroma format, resolution etc.
  */
 #define CODEC_CAP_HWACCEL 0x0010
-#endif /* FF_API_XVMC */
 /**
  * Encoder or decoder requires flushing with NULL input at the end in order to
  * give the complete and correct output.
--
From 5d0973a6d1ec8b53d1335bed393bf3e67dc8223a Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 15 Feb 2016 13:16:52 +0200
Subject: [PATCH 1/2] Do not remove HWACCEL flag with FF_API_XVMC since the
 flag is supposed to be used as generic one.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/avcodec.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/libavcodec/avcodec.h b/libavcodec/avcodec.h
index d849765..d2a34d8 100644
--- a/libavcodec/avcodec.h
+++ b/libavcodec/avcodec.h
@@ -1027,7 +1027,6 @@ typedef struct RcOverride{
  */
 #define CODEC_CAP_DR1 AV_CODEC_CAP_DR1
 #define CODEC_CAP_TRUNCATED   AV_CODEC_CAP_TRUNCATED
-#if FF_API_XVMC
 /* Codec can export data for HW decoding. This flag indicates that
  * the codec would call get_format() with list that might contain HW accelerated
  * pixel formats (XvMC, VDPAU, VAAPI, etc). The application can pick any of them
@@ -1036,7 +1035,6 @@ typedef struct RcOverride{
  * chroma format, resolution etc.
  */
 #define CODEC_CAP_HWACCEL 0x0010
-#endif /* FF_API_XVMC */
 /**
  * Encoder or decoder requires flushing with NULL input at the end in order to
  * give the complete and correct output.
-- 
2.6.4

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] Keep the new XVMC pixfmt at the old position.

2016-02-15 Thread Ivan Kalvachev

Keep the new XVMC pixfmt at the old position.
It was moved away to preserve ABI compatibility with the fork.

I'm not really sure this one is better,
so I'm ok with dropping it if somebody objects.

Have in mind, triggering FF_API_XVMC will change a lot of AV_PIX_FMT values,
with or without that patch.

---
 libavutil/pixfmt.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
index c01c057..8cc2ad3 100644
--- a/libavutil/pixfmt.h
+++ b/libavutil/pixfmt.h
@@ -79,6 +79,8 @@ enum AVPixelFormat {
 AV_PIX_FMT_XVMC_MPEG2_MC,///< XVideo Motion Acceleration via
common packet passing
 AV_PIX_FMT_XVMC_MPEG2_IDCT,
 #define AV_PIX_FMT_XVMC AV_PIX_FMT_XVMC_MPEG2_IDCT
+#else /* FF_API_XVMC */
+AV_PIX_FMT_XVMC,  ///< XVideo Motion Acceleration via common
packet passing
 #endif /* FF_API_XVMC */
 AV_PIX_FMT_UYVY422,   ///< packed YUV 4:2:2, 16bpp, Cb Y0 Cr Y1
 AV_PIX_FMT_UYYVYY411, ///< packed YUV 4:1:1, 12bpp, Cb Y0 Y1 Cr Y2 Y3
@@ -277,9 +279,6 @@ enum AVPixelFormat {
 AV_PIX_FMT_BAYER_GBRG16BE, ///< bayer, GBGB..(odd line),
RGRG..(even line), 16-bit samples, big-endian */
 AV_PIX_FMT_BAYER_GRBG16LE, ///< bayer, GRGR..(odd line),
BGBG..(even line), 16-bit samples, little-endian */
 AV_PIX_FMT_BAYER_GRBG16BE, ///< bayer, GRGR..(odd line),
BGBG..(even line), 16-bit samples, big-endian */
-#if !FF_API_XVMC
-AV_PIX_FMT_XVMC,///< XVideo Motion Acceleration via common packet passing
-#endif /* !FF_API_XVMC */
 AV_PIX_FMT_YUV440P10LE, ///< planar YUV 4:4:0,20bpp, (1 Cr & Cb
sample per 1x2 Y samples), little-endian
 AV_PIX_FMT_YUV440P10BE, ///< planar YUV 4:4:0,20bpp, (1 Cr & Cb
sample per 1x2 Y samples), big-endian
 AV_PIX_FMT_YUV440P12LE, ///< planar YUV 4:4:0,24bpp, (1 Cr & Cb
sample per 1x2 Y samples), little-endian
-- 
2.6.4
From 0ef3403ea39045787d7f610130f1b62249f050d1 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 15 Feb 2016 13:38:15 +0200
Subject: [PATCH 2/2] Keep the new XVMC pixfmt at the old position. It was
 moved away to preserve ABI compatibility with the fork.

Signed-off-by: Ivan Kalvachev 
---
 libavutil/pixfmt.h | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libavutil/pixfmt.h b/libavutil/pixfmt.h
index c01c057..8cc2ad3 100644
--- a/libavutil/pixfmt.h
+++ b/libavutil/pixfmt.h
@@ -79,6 +79,8 @@ enum AVPixelFormat {
 AV_PIX_FMT_XVMC_MPEG2_MC,///< XVideo Motion Acceleration via common packet passing
 AV_PIX_FMT_XVMC_MPEG2_IDCT,
 #define AV_PIX_FMT_XVMC AV_PIX_FMT_XVMC_MPEG2_IDCT
+#else /* FF_API_XVMC */
+AV_PIX_FMT_XVMC,  ///< XVideo Motion Acceleration via common packet passing
 #endif /* FF_API_XVMC */
 AV_PIX_FMT_UYVY422,   ///< packed YUV 4:2:2, 16bpp, Cb Y0 Cr Y1
 AV_PIX_FMT_UYYVYY411, ///< packed YUV 4:1:1, 12bpp, Cb Y0 Y1 Cr Y2 Y3
@@ -277,9 +279,6 @@ enum AVPixelFormat {
 AV_PIX_FMT_BAYER_GBRG16BE, ///< bayer, GBGB..(odd line), RGRG..(even line), 16-bit samples, big-endian */
 AV_PIX_FMT_BAYER_GRBG16LE, ///< bayer, GRGR..(odd line), BGBG..(even line), 16-bit samples, little-endian */
 AV_PIX_FMT_BAYER_GRBG16BE, ///< bayer, GRGR..(odd line), BGBG..(even line), 16-bit samples, big-endian */
-#if !FF_API_XVMC
-AV_PIX_FMT_XVMC,///< XVideo Motion Acceleration via common packet passing
-#endif /* !FF_API_XVMC */
 AV_PIX_FMT_YUV440P10LE, ///< planar YUV 4:4:0,20bpp, (1 Cr & Cb sample per 1x2 Y samples), little-endian
 AV_PIX_FMT_YUV440P10BE, ///< planar YUV 4:4:0,20bpp, (1 Cr & Cb sample per 1x2 Y samples), big-endian
 AV_PIX_FMT_YUV440P12LE, ///< planar YUV 4:4:0,24bpp, (1 Cr & Cb sample per 1x2 Y samples), little-endian
-- 
2.6.4

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Do not remove HWACCEL flag with FF_API_XVMC

2016-02-15 Thread Ivan Kalvachev

On 2/15/16, Hendrik Leppkes  wrote:
> On Mon, Feb 15, 2016 at 2:13 PM, Ivan Kalvachev 
> wrote:
>> Do not remove HWACCEL flag with FF_API_XVMC since the
>> flag is supposed to be used as generic one.
>>
>
> The flag is unused except by the XVMC decoder, so removing it with it
> seems appropriate.

Should I add it to the other hwaccel decoders?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Do not remove HWACCEL flag with FF_API_XVMC

2016-02-15 Thread Ivan Kalvachev

On 2/15/16, Hendrik Leppkes  wrote:
> On Mon, Feb 15, 2016 at 2:13 PM, Ivan Kalvachev 
> wrote:
>> Do not remove HWACCEL flag with FF_API_XVMC since the
>> flag is supposed to be used as generic one.
>>
>
> The flag is unused except by the XVMC decoder, so removing it with it
> seems appropriate.

Oh, I almost missed it.
Triggering FF_API_XVMC doesn't remove XVMC decoder.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Do not remove HWACCEL flag with FF_API_XVMC

2016-02-15 Thread Ivan Kalvachev

On 2/15/16, Ronald S. Bultje  wrote:
> Hi,
>
> On Mon, Feb 15, 2016 at 8:26 AM, Ivan Kalvachev 
> wrote:
>
>> On 2/15/16, Hendrik Leppkes  wrote:
>> > On Mon, Feb 15, 2016 at 2:13 PM, Ivan Kalvachev 
>> > wrote:
>> >> Do not remove HWACCEL flag with FF_API_XVMC since the
>> >> flag is supposed to be used as generic one.
>> >>
>> >
>> > The flag is unused except by the XVMC decoder, so removing it with it
>> > seems appropriate.
>>
>> Should I add it to the other hwaccel decoders?
>
>
> Would it have a real-world purpose?

Does any of the other Capability flags serve any real-world purpose?
e.g. CAP_DRAW_HORIZ_BAND or  CAP_DR1.

The *_BAND indicates usage of draw_horiz_band callback().
The * _DR1 indicates usage of get_buffer(),
The *_HWACCEL is supposed to indicate usage of get_format() and
availability of certain data, when it is called.

Since the trend is removing codecs that do only hwaccel,
I think it is useful for the application to know that the codec
can do more than software decoding.

Now I see I've missed that there have been a rename and
I actually have to move the name to AV_CODEC_CAP scheme.

I'd like to hear if you still have objections,
before doing v2 of this patch.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Keep the new XVMC pixfmt at the old position.

2016-02-15 Thread Ivan Kalvachev

On 2/15/16, Hendrik Leppkes  wrote:
> On Mon, Feb 15, 2016 at 2:23 PM, Ivan Kalvachev 
> wrote:
>> Keep the new XVMC pixfmt at the old position.
>> It was moved away to preserve ABI compatibility with the fork.
>>
>> I'm not really sure this one is better,
>> so I'm ok with dropping it if somebody objects.
>>
>> Have in mind, triggering FF_API_XVMC will change a lot of AV_PIX_FMT
>> values,
>> with or without that patch.
>>
>
> Seems like a pointless change. Personally I would prefer to move it to
> the bottom instead of having it within the first 10 formats in the
> enum.

I would prefer if all hwaccel formats are grouped together :)

Anyway, dropping it.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] Do not remove HWACCEL flag with FF_API_XVMC

2016-02-16 Thread Ivan Kalvachev

On 2/15/16, Hendrik Leppkes  wrote:
> On Mon, Feb 15, 2016 at 3:13 PM, Ivan Kalvachev 
> wrote:
>> On 2/15/16, Ronald S. Bultje  wrote:
>>> Hi,
>>>
>>> On Mon, Feb 15, 2016 at 8:26 AM, Ivan Kalvachev 
>>> wrote:
>>>
>>>> On 2/15/16, Hendrik Leppkes  wrote:
>>>> > On Mon, Feb 15, 2016 at 2:13 PM, Ivan Kalvachev 
>>>> > wrote:
>>>> >> Do not remove HWACCEL flag with FF_API_XVMC since the
>>>> >> flag is supposed to be used as generic one.
>>>> >>
>>>> >
>>>> > The flag is unused except by the XVMC decoder, so removing it with it
>>>> > seems appropriate.
>>>>
>>>> Should I add it to the other hwaccel decoders?
>>>
>>>
>>> Would it have a real-world purpose?
>>
>> Does any of the other Capability flags serve any real-world purpose?
>> e.g. CAP_DRAW_HORIZ_BAND or  CAP_DR1.
>>
>> The *_BAND indicates usage of draw_horiz_band callback().
>> The * _DR1 indicates usage of get_buffer(),
>
> DR1 could probably be removed as well, since practically every codec
> we have does it this way now.
> But removing existing things and adding new things are entirely
> different topics.

That's not correct - *_HWACCEL isn't new thing either.

You do understand that blocking this change means
removing existing thing, don't you?

>> The *_HWACCEL is supposed to indicate usage of get_format() and
>> availability of certain data, when it is called.
>>
>> Since the trend is removing codecs that do only hwaccel,
>> I think it is useful for the application to know that the codec
>> can do more than software decoding.
>>
>
> This is hardly useful, because hwaccels do not "just work", for proper
> hwaccel support you need to hardcode the available codecs in your
> application and appropriate conditions for them, so things like
> profile checks, hardware compatibility checks, etc can be implemented.
> A generic "this codec does hwaccel" flag is not very useful here.

Yes, it is not mandatory for decoding.

But that's true for all capabilities flag, including the threading ones.
If your application supports threading you can always use the thread
safe functions.

The capability flags are there to make things simpler.

Currently the only way an application could find that given codec
supports hwaccel
is to decode a frame, wait for get_format() callback and check the pixfmt list.

Ronald proposed to implement public function that would
query if given codec support given hwaccel pixfmt.
It's good, but it is not simple.
(Listing decoders is 400+ codecs * 10 hwaccel types,
traversing 40+ element table for each combination.)

It literally cannot be simpler than using the already existing flag.
The bit is already allocated for that, it is kept unused in the new
AV_CODEC_CAP*

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] Remove Derek Buitenhuis from MAINTAINERS

2016-05-20 Thread Ivan Kalvachev

On 5/20/16, Christophe Gisquet  wrote:
> Hi,
>
> 2016-05-20 1:55 GMT+02:00 Lukasz Marek :
>> Is Derek revoked to commit or what? Couldn't he just commit this patch and
>> leave? :P  I was a problem for some people, but I see they still have
>> problems. Let people with problems go away with they problems.
>
> Sorry if you felt ganged up on previously. Hopefully the new Code of
> Conduct will avoid that such situations raise to an unsufferable
> level.
>
> But whatever bad technical blood there have been between the two of
> you, and whoever I may agree technically with, this is uncalled for.
> You're just adding fuel to a tense situation, and causing distress to
> someone.
>
> This type of comment is exactly what should not be allowed by the Code
> of Conduct.

That's adding insult to the injury.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [VOTE] Ban Carl Eugen Hoyos

2016-06-12 Thread Ivan Kalvachev

On 6/12/16, Paul B Mahol  wrote:
> Hi,
>
> As requested in the IRC meeting I hereby request for the
> voting committee to begin voting on whatever to ban Carl
> Eugen Hoyos from mailing list, trac and IRC for 4 months,
> starting after the voting has finished.

I don't remember such thing to have been requested on the IRC meeting.
Would you kindly quote the relevant parts of the logs?

Also, I would like to know on what grounds and
on what charges you request that punishment.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [VOTE] Ban Carl Eugen Hoyos

2016-06-12 Thread Ivan Kalvachev

On 6/13/16, Paul B Mahol  wrote:
> On 6/13/16, Ivan Kalvachev  wrote:
>> On 6/12/16, Paul B Mahol  wrote:
>>> Hi,
>>>
>>> As requested in the IRC meeting I hereby request for the
>>> voting committee to begin voting on whatever to ban Carl
>>> Eugen Hoyos from mailing list, trac and IRC for 4 months,
>>> starting after the voting has finished.
>>
>> I don't remember such thing to have been requested on the IRC meeting.
>> Would you kindly quote the relevant parts of the logs?
>
> It was requested to act because of Carl misbehaviour.
> Logs are available on net.

I was on the meeting and I checked the published logs
before sending my first mail.
That's why I requested that you QUOTE the relevant part.

Let me be blunt. Nobody have requested Carl to be banned,
and definitely not from ML, trac, IRC for 4 months.

Feel free to prove me wrong, by providing the quotes I requested.

>> Also, I would like to know on what grounds and
>> on what charges you request that punishment.
>
> On grounds that he was badmouthing others.

That's way too vague...

1. I'd like to see links and quotes of him doing the things you accuse him of.

2. I'd like to know why we have to ban him for 4 months exactly? Why
ban him from ML, IRC, Trac, but not git?
How did you determined that this punishment is the one that is most
fitting the crimes he has done?

I can give you a lot of repeated incidents where people have badmouthed Carl.
Should we ban them all in a similar way? Months and years after the fact?

Also, If we are going to punish somebody, there should be a due
process before that.
Witch hunts are nasty things.

> Many devs requested punishment.
Did they?

Many people wanted breaking CoC to have consequences.
But I do not remember anybody requesting Carl to be banned for 4 months.
Feel free to prove me wrong, with quotes.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] bans

2016-06-15 Thread Ivan Kalvachev

On 6/15/16, Michael Niedermayer  wrote:
> Hi all
>
> As noone is doing anything about the situation and what is being
> done will not lead anywhere (the vote likely wont lead anywhere as
> likely few would ban a active developer 4 month and then if not
> some will feel injustice prevailed thus
>
> After writing this mail i will
>
> 1. ban carl for 24h from the ML due to
> causing  derek to leave the project. (24h was suggested in the IRC
> meeting)
> I suspect carl saw the merges done by derek as causing more bugs than
> good so he attacked until derek stoped doing the merges.
> The correct course of action would have been a vote about stoping the
> merges or a change to the procedure to reduce the amount of bugs.
> Like maybe a seperate branch where merges can be tested for ~24h before
> being pushed to master ...
> Or maybe more people working on fixing regressions
> As a sidenote, most of the regressions should be fixed by now.
>
> 2. ban derek for ~24h from the ML due to causing lukasz to leave the
> project last year, and due to personal insults on the ML and IRC
> to lukasz and carl.
> As derek is not subscribed to the list ATM, this will be implemented
> by moving him from the accept_these_nonmembers list to the
> reject_these_nonmembers list for ~24h
>
> 3. ban myself for ~24h from the ML because i wrote offensive mails too
> years ago and i doubt none was pivotal in causing someone to leave
>
> Thanks

I don't think that these bans would change anything.

The problem is that they would not address
any of the reasons for getting here.

There's been accumulation of a lot of mud,
and what we have seen so far is not even
the tip of the iceberg.

There are people who sincerely believe that
FFmpeg project would be better without Carl.
That Carl hates LibAV project and that hate is
obstruction to the consolidation between the
two projects.

The negativity turns into hostility and provocations.

When Carl complains that some merge breaks FFmpeg,
he is viewed as attacking LibAV and the merger,
even when he is actually only concerned about bugs in FFmpeg.
Then some people think they are in their right to counterattack.

Somebody described the result as:
"it's years of build up displeasure about
the general behavior of a person."

How do you resolve that?

I have no idea.

People don't remember facts, they remember emotions.
And people refuse to acknowledge making mistake,
rejecting any arguments, based on facts and logic.

This basically means that things would only get worse.
Derek's dramatic departure put quite a pain in hearths of
the people who consider him a friend.
They would not forgive Carl for that.

For these who are relatively new to the project,
the same crisis has happened before in FFmpeg,
but with different people.

It was Mans who left FFmpeg, because of Michael's
alleged violations of rules and savage behavior.
There was a vote to keep or kick Michael and the vote kept him.
Things then got silent for few months, until Mans returned by
takeover of FFmpeg project and servers, supported by
a big portion of developers and the other admins.

Takeover at the moment is not really plausible,
since the admins won't support it.

What is plausible is that hostility would remain.
There will be more incidents involving Carl,
some people would push for stricter punishments.

Likely outcomes are:
1. Carl leaves on his own.
2. Carl is banned permanently.
3. Derek friends leave FFmpeg one by one.
4. Derek friends fork FFmpeg.

The first two variants does look like they are
going to be best for FFmpeg, however things
are never this simple. Once Carl is gone,
hostility would be directed to anybody who
opposes in any way the friends circle.

What happened to lukasz, would happen
to a lot more developers and contributors.

This would inevitably lead to stagnation,
lack of manpower, lack of new features,
increase in bugs, long delays in handling security exploits.
All things that are quite common in LibAV project
and also some of main reasons for FFmpeg
replacing it in Debian.
After all, this is how LibAV came to be,
as a circle of friends who got rid of
undesirable developer(s).

So what do you really want?

Do you want to ruin FFmpeg? Because that's how you ruin a project.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] bans

2016-06-16 Thread Ivan Kalvachev

On 6/16/16, James Almer  wrote:
> On 6/15/2016 8:16 PM, Ivan Kalvachev wrote:
>>Loads of crap
>
> No one, and i mean no one reply to this email.
>
> You will not get a single answer to any question you make. All you'll get is
> counter questions. He will make questions he knows the answers for only to
> read the reply in your own words. And once you reply to said questions, your
> answer will be nitpicked expecting you to focus on said comments until the
> conversation is fully derailed.
> Insisting with your questions will be useless.
>
> This is the guy that in a reply to the vote thread said he wasn't aware the
> subject was mentioned in the IRC meeting [1] while in reality he knew well
> about everything and even acknowledged it, as pointed out by Ronald in a
> subsequent email [2].
> He will lie and he will pretend to be unaware of things, be it to not answer
> your questions or to get you to reply his, starting the cycle i mentioned
> above.
>
> This is is a guy that has since day 1 derailed every single conversation and
> tried to put the aggressor as the victim and the victim as the aggressor,
> installing the idea of secret unjustified feuds and invoking old bullshit
> like
> libav's debacle in a perfect godwin's law fashion.
>
> Not a single discussion in IRC where he was involved went anywhere, and
> neither
> will anything in this thread. You'll get inside a spiral of bullshit with no
> end until you decide to stop feeding the troll disguised as worried
> contributor.
>
> [1] https://ffmpeg.org/pipermail/ffmpeg-devel/2016-June/195306.html
> [2] https://ffmpeg.org/pipermail/ffmpeg-devel/2016-June/195344.html

Too late, you already replied and you have demonstrated practically
the things I've described in the mail.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] bans

2016-06-16 Thread Ivan Kalvachev

On 6/16/16, Michael Niedermayer  wrote:
> On Wed, Jun 15, 2016 at 10:50:51AM -0300, James Almer wrote:
>> On 6/15/2016 10:14 AM, Michael Niedermayer wrote:
>> > Hi all
>> >
>> > As noone is doing anything about the situation and what is being
>> > done will not lead anywhere (the vote likely wont lead anywhere as
>> > likely few would ban a active developer 4 month and then if not
>> > some will feel injustice prevailed thus
>> >
>> > After writing this mail i will
>> >
>> > 1. ban carl for 24h from the ML due to
>> > causing  derek to leave the project. (24h was suggested in the IRC
>> > meeting)
>>
>> This is useless IMO. While four months is too much, 24 hours is
>> insignificant.
>>
>
>> Let's not implement bans without a discussion and a vote first.
>
> I agree but if i do nothing people are unhappy that i did nothing,
> if i talk with people trying to resolve a conflict well, it did not
> work this time. and if i do something else people complain too.
> and its 4 weeks since the incident.
> Doing some symbolic action seemed to make sense until the people
> finish discussions and votes.
> I believe i do not have the authority to hand out real (week+) bans,
> nor would i want that authority. Also i will stay away
> from symbolic bans too, this was intended to improve the situation,
> and i believe it did not achieve that.

One way to resolve the matter as mature people would be
asking Carl to explain his words and actions in civilized manner,
then apologize for going over the board.

It is completely counterproductive to escalate the issue with
attempts for public humiliation and punishment,
without making attempts for addressing the bad behavior
they are supposed to stop.



>> The current vote will probably go nowhere, so a proper discussion thread
>> followed by a vote will have to be made.
>
> Yes,
> its up to the vote committee and the community to decide what should
> be done.
>
>
>>
>> > I suspect carl saw the merges done by derek as causing more bugs than
>> > good so he attacked until derek stoped doing the merges.
>>
>> Which is the shitty behavior that's being discussed about. When you find
>> problems you report them, or help fix them. You don't attack the person
>> working on them.
>>
>> > The correct course of action would have been a vote about stoping the
>> > merges or a change to the procedure to reduce the amount of bugs.
>> > Like maybe a seperate branch where merges can be tested for ~24h before
>> > being pushed to master ...
>> > Or maybe more people working on fixing regressions
>> > As a sidenote, most of the regressions should be fixed by now.
>> >
>> > 2. ban derek for ~24h from the ML due to causing lukasz to leave the
>> > project last year, and due to personal insults on the ML and IRC
>> > to lukasz and carl.
>> > As derek is not subscribed to the list ATM, this will be implemented
>> > by moving him from the accept_these_nonmembers list to the
>> > reject_these_nonmembers list for ~24h
>> >
>> > 3. ban myself for ~24h from the ML because i wrote offensive mails too
>> > years ago and i doubt none was pivotal in causing someone to leave
>>
>> And this is silly. It's old history and nobody requested such action in
>> any what whatsoever. It will only derail the discussion and again, bans
>> without discussion or vote are a big no.
>
> It may be unimportant but the "ban" would never have stopped derek from
> subscribing or subsequently posting a message.
>
> [...]
>
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> I have often repented speaking, but never of holding my tongue.
> -- Xenocrates
>
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] Sponsoring feature for H.264 decoding with scaling factor (1/2, 1/4...) (if possible)

2016-06-19 Thread Ivan Kalvachev

On 6/17/16, Eric Beuque  wrote:
> 2016-06-17 19:16 GMT+02:00 Michael Niedermayer :
>
>> On Fri, Jun 17, 2016 at 05:39:23PM +0200, Eric Beuque wrote:
>> > Hi,
>> >
>> > i'm posting here for a feature that is missing in ffmpeg (or may be i
>> > missed something), which consist of decoding H.264 frame with a scaling
>> > factor of 1/2, 1/4, 1/8...
>> >
>> > I found the parameter lowres, which works well with MJPEG stream but
>> > it's
>> > not working with the H.264 decoder.
>> >
>> > I don't know if it's something possible to implement in the decoder, but
>> if
>> > yes, my compagny agreed to sponsor the feature (depending on the cost of
>> > course), if a developer qualified is interested to do it.
>> >
>> > Is someone know if it is possible, and if it can exist someone
>> > interested
>> > to develop this feature?
>>
>> is it acceptable if the encoder enforces some restrictions on the used
>> features ?
>>
>> most general h264 likely cannot efficiently be decoded in lower
>> resolution with acceptable quality
>> Restricting the used intra modes may make it possible to do it though
>> i dont know what the quality would be but better than without
>> restrictions
>>
>> [...]
>>
>
>
> Actually, the main goal is for motion detection algorithm, so we need small
> resolution and best quality is not required. We just need good estimation
> for the moving pixel.
>
> For now, we decode in H.264, and scale it close 320x240, the then we
> perform motion detection on it. The goal is to speed up the process and
> reduce the memory usage since decoding a 2048x1536 picture lead around a 5
> MB for the decoded image in memory.

Have you tried to extract the motion vectors of the H.264 stream itself?

If the encoder supports more than I-frames, it should do some Motion Estimation.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] libvpx: Enable vp9 alpha encoding

2016-07-02 Thread Ivan Kalvachev

On 7/2/16, Vignesh Venkatasubramanian  wrote:
> On Fri, Jul 1, 2016 at 12:48 PM, Ronald S. Bultje 
> wrote:
>> Hi,
>>
>> On Fri, Jul 1, 2016 at 3:29 PM, Ronald S. Bultje 
>> wrote:
>>
>>> Hi,
>>>
>>> On Fri, Jul 1, 2016 at 3:12 PM, Vignesh Venkatasubramanian <
>>> vigneshv-at-google@ffmpeg.org> wrote:
>>>
 On Fri, Jul 1, 2016 at 11:06 AM, James Almer  wrote:
 > On 7/1/2016 2:53 PM, Ronald S. Bultje wrote:
 >> Hi,
 >>
 >> On Fri, Jul 1, 2016 at 1:40 PM, James Zern <
 jzern-at-google@ffmpeg.org>
 >> wrote:
 >>
 >>> On Fri, Jul 1, 2016 at 10:07 AM, Carl Eugen Hoyos 
 >>> wrote:
 > Do we have decoder support (for either vp8 or vp9) for these
 > files?
 
  No, only encoding and muxing.
 >>>
 >>> Seems like a feature request, but no reason to block this one if the
 >>> vp8 one is here.
 >>
 >>
 >> I'm not sure I have an opinion on this... But it feels strange to
 >> allow
 >> encoding of content we cannot decode. Being ffmpeg, how do we
 >> recommend
 >> people handle the files created with this feature, if not by using
 ffmpeg
 >> itself?

 One plausible reason is that Chrome can decode this. So it will be
 useful for people who already have ffmpeg in their pipelines and want
 to create such files. And like James Almer mentioned, this isn't a
 first. VP8 Alpha has been this way too.
>>>
>>>
>>> The fact that something is the way it is, does not prove that it is
>>> therefore right, or that we should therefore continue doing it that way
>>> in
>>> other cases.
>>>
>>> So you're suggesting that it is perfectly fine for people to use Chrome
>>> as
>>> decoder if FFmpeg is the encoder. What if people don't have Chrome
>>> installed? Or what if they want a way of UI-less batch-processing such
>>> files, e.g. what if a service like Youtube/Vimeo wants to allow upload of
>>> vp8a/vp9a files without invoking Chrome for decoding?
>>>
>>
>> Additional evidence in [1], [2].
>>
>> There absolutely seems to be interest in support for vp8a/vp9a decoding
>> outside Chrome. I'm not saying you should implement it in all multimedia
>> frameworks ever created in human history, but doing it in one of them
>> (e.g.
>> ffmpeg, since it already supports encoding) certainly sounds helpful?
>>
>
> I'm not saying alpha decoder shouldn't ever be implemented in ffmpeg.
> I'm just saying that it shouldn't be a reason to block this patch. :)
> Sorry if i wasn't clear before.

The libvpx is used for both encoding and decoding,
so it should be able to decode alpha planes and
provide them to ffmpeg.

Do you have idea how hard is adding decoding support this way?
Can you do it?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] libvpx: Enable vp9 alpha encoding

2016-07-02 Thread Ivan Kalvachev

On 7/2/16, Ronald S. Bultje  wrote:
> Hi,
>
> On Fri, Jul 1, 2016 at 5:27 PM, Vignesh Venkatasubramanian <
> vigneshv-at-google@ffmpeg.org> wrote:
>
>> On Fri, Jul 1, 2016 at 12:48 PM, Ronald S. Bultje 
>> wrote:
>> > Hi,
>> >
>> > On Fri, Jul 1, 2016 at 3:29 PM, Ronald S. Bultje 
>> wrote:
>> >
>> >> Hi,
>> >>
>> >> On Fri, Jul 1, 2016 at 3:12 PM, Vignesh Venkatasubramanian <
>> >> vigneshv-at-google@ffmpeg.org> wrote:
>> >>
>> >>> On Fri, Jul 1, 2016 at 11:06 AM, James Almer 
>> wrote:
>> >>> > On 7/1/2016 2:53 PM, Ronald S. Bultje wrote:
>> >>> >> Hi,
>> >>> >>
>> >>> >> On Fri, Jul 1, 2016 at 1:40 PM, James Zern <
>> >>> jzern-at-google@ffmpeg.org>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> On Fri, Jul 1, 2016 at 10:07 AM, Carl Eugen Hoyos <
>> ceho...@ag.or.at>
>> >>> >>> wrote:
>> >>> > Do we have decoder support (for either vp8 or vp9) for these
>> files?
>> >>> 
>> >>>  No, only encoding and muxing.
>> >>> >>>
>> >>> >>> Seems like a feature request, but no reason to block this one if
>> the
>> >>> >>> vp8 one is here.
>> >>> >>
>> >>> >>
>> >>> >> I'm not sure I have an opinion on this... But it feels strange to
>> allow
>> >>> >> encoding of content we cannot decode. Being ffmpeg, how do we
>> recommend
>> >>> >> people handle the files created with this feature, if not by using
>> >>> ffmpeg
>> >>> >> itself?
>> >>>
>> >>> One plausible reason is that Chrome can decode this. So it will be
>> >>> useful for people who already have ffmpeg in their pipelines and want
>> >>> to create such files. And like James Almer mentioned, this isn't a
>> >>> first. VP8 Alpha has been this way too.
>> >>
>> >>
>> >> The fact that something is the way it is, does not prove that it is
>> >> therefore right, or that we should therefore continue doing it that way
>> in
>> >> other cases.
>> >>
>> >> So you're suggesting that it is perfectly fine for people to use Chrome
>> as
>> >> decoder if FFmpeg is the encoder. What if people don't have Chrome
>> >> installed? Or what if they want a way of UI-less batch-processing such
>> >> files, e.g. what if a service like Youtube/Vimeo wants to allow upload
>> of
>> >> vp8a/vp9a files without invoking Chrome for decoding?
>> >>
>> >
>> > Additional evidence in [1], [2].
>> >
>> > There absolutely seems to be interest in support for vp8a/vp9a decoding
>> > outside Chrome. I'm not saying you should implement it in all multimedia
>> > frameworks ever created in human history, but doing it in one of them
>> (e.g.
>> > ffmpeg, since it already supports encoding) certainly sounds helpful?
>> >
>>
>> I'm not saying alpha decoder shouldn't ever be implemented in ffmpeg.
>> I'm just saying that it shouldn't be a reason to block this patch. :)
>> Sorry if i wasn't clear before.
>
>
> I totally understand that you would think that, since it means you don't
> have to do anything :).
>
> But there's an issue with this thinking. We're essentially already the
> dumping ground for anything multimedia-related nowadays. After all, we
> maintain it and keep it working (assuming basic tests), things couldn't get
> much easier than that, right? But is it actually useful to anyone? I mean
> not just useful for you, but useful to a wider set of people, at least in
> theory.
>
> If there's no decoder, I would claim that the wider utility of this thing
> is essentially zero. And that's somewhat of a concern.
>
> So, how do we get a decoder? vp8a suggests that just waiting for one to
> spontaneously combust out of thin air just doesn't work. So I'm suggesting
> you provide us with one. It's ok if it uses libvpx instead of ffvp8/9.
> Since vp8a encoding is already in, I won't ask for a vp8a decoder either.
> I'm just asking for a vp9a decoder. It might even be OK if it's implemented
> on top of ffmpeg instead of inside libavcodec (I'm not sure how others feel
> about this), i.e. just something that invokes libavformat to parse a webm
> file, create two decoders to get the yuv and a planes, and then merge them
> together into a yuva420p picture. I'm just asking for something _small_ and
> _simple_ (i.e. not "Chrome") that we can point users to when they ask "how
> do I decode vp9a files".
>
> I asked on IRC (#ffmpeg-devel) and several people concurred:
>
>  jamrial: so … I’m looking for a second opinion here, like, an
> independent one… am I being too hard on these guys for saying “an encoder
> needs a decoder”?
>  BBB: I do tend to agree that in general it goes dec->enc, or both at
> the same time. be it a fully lavc decoder or just utilizing a decoder
> library
>  BBB: no, you're not being hard
>
> So it seems I'm not entirely alone in this opinion within the ffmpeg
> developer community.
>
> Ronald


Roland, aren't you one of the main developers of ffvp9?
I remember you have spent a lot of time making it faster.
You know that codec well, don't you?

Thus my question is:
Why are you wasting your time
blocking things that people want,
instead of implementing the alpha plane support

Re: [FFmpeg-devel] [PATCH] doc: clarify development rules

2017-02-25 Thread Ivan Kalvachev

On 2/25/17, wm4  wrote:
> I'm documenting existing practice.
>
> I'm pulling the "6 months" timeout out of my ass, but I think it's
> pretty suitable.
> ---
>  doc/developer.texi | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/doc/developer.texi b/doc/developer.texi
> index dbe1f5421f..41551498ad 100644
> --- a/doc/developer.texi
> +++ b/doc/developer.texi
> @@ -338,13 +338,21 @@ you applied the patch.
>  When applying patches that have been discussed (at length) on the mailing
>  list, reference the thread in the log message.
>
> -@subheading Always wait long enough before pushing changes
> +@subheading Always wait long enough before pushing changes to code actively
> maintained by others
>  Do NOT commit to code actively maintained by others without permission.
>  Send a patch to ffmpeg-devel. If no one answers within a reasonable
>  time-frame (12h for build failures and security fixes, 3 days small
> changes,
>  1 week for big patches) then commit your patch if you think it is OK.
>  Also note, the maintainer can simply ask for more time to review!
>
> +@subheading Pushing patches without sending them to the mailing list
> +If you're the maintainer of a file, or the file is not actively maintained
> by
> +anyone, or the file is not covered by the MAINTAINERS file, you can just
> push
> +them without asking for permission, and without sending them to
> ffmpeg-devel.
> +This right only applies to developers with git push access, of course.
> +A maintainer is considered not active if he hasn't posted a mail to
> ffmpeg-devel
> +for longer than 6 months, and hasn't pushed a patch in that time period
> himself.
> +
>  @subsection Code
>  @subheading API/ABI changes should be discussed before they are made.
>  Do not change behavior of the programs (renaming options etc) or public

I object on this.

I do prefer all the code to go though the maillist.
Even trivial changes.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-24 Thread Ivan Kalvachev

This is the second version of my work.

Nobody posted any benchmarks, so
the old code remains for this round too.

The proper PIC handling code is included.

Small cosmetics, e.g. using tmpY,
to separate (semantically) from the output outY.

Now the tmpX buffer is fixed at 256*sizeof(float) size
and allocated by "cglobal" entry code.

The biggest changes are in the way distortion is computed.
The old code preferred rsqrt() approximation.
However I discovered that sometimes pulses are assigned
at positions where X[i]==0.0 . Padding is set to 0.0 in order
to avoid assigning pulses in it, and the code sometimes
assigned pulses there.

The new code solves this problem in 2 different ways:

1. checking for X[i]==0.0 and zeroing the numerator,
ensuring that it would never be bigger than p_max.
It's done by 3 more instructions in the "pulses add" inner loop.
(The "pulses sub" already have similar check that is sufficient).
This code is enabled for "USE_APPROXIMATION 2" case.

2. Improving precision.
Since input X[] is normalized,
the distortion is calculated by the formula:
   d=2*(1-Sxy/sqrt(Syy))
where Sxy = Sum X[i]*Y[i]
andSyy = Sum Y[i]*Y[i]

The old code (including C version from opus) calculate it
partially with p=Sxy/sqrt(Syy) or p^2=Sxy*Sxy/Syy .
(It does avoid division by replacing it with 2 multiplications
in a cross, aka a/b < c/d => a*d < c*b)
So p values are closing to 1, if we get p=1 we have perfect match.
The problem is that the more Sxy^2 is closer to Syy,
the less precision we get when (approximating) division.
To avoid that I tried to turn the formula into:
   (sqrt(Syy)-Sxy)/sqrt(Syy)
in order to bring the nominator toward zero.
(As floats are normalized, this improves precision).

After some manual experimentation,
I came with the hacked formula:
(Syy-Sxy*Sxy)*approx(1/Syy)
that gives best results.

The final code uses the sign inverted formula (Sxy*Sxy-Syy)/Syy,
in order to preserve the comparison direction of the code.
Since "USE_APPROXIMATION 1" already uses similar formula
the new hack consists of placing a single subtraction in the inner loop.
Also since the formula is inverted its range starts at -1 and goes up,
so the p_max should start with a big enough negative value.

Approximation method #1 was slower than #2 and used to give same results.
However the improved variant gives result that I thought to be binary
identical to
the C version. (Well I found at least a single case where it is not).

I'm inclined to pick "USE_APPROXIMATION 1" as the default method,
even if #2 might still be faster, just because it provides better trade-off
between precision and speed.

At my benchmarks the pvq_search_sse42 is about 2x the speed of
the current C implementation. The v1 was closer to 2.5x .

I'd be glad to see some benchmarks,
preferably with different defines enabled and disabled,
so I can tune the code for different CPU's.

Best Regards
From 7d875390bb81973b6e4d1fb8ad54594ea3f8d5da Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 1/3] SIMD opus pvq_search implementation v2

---
 libavcodec/opus_pvq.c  |   9 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   1 +
 libavcodec/x86/opus_dsp_init.c |  47 +++
 libavcodec/x86/opus_pvq_search.asm | 628 +
 5 files changed, 688 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..172223e241 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -418,7 +418,13 @@ static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
 int *y = pvq->qcoeff;
 
 celt_exp_rotation(X, N, blocks, K, spread, 1);
+
+{
+START_TIMER
 gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+STOP_TIMER("pvq_search");
+}
+
 celt_encode_pulses(rc, y,  N, K);
 celt_normalize_residual(y, X, N, gain);
 celt_exp_rotation(X, N, blocks, K, spread, 0);
@@ -947,6 +953,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index 6691494838..9246337360 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -33,8 +33,8 @@
float *lowband_scratch, int fill)
 
 struct CeltPVQ {
-DECLARE_ALIGNED(32, int,   qcoeff  )[176];
-DECLARE_ALIGNED(32, float, hadamard_tmp)[176];
+DECLARE_ALIGNED(32, int,   qcoeff  )[256];
+DECLARE_ALIGNED(32, float, hadamard_tmp)[256];
 
 float (*pvq_search)(float *X, int *y,

Re: [FFmpeg-devel] [PATCH 1/2] x86/vf_blend: add sse and ssse3 extremity functions

2017-06-27 Thread Ivan Kalvachev

On 6/27/17, James Almer  wrote:
> Signed-off-by: James Almer 
> ---
>  libavfilter/x86/vf_blend.asm| 25 +
>  libavfilter/x86/vf_blend_init.c |  4 
>  tests/checkasm/vf_blend.c   |  1 +
>  3 files changed, 30 insertions(+)
>
> diff --git a/libavfilter/x86/vf_blend.asm b/libavfilter/x86/vf_blend.asm
> index 33b1ad1496..25f6f5affc 100644
> --- a/libavfilter/x86/vf_blend.asm
> +++ b/libavfilter/x86/vf_blend.asm
> @@ -286,6 +286,31 @@ BLEND_INIT difference, 3
>  jl .loop
>  BLEND_END
>
> +BLEND_INIT extremity, 8
> +pxor   m2, m2
> +mova   m4, [pw_255]
> +.nextrow:
> +movxq, widthq
> +
> +.loop:
> +movum0, [topq + xq]
> +movum1, [bottomq + xq]
> +punpckhbw   m5, m0, m2
> +punpcklbw   m0, m2
> +punpckhbw   m6, m1, m2
> +punpcklbw   m1, m2
> +psubw   m3, m4, m0
> +psubw   m7, m4, m5
> +psubw   m3, m1
> +psubw   m7, m6
> +ABS1m3, m1
> +ABS1m7, m6

Minor nitpick.

There exists ABS2 that takes 4 parameters and that does
two interleaved ABS1 , that are (hopefully) faster on sse2.
It should generate exactly the same code on ssse3.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-30 Thread Ivan Kalvachev

On 6/26/17, Henrik Gramner  wrote:
> On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev 
> wrote:
>> +%define HADDPS_IS_FAST 0
>> +%define PHADDD_IS_FAST 0
> [...]
>> +haddps  %1,   %1
>> +haddps  %1,   %1
> [...]
>> +   phaddd   xmm%1,xmm%1
>> +   phaddd   xmm%1,xmm%1
>
> You can safely assume that those instructions are always slow and that
> this is virtually never the correct way to use them, so just use the
> shuffle + add method.

Well, there is always hope that they would do a direct implementation.
Or maybe AMD would?
Well, I guess I'll just remove it in the final version.

> You can unconditionally use non-destructive 3-arg instructions
> (without v-prefix) in non AVX-code to reduce ifdeffery. The x86inc
> abstraction layer will automatically insert register-register moves as
> needed.

I'm not sure what code do you have in mind for this comment.
If it is about the HSUMPS , I have to note that both variants
would generate different code (on sse).
The short avx form, would generate "movaps" and "shufps"
that have RW dependency. The sse form avoids that and
allows "movaps" and "shufps" to be executed in parallel.

I do use v prefix on non-destructive avx ops.
Should I prefer the non-prefixed version in such cases?
Should I v-prefix all, including vmovaps ?
(afair vmovaps and movaps differ in preserving/zeroing
of the high part of ymm register, when using xmm.)
I guess I should...

BTW, talking about overload macros.

"cmpps" is the only version of this opcode overloaded by x86asm.
Compilers also define "cmpleps" for "cmp less ps"
where the "less" part is been hardcoded as immediate operand.

Aka, "cmpleps m0, m1" is compiled as
"cmpps xmm0, xmm1, 1".

What complicates matter more is that for 256bit avx,
yasm refuses to accept the short form.
I have to use:
"cmpps m0, m0, m1, 1"
because
"cmpps m0, m1, 1" is turned into
"vcmpps ymm0, ymm1, 1" (line from *.dbg.asm)
and  YASM fails with:
"error: invalid combination of opcode and operands"
(nasm 2.13.1 seems ok).

> I'm a bit doubtful if it's worth the complexity to emulate 256-bit
> integer math using floating-point instruction hacks, especially since
> that's only relevant on two 5+ year old Intel µarchs (SNB & IVB). It's
> probably fine to simply require AVX2 if you need 256-bit integer SIMD.

Well, if 256bit code is 2x the speed of the 128bit one,
then it might make sense to try the emulation.
Unfortunately, even native avx2 seems slower.

> Be aware that most SSE SIMD instructions are actually implemented as
> x86inc macros and redefining them can have unexpected consequences and
> is therefore discouraged.

The manual says that redefinition is recursive
(aka expanded multiple times in the place of usage)
so the hacked integer code would use
the float overloaded macros (and it does).

Don't worry, I don't intend of leaving this in the final version.
But I'm interested to see how much faster/slower it is on real CPU.

Also, I've noticed that some kinds of int/float registry mixing
could have dramatic effect on speed, so trying the hacked
all float version is also a way to check if I have such bad mix.
(e.g. even pxor m0,m0 ; movaps m0,[const] might be problematic)

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [WIP][PATCH]v3 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-30 Thread Ivan Kalvachev

First, I've removed the hacked formula.

While it seems to improve precision in the synthetic test,
it is not enough to avoid assigning pulses in padded area.
Worse, it also interferes with the special case with
subtraction of pulses from y[i]==0 output. Handling these cases
needs combining two masks and using "blendvps" instead the faster "maxps".
Most of these problems are related to the fact that
the hacked formula result is in the range -1.0 to 0.0
where 0.0 is the perfect match.

Second, fixed the all_float rounding mode,
I got it swapped in v2.

Third, reordered some code. Moved the branch
for exact match of the pre-search, to be first taken.
Load the constants needed in the pulse search
only when they will be used.

Use a macro for conditional loading/relabeling of the constants.
Use similar idea for loading and using PIC register.

Use "smartalign" on NASM, to avoid 15 consequitev 1 byte NOPs.
YASM is smart by default.

I attach a second patch, that is
slightly modified version of atomnuker's code.
It sums the distortions of C and SIMD implementations.

In my test approx#2 seems to be pretty close to the C version,
sometimes better, sometimes worse.
If you find a file with dramatic difference, I'm interested. :D

If there are no more issues, v4 would be cleaned up and ready for review.

Best Regards.
From 17cbb22f1f3a9021332f95e38003b8ac4d714aec Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 1/4] SIMD opus pvq_search implementation v3

---
 libavcodec/opus_pvq.c  |   9 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   1 +
 libavcodec/x86/opus_dsp_init.c |  47 +++
 libavcodec/x86/opus_pvq_search.asm | 651 +
 5 files changed, 711 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..6c7504296d 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -418,7 +418,13 @@ static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
 int *y = pvq->qcoeff;
 
 celt_exp_rotation(X, N, blocks, K, spread, 1);
+
+{
+START_TIMER
 gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+STOP_TIMER("pvq_search");
+}
+
 celt_encode_pulses(rc, y,  N, K);
 celt_normalize_residual(y, X, N, gain);
 celt_exp_rotation(X, N, blocks, K, spread, 0);
@@ -947,6 +953,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86 && CONFIG_OPUS_ENCODER)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index 6691494838..9246337360 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -33,8 +33,8 @@
float *lowband_scratch, int fill)
 
 struct CeltPVQ {
-DECLARE_ALIGNED(32, int,   qcoeff  )[176];
-DECLARE_ALIGNED(32, float, hadamard_tmp)[176];
+DECLARE_ALIGNED(32, int,   qcoeff  )[256];
+DECLARE_ALIGNED(32, float, hadamard_tmp)[256];
 
 float (*pvq_search)(float *X, int *y, int K, int N);
 
@@ -45,6 +45,7 @@ struct CeltPVQ {
 };
 
 int  ff_celt_pvq_init  (struct CeltPVQ **pvq);
+void ff_opus_dsp_init_x86(struct CeltPVQ *s);
 void ff_celt_pvq_uninit(struct CeltPVQ **pvq);
 
 #endif /* AVCODEC_OPUS_PVQ_H */
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 0dbc46504e..6687fbcd72 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -52,6 +52,7 @@ OBJS-$(CONFIG_APNG_DECODER)+= x86/pngdsp_init.o
 OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
+OBJS-$(CONFIG_OPUS_ENCODER)+= x86/opus_dsp_init.o x86/opus_pvq_search.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
 OBJS-$(CONFIG_MLP_DECODER) += x86/mlpdsp_init.o
diff --git a/libavcodec/x86/opus_dsp_init.c b/libavcodec/x86/opus_dsp_init.c
new file mode 100644
index 00..ddccf6fe9e
--- /dev/null
+++ b/libavcodec/x86/opus_dsp_init.c
@@ -0,0 +1,47 @@
+/*
+ * Opus encoder assembly optimizations
+ * Copyright (C) 2017 Ivan Kalvachev 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOU

[FFmpeg-devel] [WIP][PATCH]v4 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-08 Thread Ivan Kalvachev

This should be the final work-in-progress patch.

What's changed:

1. Removed macros conditional defines. The defaults seems to be
optimal on all machines that I got benchmarks from. HADDPS and PHADDD
are always slower, "BLEND"s are never slower than the emulation.

2. Remove SHORT_SYY_UPDATE. It is always slower.

3. Remove ALL_FLOAT_PRESEARCH, it is always slower. Remove the ugly
hack to use 256bit ymm with avx1 and integer operations.

4. Remove remaining disabled code.

5. Use HADDD macro from "x86util", I don't need the result in all lanes/elements

6. Use v-prefix for all avx code.

7. Small optimization: Move some of the HSUMPS in the K!=0 branch.

8. Small optimization: Instead of pre-calculation 2*Y[i] and then
correcting it on exit, It is possible to use Syy/2 instead in
distortion parameter calculations. It saves few multiplications in
pre-search and sign restore loop. It however gives different
approximation of sqrt(). It's not (consistently) better or worse than
the previous approximation.

9. Using movdqa to load "const_int32_offsets". Wrong type might
explain why directly using mem constants is sometimes faster.

10. Move some code around and do minor tweaks.
---

I do not intend of removing "PRESEARCH_ROUNDING" and
"USE_APPROXIMATION", (while for the latter I think I will remove
method#1, I've left it this time just for consistency").
These defines control the precision and the type of results that the
function generates.
E.g. This function can produce same results as opus functions with
"PRESEARCH_ROUNDING 0".
If you want same results as the ffmpeg improved function, then you
need "approx#0". It uses real division and is much slower on older
cpu's, but reasonably fast on anything recent.

I've left 2 other defines. "CONST_IN_X64_REG_IS_FASTER" and
"STALL_WRITE_FORWARDING".
On Sandy Bridge and laters, "const_in_x64" has always been faster. On
my cpu it is about the same.
On Ryzen the "const_in_x64" was consistently faster in all sse/avx
variants, with about 5%. But not if "stall_write" is enabled too.
Ryzen (allegedly) has no write stalling, but that method alone is just
a few cycles faster (about 0.5% ).

I'd like to see if the changes I've done this time, would affect the
above results.


The code is much cleaner and you are free to nitpick on it.

There is something that I'm not exactly sure if I need it.
The function gets 2 integer parameters, and I am not sure
if I have to sign extend them in 64 bit more, in order to clear
the high 32 bits. These parameters should never be negative, so the
sign is not needed.
All 32bit operands should also clear the high bits.
Still I'm not sure if there is guarantee that
the high bits won't contain garbage.


Best Regards
From 4591ad752d1d111615f320b17ad19d5fad0d91d4 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 1/5] SIMD opus pvq_search implementation v4

---
 libavcodec/opus_pvq.c  |   9 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   1 +
 libavcodec/x86/opus_dsp_init.c |  47 
 libavcodec/x86/opus_pvq_search.asm | 524 +
 5 files changed, 584 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..6c7504296d 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -418,7 +418,13 @@ static uint32_t celt_alg_quant(OpusRangeCoder *rc, float *X, uint32_t N, uint32_
 int *y = pvq->qcoeff;
 
 celt_exp_rotation(X, N, blocks, K, spread, 1);
+
+{
+START_TIMER
 gain /= sqrtf(pvq->pvq_search(X, y, K, N));
+STOP_TIMER("pvq_search");
+}
+
 celt_encode_pulses(rc, y,  N, K);
 celt_normalize_residual(y, X, N, gain);
 celt_exp_rotation(X, N, blocks, K, spread, 0);
@@ -947,6 +953,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86 && CONFIG_OPUS_ENCODER)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index 6691494838..9246337360 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -33,8 +33,8 @@
float *lowband_scratch, int fill)
 
 struct CeltPVQ {
-DECLARE_ALIGNED(32, int,   qcoeff  )[176];
-DECLARE_ALIGNED(32, float, hadamard_tmp)[176];
+DECLARE_ALIGNED(32, int,   qcoeff  )[256];
+DECLARE_ALIGNED(32, float, hadamard_tmp)[256];
 
 float (*pvq_search)(float *X, int *y, int K, int N);
 
@@ -45,6 +45,7 @@ struct CeltPVQ {
 };
 
 int  ff_celt_pvq_ini

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/aacpsdsp_template: Fixes integer overflow in ps_add_squares_c()

2017-07-09 Thread Ivan Kalvachev

On 7/9/17, Ronald S. Bultje  wrote:
> Hi,
>
> On Sat, Jul 8, 2017 at 5:17 PM, Michael Niedermayer 
> wrote:
>
>> On Mon, Jul 03, 2017 at 01:37:09AM +0200, Michael Niedermayer wrote:
>> > On Sun, Jul 02, 2017 at 02:24:53PM +0100, Rostislav Pehlivanov wrote:
>> > > On 2 July 2017 at 03:28, Michael Niedermayer 
>> wrote:
>> > >
>> > > > Fixes: runtime error: signed integer overflow: 1965219850 +
>> > > > 995792909
>> > > > cannot be represented in type 'int'
>> > > > Fixes: part of 2096/clusterfuzz-testcase-minimized-4901566068817920
>> > > >
>> > > > Found-by: continuous fuzzing process https://github.com/google/oss-
>> > > > fuzz/tree/master/projects/ffmpeg
>> > > > Signed-off-by
>> > > > > ffmpeg%0ASigned-off-by>:
>> > > > Michael Niedermayer 
>> > > > ---
>> > > >  libavcodec/aacpsdsp_template.c | 3 ++-
>> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > >
>> > > > diff --git a/libavcodec/aacpsdsp_template.c b/libavcodec/aacpsdsp_
>> > > > template.c
>> > > > index 9e1a95c1a1..2c0afd4512 100644
>> > > > --- a/libavcodec/aacpsdsp_template.c
>> > > > +++ b/libavcodec/aacpsdsp_template.c
>> > > > @@ -26,9 +26,10 @@
>> > > >  #include "libavutil/attributes.h"
>> > > >  #include "aacpsdsp.h"
>> > > >
>> > > > -static void ps_add_squares_c(INTFLOAT *dst, const INTFLOAT
>> (*src)[2], int
>> > > > n)
>> > > > +static void ps_add_squares_c(INTFLOAT *dst_param, const INTFLOAT
>> > > > (*src)[2], int n)
>> > > >  {
>> > > >  int i;
>> > > > +SUINTFLOAT *dst = dst_param;
>> > > >  for (i = 0; i < n; i++)
>> > > >  dst[i] += AAC_MADD28(src[i][0], src[i][0], src[i][1],
>> src[i][1]);
>> > > >  }
>> > > >
>> > > >
>> > > What's the issue with just _not_ fixing it here? It only occurs on
>> fuzzed
>> > > inputs, doesn't crash on any known platform ever, makes the code
>> uglier and
>> > > why? Because some fuzzer is super pedantic.
>> > > Why not fix the fuzzer? Why not just mark this as a false positive,
>> since
>> > > fixing it is pointless from the standpoint of security (you can't
>> exploit
>> > > overflows in transforms or functions like this), and all developers
>> hate it.
>> >
>> > Iam not sure you understand the problem.
>> > signed integer overflows are undefined behavior, undefined behavior
>> > is not allowed in C.
>> >
>> >
>> > Theres also no option to mark anything as false positive.
>> > If the tool makes a mistake, the issue needs to be reported to its
>> > authors and needs to be fixed.
>> > I belive the tool is correct, if someone thinks its not correct, the
>> > place to report this to is likely where the clang sanitizers are
>> > developed.
>> >
>> > I do understand that this is annoying but this is how C works.
>> >
>> > About "doesn't crash on any known platform ever",
>> > "you can't exploit  overflows in transforms or functions like this"
>> >
>> > undefined behavior has bitten people with unexpected results in the
>> > past. for example "if (a+b < a)" to test for overflow, but because the
>> condition
>> > can only be true in case of overflow and overflow isnt allowed in C
>> > the compiler didnt assemble this the way the human thought.
>> >
>> > In the case of ps_add_squares_c(), dst[i] has to increase if iam
>> > not mistaken. In case of overflow it can decrease but overflow is
>> > not allowed so a compiler can prune anything that implies dst[] to be
>> > negative.
>> > dst[] is used downstream in checks of greater / less and in FFMAX
>> > In this code the compiler can assume that the sign bit is clear,
>> > what happens when  the compilers optimizer has false assumtations
>> > i dont know but i know its not good.
>> >
>> > That said even if no such chain of conditions exist its still invalid
>> > code if theres undefined behavior in it
>>
>> Does anyone object to this patch ?
>> Or does anyone have a better idea on how to fix this ?
>> if not id like to apply it
>
>
> I think Rostislav's point is: why fix it, if it can only happen with
> corrupt input? The before and after situation is identical: garbage in,
> garbage out. If the compiler does funny things that makes the garbage
> slightly differently bad, is that really so devilishly bad? It's still
> garbage. Is anything improved by this?

As I have tried to explain before,
you can harden a program by compiling it
with `gcc -ftrapv` .

In such a program the overflow would trap
and in normal case would lead to termination.

Do you want your browser to die by a small bit error in a random clip?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/aacpsdsp_template: Fixes integer overflow in ps_add_squares_c()

2017-07-09 Thread Ivan Kalvachev

On 7/9/17, Ronald S. Bultje  wrote:
> Hi,
>
> On Sun, Jul 9, 2017 at 4:39 AM, Reimar Döffinger 
> wrote:
>
>> On 09.07.2017, at 02:52, "Ronald S. Bultje"  wrote:
>> > On Sat, Jul 8, 2017 at 5:17 PM, Michael Niedermayer
>> 
>> > wrote:
>> >
>> >>
>> >> Does anyone object to this patch ?
>> >> Or does anyone have a better idea on how to fix this ?
>> >> if not id like to apply it
>> >
>> >
>> > I think Rostislav's point is: why fix it, if it can only happen with
>> > corrupt input? The before and after situation is identical: garbage in,
>> > garbage out. If the compiler does funny things that makes the garbage
>> > slightly differently bad, is that really so devilishly bad? It's still
>> > garbage. Is anything improved by this?
>>
>> The way C works, you MUST assume any undefined behaviour can at any point
>> [..] become exploitable.[..] If you don't like that, C is the wrong
>> language to use.
>
>
> I think I've read "the boy who cried wolf" a few too many times to my kids,
> but the form of this discussion is currently too polarizing/political for
> my taste.

So, you stir the pot and then run away.
You ignore technical explanations and you call discussion too political.
And you "imply" that developers are lairs.

Maybe you are not aware what you are going,
but I do assure you, it aint pretty.

I think you owe some people an apology.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH 1/3] avcodec/aacdec_template: Fix undefined integer overflow in apply_tns()

2017-07-12 Thread Ivan Kalvachev

On 7/11/17, Michael Niedermayer  wrote:
> On Sun, Jul 02, 2017 at 01:33:16PM +0200, Michael Niedermayer wrote:
>> On Sun, Jul 02, 2017 at 01:14:31PM +0200, wm4 wrote:
>> > On Sun,  2 Jul 2017 04:28:54 +0200
>> > Michael Niedermayer  wrote:
>> >
>> > > Fixes: runtime error: signed integer overflow: -2147483648 -
>> > > 1202286525 cannot be represented in type 'int'
>> > > Fixes: 2071/clusterfuzz-testcase-minimized-6036414271586304
>> > >
>> > > Found-by: continuous fuzzing process
>> > > https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
>> > > Signed-off-by: Michael Niedermayer 
>> > > ---
>> > >  libavcodec/aac_defines.h | 2 ++
>> > >  libavcodec/aacdec_template.c | 5 +++--
>> > >  2 files changed, 5 insertions(+), 2 deletions(-)
>> > >
>> > > diff --git a/libavcodec/aac_defines.h b/libavcodec/aac_defines.h
>> > > index 3c79a8a4a1..ee4c73a87d 100644
>> > > --- a/libavcodec/aac_defines.h
>> > > +++ b/libavcodec/aac_defines.h
>> > > @@ -35,6 +35,7 @@
>> > >  #define AAC_RENAME(x)   x ## _fixed
>> > >  #define AAC_RENAME_32(x)x ## _fixed_32
>> > >  typedef int INTFLOAT;
>> > > +typedef unsignedSUINTFLOAT;
>> > >  typedef int64_t INT64FLOAT;
>> > >  typedef int16_t SHORTFLOAT;
>> > >  typedef SoftFloat   AAC_FLOAT;
>> > > @@ -83,6 +84,7 @@ typedef int AAC_SIGNE;
>> > >  #define AAC_RENAME(x)   x
>> > >  #define AAC_RENAME_32(x)x
>> > >  typedef float   INTFLOAT;
>> > > +typedef float   SUINTFLOAT;
>> >
>> > Not more of this damn shit.
>>
>> i dont think i understand your comment
>>
>> The code is templated and uses largely the INTFLOAT data type
>> which is either signed int or float depending on if the code is build
>> for the fixed point or floating point decoder
>>
>> to fix the undefined behavior in the fixed point decoder a type which
>> is unsigned int is the obvious choice.
>> Such type must be float in the floating point decoder.
>>
>> This patch adds such type.
>>
>> do you object to fixing the issue ?
>> do you want to suggest a different solution ?
>
> over a week passed, noone replied.
> Is everyone ok with patch 1/3 ?
> does someone object to it ?
> does anyone have a better solution ?
>
> If noone replies, i will apply this patch, i do not want to leave
> undefined behavior in the codebase.

I actually would request a short note explaining the SUINTFLOAT type usage.
Something like:
+typedef unsignedSUINTFLOAT; // Equivalent to INTFLOAT,
Used as temporal cast to avoid undefined sign overflow operations.

Maybe add such note to all "signed value in unsigned type" typedefs.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH]v5 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-22 Thread Ivan Kalvachev

This patch is ready for review and inclusion.

Explanation of what it does and how it works
could be found in the previous WIP threads:
[v1] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
[v2] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
[v3] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
[v4] http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html

The changes compared to WIP v4 are small:
 - Using r4d ops to clear the high bits on int32 arguments.
 - Correctly map the cglobal registry usage.
 - Use SSE4 instead of SSE42, since blend is only SSE4.1.
 - Fix building with --disable-x86asm .

 - Remove testing defines.
Loading constants in registers is (now) always same or better speed.
Avoiding stall forwarding is faster on all CPU except Ryzen.
On Ryzen the alternative is about 7 cycles faster, that's why
I've left the code disabled, but without define.
I've also left the two other defines, as they are useful
for debugging and creating binary identical results
to other algorithms.

 - Disable the 256bit AVX2 variant usage.
I'm leaving the code in the assembly as disabled,
in case it is useful in future.

---

I'm including some of the benchmarks.
Some data is removed, since it was used to test different methods.
Benchmarks are done at default settings (96kbps),
but with different samples. All samples are above 1h long.

In summary, the function is about 2-3x faster
than the improved FFmpeg C version.

===
 K10  AMD Phenom(tm) II X4 945 Processor
//v4
  706   706   706   706   706// NULL
 4146  4161  4169  4184  4188  4328 4379 // SSE2
 4988  5015  5016  5030  5185// USE_APPROXIMATION  0
13860 13828 13846 13846 13831// C

===
 Pentium Dual Core E5800
//V4
3006 3012 3019 3023 3025 // SSE2
9066 9071 9074 9077 9081 // C

//===
 Ryzen 1800X
//v3
 357 // NULL
1999 2001 2004   // AVX1 GCC
2010 2029// SSE4 MSVC
2012 2026 2027   // AVX1 MSVC
2166 2170 2171   // AVX2 & STALL_WRITE_FORWARDING 1
2176 2179 2180 2180 2189 // AVX2
2226 2230 2234   // AVX2 & USE_APPROXIMATION 0
6216 6162 6162   // C only GCC
   61909 61545   // C only MSVC
//v4
1931 1933 1935   // v4 AVX1
2096 2097 2098   // v4 AVX2 & STALL_WRITE_FORWARDING 1
2103 2110 2112   // v4 AVX2

//===
 Intel(R) Core(TM) i7-3930K CPU
//v3
 272// NULL
1755 1756 1764  // AVX1
1847 1855 1866  // SSE4
2003 2009   // USE_APPROXIMATION  00
2103 2110 2112  // AVX2
4855 4856   // C only

//===
 SkyLake i7 6700HQ
//v2
 264// NULL
1764 1765 1772 1773 1780// SSE4
1782 1782 1787 1795 1796// AVX1
1805 1807 1807 1811 1815// AVX1 & USE_APPROXIMATION 0
1826 1827 1828 1833 1833// SSE2
1850 1853 1857 1857 1868// AVX2
6878 6934 6879 6921 6899// C

-b:a 48kbps, 96kbps, 510kbps
sse4:  2049,   1826, 955
sse2:  2065,   1874, 943
avx:   2106,   1868, 950
c: 9202,   7080,1392
From 522ed9d47db739c9c0559f4c040ef5262c685335 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 1/5] SIMD opus pvq_search implementation

Explanation on the workings and methods used by the
Pyramid Vector Quantization Search function
could be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/opus_pvq.c  |   3 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   2 +
 libavcodec/x86/opus_dsp_init.c |  43 +++
 libavcodec/x86/opus_pvq_search.asm | 554 +
 5 files changed, 605 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..3aa502929c 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -947,6 +947,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86 && CONFIG_OPUS_ENCODER)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq

Re: [FFmpeg-devel] [PATCH]v5 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-24 Thread Ivan Kalvachev

On 7/23/17, Rostislav Pehlivanov  wrote:
> On 22 July 2017 at 12:18, Ivan Kalvachev  wrote:
>
>> This patch is ready for review and inclusion.
>>
>>
>>
>>+%macro emu_blendvps 3 ; dst/src_a, src_b, mask
>>+%macro lea_pic_base 2; reg, const_label
> Capitalize macro names.

Done for the later.
About the former...

I do not like to use capitalization for the emulated instructions:

1. In C macros are always in capital letters to separate them from the
functions.
In ASM functions cannot be mistaken for macros.

2. All instructions are lowercase, even the ones that are overloaded
by x86asm.h macros.

3. There are already about 30 macros with lower cases in
libavcodec/x86. The rule is not absolute.

4. There are some emulated instructions in x86util, that are all upper
case names and
I do find them very ugly when I see them in the code.
This is why I went with "emu_" route.
I actually want to encourage using the emu_ for them too (as
alternative names).

5. There is nothing guaranteeing that the assembler
should be case sensitive.
Actually nasm manual mentions that MASM (on which
it it is based on) is not case-sensitive (by default).
While nasm and yasm are case sensitive,
there are still %idefine and %imacro that
could create some confusion.

Anyway.

Would you accept a change where the emulation macros
are moved to a separate file and the macros are
renamed to EMU_ , aka EMU_blendvps ?
I'm thinking of "libavcodec/x86/x86emu.asm".

Or maybe they should be put in libavutil/x86/x86util.asm ,
however that could open a new can of worms.
AKA using the ugly uppercase only names.

>>+  %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
> Remove this error

I'd like to keep that one.
It helps finding out why the code does not compile.

Sometimes it may not be obvious, since SWAP might
change the registers under the hood.

>
>>+%error "blendvps" emulation needs SSE
> Definitely remove this error too.

Done

>>+%error AVX1 does not support integer 256bit ymm operations
> And this one is particularly pointless...

On the contrary. AVX1 does support 256 bit ymm float point operations
but not integer ones. It is quite trivial mistake to use one with the other.

What is worse, without this the wrong code would compile
without any hint of error, because avx2 codes are
not overloaded by the current x86asm.h
so you won't get warning that you are using avx2 ops in avx1 section.

>>+  %error sse41 blendvps uses xmm0 as default 3 operand, you used %3
> Same...

Again, I'd like to keep that one.

>>+%error "broadcastss" emulation needs SSE
> Same...

Done

>>+%error "pbroadcastd" emulation needs SSE2
> Yep, the same...

Done

>
>>+%error pick HSUMPS implementation
> Again...

I thought I had taken care of that.
Done

> All of these are pointless and unnecessary. Always assume at least SSE2.

NEVER assume anything (that you can check).
Making wrong assumptions is the easiest way
to make a mistakes that are
very hard to notice, find and correct.

>>+
>>+; 256bit version seems slower than the 128bit AVX on all current CPU's
>>+; So disable it for now and keep the code in case something changes in
> future.
>>+%if HAVE_AVX2_EXTERNAL > 0 && 0
>>+INIT_YMM avx2
>>+PVQ_FAST_SEARCH
>>+%endif
>
> Remove this altogether.

I think this would be a mistake.

BTW, would you do a proper benchmarks with the current
code and post the results. There is a chance that the
code has improved.
(Please, use a real audio sample, not generated noise).

I also asked you to try something (changing "cmpless" to "cmpps" ),
that you totally forgot to do.
(IACA says there is no penalty in my code for using this SSE op in AVX2 block,
but as I said, never assume.)

>>+%if 1
>>+emu_pbroadcastd m1,   xmm1
> ...
>>+%else
>>+; Ryzen is the only CPU at the moment that doesn't have
>>+; write forwarding stall and where this code is faster
>>+; with about 7 cycles on avarage.
>>+%{1}ps  m5,   mm_const_float_1
>>+movss   [tmpY + r4q], xmm5  ; Y[max_idx] +-= 1.0;
>
> Remove the %else and always use the first bit of code.

I don't like removing code that is faster on a new CPU.
But since you insist.
Done.

>>+%if cpuflag(avx2) && 01
>>+%elif cpuflag(avx) && 01
>>+%if cpuflag(avx2) && 01
>
> Remove the && 01 in these 3 places.

Done

>>+;  vperm2f128   %1,   %1,   %1,   0 ; ymm, ymm, ymm ; 2-3c 1/1
>
> Remove this commented out code.

Done

>
>>+cglobal pvq_search, 4, 5+num_pic_regs, 11, 256*4, inX, outY, K, N
>>+%if ARCH_X86_64 == 0;sbrdsp
>

[FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-26 Thread Ivan Kalvachev

On 7/24/17, Ivan Kalvachev  wrote:
> On 7/23/17, Rostislav Pehlivanov  wrote:
>> On 22 July 2017 at 12:18, Ivan Kalvachev  wrote:
>>
>>> This patch is ready for review and inclusion.
>>>
>>>
>>>
>>>+%macro emu_blendvps 3 ; dst/src_a, src_b, mask
>>>+%macro lea_pic_base 2; reg, const_label
>> Capitalize macro names.
>
> Done for the later.
> About the former...
>
> I do not like to use capitalization for the emulated instructions:
>
> 1. In C macros are always in capital letters to separate them from the
> functions.
> In ASM functions cannot be mistaken for macros.
>
> 2. All instructions are lowercase, even the ones that are overloaded
> by x86asm.h macros.
>
> 3. There are already about 30 macros with lower cases in
> libavcodec/x86. The rule is not absolute.
>
> 4. There are some emulated instructions in x86util, that are all upper
> case names and
> I do find them very ugly when I see them in the code.
> This is why I went with "emu_" route.
> I actually want to encourage using the emu_ for them too (as
> alternative names).
>
> 5. There is nothing guaranteeing that the assembler
> should be case sensitive.
> Actually nasm manual mentions that MASM (on which
> it it is based on) is not case-sensitive (by default).
> While nasm and yasm are case sensitive,
> there are still %idefine and %imacro that
> could create some confusion.
>
> Anyway.
>
> Would you accept a change where the emulation macros
> are moved to a separate file and the macros are
> renamed to EMU_ , aka EMU_blendvps ?
> I'm thinking of "libavcodec/x86/x86emu.asm".
>
> Or maybe they should be put in libavutil/x86/x86util.asm ,
> however that could open a new can of worms.
> AKA using the ugly uppercase only names.
>
>>>+  %error sse41 blendvps uses xmm0 as default 3d operand, you used %3
>> Remove this error
>
> I'd like to keep that one.
> It helps finding out why the code does not compile.
>
> Sometimes it may not be obvious, since SWAP might
> change the registers under the hood.
>
>>
>>>+%error "blendvps" emulation needs SSE
>> Definitely remove this error too.
>
> Done
>
>>>+%error AVX1 does not support integer 256bit ymm operations
>> And this one is particularly pointless...
>
> On the contrary. AVX1 does support 256 bit ymm float point operations
> but not integer ones. It is quite trivial mistake to use one with the
> other.
>
> What is worse, without this the wrong code would compile
> without any hint of error, because avx2 codes are
> not overloaded by the current x86asm.h
> so you won't get warning that you are using avx2 ops in avx1 section.
>
>>>+  %error sse41 blendvps uses xmm0 as default 3 operand, you used %3
>> Same...
>
> Again, I'd like to keep that one.
>
>>>+%error "broadcastss" emulation needs SSE
>> Same...
>
> Done
>
>>>+%error "pbroadcastd" emulation needs SSE2
>> Yep, the same...
>
> Done
>
>>
>>>+%error pick HSUMPS implementation
>> Again...
>
> I thought I had taken care of that.
> Done
>
>> All of these are pointless and unnecessary. Always assume at least SSE2.
>
> NEVER assume anything (that you can check).
> Making wrong assumptions is the easiest way
> to make a mistakes that are
> very hard to notice, find and correct.
>
>>>+
>>>+; 256bit version seems slower than the 128bit AVX on all current CPU's
>>>+; So disable it for now and keep the code in case something changes in
>> future.
>>>+%if HAVE_AVX2_EXTERNAL > 0 && 0
>>>+INIT_YMM avx2
>>>+PVQ_FAST_SEARCH
>>>+%endif
>>
>> Remove this altogether.
>
> I think this would be a mistake.
>
> BTW, would you do a proper benchmarks with the current
> code and post the results. There is a chance that the
> code has improved.
> (Please, use a real audio sample, not generated noise).
>
> I also asked you to try something (changing "cmpless" to "cmpps" ),
> that you totally forgot to do.
> (IACA says there is no penalty in my code for using this SSE op in AVX2
> block,
> but as I said, never assume.)
>
>>>+%if 1
>>>+emu_pbroadcastd m1,   xmm1
>> ...
>>>+%else
>>>+; Ryzen is the only CPU at the moment that doesn't have
>>>+; write forwarding stall and where this code is faster
>>>+; with about 7 cycles on avarage.
>>>+%{1}ps  m5,   mm_const_float_1
>>>

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-27 Thread Ivan Kalvachev

On 7/27/17, Rostislav Pehlivanov  wrote:
> On 26 July 2017 at 15:56, Ivan Kalvachev  wrote:
>
>> +if (ARCH_X86 && CONFIG_OPUS_ENCODER)
>> +ff_opus_dsp_init_x86(s);
>>
>
> Just change it to
> +if (ARCH_X86)
>
> The init function is named opus_dsp, so it'll get used to other opus
> things, not just the encoder.

But at the moment it does not.
I do prefer to leave that task for the one that
adds opus decoder functions.

Also this change alone would break compilation, since
it also requires changing the libavcodec/x86/Makefile
and adding the guard inside the opus_dsp_init.c

Another option is to have "opus_enc_dsp_init.c" and call
the function "ff_opus_enc_dsp_init_x86()".

Do tell me which option do you prefer
and do you insist on v7 just for that.

> The assembly code looks fine to me, but other people will have to take a
> look at it in case I'm missing something.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-27 Thread Ivan Kalvachev

On 7/27/17, Rostislav Pehlivanov  wrote:
> On 27 July 2017 at 09:38, Ivan Kalvachev  wrote:
>
>> On 7/27/17, Rostislav Pehlivanov  wrote:
>> > On 26 July 2017 at 15:56, Ivan Kalvachev  wrote:
>> >
>> >> +if (ARCH_X86 && CONFIG_OPUS_ENCODER)
>> >> +ff_opus_dsp_init_x86(s);
>> >>
>> >
>> > Just change it to
>> > +if (ARCH_X86)
>> >
>> > The init function is named opus_dsp, so it'll get used to other opus
>> > things, not just the encoder.
>>
>> But at the moment it does not.
>> I do prefer to leave that task for the one that
>> adds opus decoder functions.
>>
>> Also this change alone would break compilation, since
>> it also requires changing the libavcodec/x86/Makefile
>> and adding the guard inside the opus_dsp_init.c
>>
>> Another option is to have "opus_enc_dsp_init.c" and call
>> the function "ff_opus_enc_dsp_init_x86()".
>>
>> Do tell me which option do you prefer
>> and do you insist on v7 just for that.
>>
>> > The assembly code looks fine to me, but other people will have to take a
>> > look at it in case I'm missing something.
>
> The former, but that can be changed later after pushing

Here is the patch.
I'll merge it in v7, if there is one.

Please note that makefile needs to use two separate
config_opus_decoder/encoder, since there is no config_opus_codec . All
other dsp seem to use separate files for encoder and decoder dsp.

Best Regards.
From 1a6ee9b2880c67db25737a6317f09cbbac441c83 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 27 Jul 2017 14:21:33 +0300
Subject: [PATCH 2/6] Build opus dsp for encoder and decoder.

---
 libavcodec/opus_pvq.c  | 2 +-
 libavcodec/x86/Makefile| 1 +
 libavcodec/x86/opus_dsp_init.c | 2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 3aa502929c..2fb276099b 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -947,7 +947,7 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
-if (ARCH_X86 && CONFIG_OPUS_ENCODER)
+if (ARCH_X86)
 ff_opus_dsp_init_x86(s);
 
 *pvq = s;
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 9875f48797..e36644c72a 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -52,6 +52,7 @@ OBJS-$(CONFIG_APNG_DECODER)+= x86/pngdsp_init.o
 OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
+OBJS-$(CONFIG_OPUS_DECODER)+= x86/opus_dsp_init.o
 OBJS-$(CONFIG_OPUS_ENCODER)+= x86/opus_dsp_init.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
diff --git a/libavcodec/x86/opus_dsp_init.c b/libavcodec/x86/opus_dsp_init.c
index f4c25822db..c51f786ee8 100644
--- a/libavcodec/x86/opus_dsp_init.c
+++ b/libavcodec/x86/opus_dsp_init.c
@@ -32,6 +32,7 @@ av_cold void ff_opus_dsp_init_x86(CeltPVQ *s)
 {
 int cpu_flags = av_get_cpu_flags();
 
+#if CONFIG_OPUS_ENCODER
 if (EXTERNAL_SSE2(cpu_flags))
 s->pvq_search = ff_pvq_search_sse2;
 
@@ -40,4 +41,5 @@ av_cold void ff_opus_dsp_init_x86(CeltPVQ *s)
 
 if (EXTERNAL_AVX(cpu_flags))
 s->pvq_search = ff_pvq_search_avx;
+#endif
 }
-- 
2.13.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] build: Allow libffmpeg to be built for Chromium-based browsers

2017-07-28 Thread Ivan Kalvachev

On 7/29/17, Dominik 'Rathann' Mierzejewski  wrote:
> On Saturday, 29 July 2017 at 00:20, Hendrik Leppkes wrote:
>> On Fri, Jul 28, 2017 at 12:07 PM, James Le Cuirot 
>> wrote:
> [...]
>> > This Makefile snippet allows libffmpeg to be created without the help
>> > of Chromium's build system. It uses the CONFIG_SHARED variable to
>> > decide whether to link the FFmpeg libraries statically or
>> > dynamically. In the latter case, libffmpeg is just a wrapper with no
>> > symbols of its own.
> [...]
>> I don't think ffmpeg is the right place to maintain special makefiles
>> for Chromium.
>
> I concur. Instead, Chromium should be fixed to link against individual
> FFmpeg libraries properly.

To be honest, I do not find the combined library such a bad idea.

Actually I think that going this way has been discussed before,
long before Chromium existed.

For example, if you have to dlopen ffmpeg libs, it takes a lot of care
to load all the interlinked libraries in the correct order.
And imagine the fun if you have multiple versions installed.

Having one front-end to load all correct libraries is
logical improvement and simplification.
This is why Chromium has done it on their own.
This is why FFmpeg should do it too.

Also, the changes are isolated to a separate file in the build system,
it should not interfere with normal build process or merges.

I do however object on hardcoding "chromium" codepaths in it.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-01 Thread Ivan Kalvachev

On 7/31/17, Henrik Gramner  wrote:
> On Wed, Jul 26, 2017 at 4:56 PM, Ivan Kalvachev 
> wrote:
>> +++ b/libavcodec/x86/opus_pvq_search.asm
>
> Generic minor stuff:
>
> Use rN instead of rNq for numbered registers (q suffix is used for
> named args only due to preprocessor limitations).

Done.

Is this documented?

> Use the same "standard" vertical alignment rules as most existing
> code, e.g. instructions indented by 4 spaces and operands aligned on
> the first comma.

What do you mean operands aligned on the first comma?
The longest operand I had is "xmm0" , counting comma and space
I get 6 position alignment for operands.
Now, with "xm7" i can lower this to 5 position. Should I do that?
(I don't have "xm15" ).

Also, I've aligned the operand at 12 positions after their
instruction start, because this is the size of the longest
real instruction.

As for why i have instruction start at 8'th position.
It's because I've allocated the field at position 4-7
for instruction prefixes, and the "EMU_" is 4 positions.

Now I have:
1234567812345_12345_12345_
EMU_broadcastss ym13, xm10
EMU_blendvpsxm1,  xm2,  m3
   vblendvps
blendvps

I can ditch the prefix and shorten the operands. e.g. :
1234_1234_1234_1234_
VBROADCASTSS ym1, xm1
BLENDVPS m1,  m2,  m3

Is that acceptable? Or maybe you do mean:

1234_1234_1234_123
VBROADCASTSS ym1, xm1
BLENDVPS  m1, m2, m3

However I would prefer to use
1234_1234_1234_123
VBROADCASTSS ym1, xm1
BLENDVPS  m1,  m2,  m3
PBLENDVD xm1, xm2, xm3

(You do need fixed width font to see that correctly).

I'll wait for reply before doing that.

> Use xmN instead of xmmN (only really makes a difference when SWAP:s
> are used, but might as well do it "correctly").

Done.

> Use 32-bit operands, e.g. rNd, when 64-bit math isn't required.

Done

> Unless aligning every single loop entry helps a lot I'd avoid it since
> it does waste a bit of icache.

I'll redo my benchmarks, but I have placed these aligns for a reason.
At some point removed debug instructions started making my code
slower. So I've placed align to avoid random slowdowns.

> Explicitly using the carry flag as a branch condition is a bit weird.
> Generally using jg/jl/jge/jle tends to be clearer.

I use it to take advantage of the so called MacroFusion.
It allows the merge of cmp and jxx, as long as the branch
checks only one flag, so all signed branches are to be avoided
(as stated by intel manual).
The newer the cpu, the more opcodes (add/sub)
could be merged and less limitations.

>> +%ifdef __NASM_VER__
>> +%use "smartalign"
>> +ALIGNMODE p6
>> +%endif
>
> Assembler-specific ifdeffery should be avoided in individual files.
> Something equivalent already exists in x86inc actually but I don't
> remember if it was merged to FFmpeg from upstream (x264) yet.

Definitely not merged.

I hear a lot about the improved x86inc,
maybe it is time for you to merge it in FFmpeg?

>> +const_int32_offsets:
>> +%rep 8
>> +dd $-const_int32_offsets
>> +%endrep
>
> Isn't this just an overly complicated way of expressing "dd 0*4, 1*4,
> 2*4, 3*4, 4*4, 5*4, 6*4, 7*4"?

Yes.
Do you know a way that works with "times 8"?
I've done it this way to make it easy to change the size of the constant.

Do you request I change it?

>> +SECTION .text
>> +
>> +
>> +
>> +
>
> Reduce some of the excessive whitespace.

Will do with the other cosmetics.

>> +%macro HSUMPS 2 ; dst/src, tmp
>> +%if cpuflag(avx)
>> +  %if sizeof%1>=32  ; avx
>> +   vperm2f128   %2,   %1,   %1,   (0)*16+(1)   ;
>> %2=lo(%1)<<128+hi(%1)
>> +   vaddps   %1,   %2
>> +  %endif
>> +   vshufps  %2,   %1,   %1,   q1032
>> +   vaddps   %1,   %2
>> +   vshufps  %2,   %1,   %1,   q0321
>> +   vaddps   %1,   %2
>> +
>> +%else  ; this form is a bit faster than the short avx-like emulation.
>> +movaps  %2,   %1;[d,   c,   b,
>> a   ]
>> +shufps  %1,   %1,   q1032   ; %2=[b,   a,   d,
>> c   ]
>> +addps   %1,   %2; %1=[b+d, a+c, d+b,
>> c+a ]
>> +movaps  %2,   %1
>> +shufps  %1,   %1,   q0321   ; %2=[c+a, b+d, a+c,
>> d+b ]
>> +addps   %1,   %2; %1=[c+a+b+d, b+d+a+c, a+c+d+b,
>> d+b+c+a ]
>> +; all %1 members should be equal for as lon

Re: [FFmpeg-devel] [DISCUSSION] Motion Estimation API/Library

2017-08-01 Thread Ivan Kalvachev

On 8/1/17, Clément Bœsch  wrote:
> On Tue, Aug 01, 2017 at 09:18:33AM +, Davinder Singh wrote:
> [...]
>> > In particular, the main policy of FFmpeg is to not depend on external
>> > libraries for core features. Therefore, if your project is a separate
>> >
>>
>> Just to be clear, it won't be "external" library like OpenCV...
>>
>>
>> > library, it will definitely not be used by FFmpeg codecs like you wish
>> > in your first paragraph.
>> >
>> > If you want a fighting chance of a project that is not stillborn, I
>> > think you need to make it part of FFmpeg, and make sure important
>> >
>>
>> .. it will be part of FFmpeg like libavfilter, just a new module -
>> libmotion.
>>
>
> yeah, but not a good idea. Just make it an optional component of
> libavutil.

Why not keep the code in libavcodec?

I think that (I)DCT was also given standard API and it is still there.
Also some me_cmp_fn use (I)DCT.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-03 Thread Ivan Kalvachev

On 8/2/17, Henrik Gramner  wrote:
> On Tue, Aug 1, 2017 at 11:46 PM, Ivan Kalvachev 
> wrote:
>> On 7/31/17, Henrik Gramner  wrote:
>>> Use rN instead of rNq for numbered registers (q suffix is used for
>>> named args only due to preprocessor limitations).
>>
>> Is this documented?
>
> Not sure, but there's probably a bunch small things like this that
> aren't really well documented.
>
>>> Use the same "standard" vertical alignment rules as most existing
>>> code, e.g. instructions indented by 4 spaces and operands aligned on
>>> the first comma.
>>
>> What do you mean operands aligned on the first comma?
>> The longest operand I had is "xmm0" , counting comma and space
>> I get 6 position alignment for operands.
>> Now, with "xm7" i can lower this to 5 position. Should I do that?
>> (I don't have "xm15" ).
>>
>
> 1234_1234_1234_123
> VBROADCASTSS ym1, xm1
> BLENDVPS  m1, m2, m3
>
> is the most commonly used alignment.

I see that a lot of .asm files use different alignments.
I'll try to pick something similar that I find good looking.

I also see that different function/macro can have
different position for the first comma.
This could be quite useful, to visually separate
the macros.


>>> Unless aligning every single loop entry helps a lot I'd avoid it since
>>> it does waste a bit of icache.
>>
>> I'll redo my benchmarks, but I have placed these aligns for a reason.
>> At some point removed debug instructions started making my code
>> slower. So I've placed align to avoid random slowdowns.
>
> Ok.
>
>>> Explicitly using the carry flag as a branch condition is a bit weird.
>>> Generally using jg/jl/jge/jle tends to be clearer.
>>
>> I use it to take advantage of the so called MacroFusion.
>> It allows the merge of cmp and jxx, as long as the branch
>> checks only one flag, so all signed branches are to be avoided
>> (as stated by intel manual).
>> The newer the cpu, the more opcodes (add/sub)
>> could be merged and less limitations.
>
> Every µarch that can do macro-op fusion with add/sub can do so with
> both signed and unsigned branch conditions.
> There's actually only a single µarch that can fuse cmp with unsigned
> but not signed branch conditions and that's more than a decade old
> (Conroe), but if you want to check if (unsigned)a < (unsigned)b 'jb'
> is preferred of 'jc' (both produce the exact same opcode).

One is enough reason for me :}
Since the same thing works just as well or better on newer CPU's.

Also, using above/bellow on arithmetic operations
is a lot more confusing than carry/borrow.
You can see that I use "jc" and "jnc" after "sub".

I'll keep it as it is, unless you have solid reason to change it.

>>> Assembler-specific ifdeffery should be avoided in individual files.
>>> Something equivalent already exists in x86inc actually but I don't
>>> remember if it was merged to FFmpeg from upstream (x264) yet.
>>
>> Definitely not merged.
>>
>> I hear a lot about the improved x86inc,
>> maybe it is time for you to merge it in FFmpeg?
>
> Now that I think about it I believe that part wasn't merged because
> smartalign is broken on some older nasm versions (which is another
> reason for avoiding things like this in individual files) and FFmpeg
> doesn't enforce any specific version requirements. I guess it could be
> worked around with some version checking ifdeffery if we know which
> versions are affected.

I don't see anything relevant in here:
http://repo.or.cz/nasm.git/history/HEAD:/macros/smartalign.mac
I also didn't notice anything similar in the changelog.

I don't think this is relevant reason for not merging avx2 support. ;)

>>> Isn't this just an overly complicated way of expressing "dd 0*4, 1*4,
>>> 2*4, 3*4, 4*4, 5*4, 6*4, 7*4"?
>>
>> Yes.
>> Do you know a way that works with "times 8"?
>> I've done it this way to make it easy to change the size of the constant.
>>
>> Do you request I change it?
>
> I'd prefer just using that simple one-liner I posted. Replacing the
> numbers if you want to change things later is trivial.

It might be trivial, but it also makes it easier to misspell something.
I'd prefer to keep it as it is.

>>> Is reordering moves for the non-AVX path worth the additional
>>> complexity? Microoptimizations that only affect legacy hardware are
>>> IMO a bit questionable.
>>
&g

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-04 Thread Ivan Kalvachev

On 8/4/17, Henrik Gramner  wrote:
> On Thu, Aug 3, 2017 at 11:36 PM, Ivan Kalvachev 
> wrote:
>>> 1234_1234_1234_123
>>> VBROADCASTSS ym1, xm1
>>> BLENDVPS  m1, m2, m3
>>>
>>> is the most commonly used alignment.
>>
>> I see that a lot of .asm files use different alignments.
>> I'll try to pick something similar that I find good looking.
>>
>> I also see that different function/macro can have
>> different position for the first comma.
>> This could be quite useful, to visually separate
>> the macros.
>
> Things like indentation can be a bit inconsistent at times, yes. Cody
> by different authors written over the span of many year etc.
>
> It's normal to diverge from "standard alignment" in cases when macro
> names (or memory addresses) are fairly long. Otherwise the amount of
> whitespace between instructions and operands would easily get too
> large to be practical.

I tried different things. The one space after comma indeed looks best.
I used 2 spaces after instruction.

>>> Now that I think about it I believe that part wasn't merged because
>>> smartalign is broken on some older nasm versions (which is another
>>> reason for avoiding things like this in individual files) and FFmpeg
>>> doesn't enforce any specific version requirements. I guess it could be
>>> worked around with some version checking ifdeffery if we know which
>>> versions are affected.
>>
>> I don't see anything relevant in here:
>> http://repo.or.cz/nasm.git/history/HEAD:/macros/smartalign.mac
>> I also didn't notice anything similar in the changelog.
> http://repo.or.cz/nasm.git/commit/f29123b936b1751c6d72dad86b05eb293cca5c6c
> https://bugzilla.nasm.us/show_bug.cgi?id=3392319

This would break generation of dependencies?
I think that FFmpeg uses combined mode with nasm,
so it should not call preprocessor only mode,
unless DBG=1 is used.

I'll investigate this further.
(revert the nasm fix, try building ffmpeg dependencies ...)

>> I had it mixed before, but I changed it to be consistent.
>> Some new avx instruction are not overloaded or
>> available without their "v-" prefix.
>> e.g. x86util uses vpbroadcastw in SPLATW.
>
> Every instruction that has both a legacy encoding and a VEX-encoding
> will be automatically converted to VEX in AVX functions when written
> without the v-prefix. There is no legacy encoding for newer
> instructions so there's no reason for x86inc to overload those.

That's great, but it doesn't address the original problem.
Do you insist on removing the "v-" prefix from AVX only instructions?

You said that this is purely cosmetic and
I think that it would make compilation a bit slower,
because of unnecessary macro expansions. ;)

>> Isn't one of the selling points for x86inc that
>> it would warn if unsupported instructions are used?
>
> Not really, no. It's done in a small subset of cases as a part of the
> aforementioned auto legacy/VEX conversion because it was trivial to
> add that functionality, but correctly tracking every single x86
> instruction in existence would be a whole different thing.
>
> Some instructions requires different instruction sets not just
> depending on the vector width but the operand types as well (for
> example, pextrw from mm to gpr requires mmx2, from xmm to gpr requires
> sse2, from xmm to memory requires sse4.1) and there's no generic way
> to handle every special corner case which would make things fairly
> complex. I'm guessing doing so would also cause a significant
> reduction in compilation speed which I'm sure some people would
> dislike.

There is already check for register size and integer operations,
because SSE1 does not have 128bit xmm integer operations.
It should be just few lines of code to add the same check for
AVX1 and 256bit ymm integer operations.
(I'm not confident enough to mess with that code, yet.)

As for the immediate task:
The "error" message stays.

Maybe I should also add "use PBROADCASTD instead" to it?


>>> Add your improvements to the existing x86util macro (can be done in
>>> the same patch) and then use that.
>>
>> FFmpeg guidelines request that it is a separate patch in separate mail.
>
> Doesn't really seem to be followed historically by looking at the git
> history, but doing things by the book is never wrong so making it a
> separate commit might be the correct choice.

Will do.

>> Just renaming it is not enough, because SPLATD
>> works differently (it can broadcast second, third or forth element too)
>> and it is already

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-05 Thread Ivan Kalvachev

On 8/5/17, Rostislav Pehlivanov  wrote:
> On 4 August 2017 at 23:58, Ivan Kalvachev  wrote:
>
>>
>> >> I had it mixed before, but I changed it to be consistent.
>> >> Some new avx instruction are not overloaded or
>> >> available without their "v-" prefix.
>> >> e.g. x86util uses vpbroadcastw in SPLATW.
>> >
>> > Every instruction that has both a legacy encoding and a VEX-encoding
>> > will be automatically converted to VEX in AVX functions when written
>> > without the v-prefix. There is no legacy encoding for newer
>> > instructions so there's no reason for x86inc to overload those.
>>
>> That's great, but it doesn't address the original problem.
>> Do you insist on removing the "v-" prefix from AVX only instructions?
>>
>>
> I insist.

If you insist, then you should have a good technical reason for it.
Something that is not based on looks, cosmetics or "it has always been
like this".

For example, slow compilation because of unnecessary macro expansion is
a technical reason to keep the "v-" prefix.

;)
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-05 Thread Ivan Kalvachev

On 8/5/17, Hendrik Leppkes  wrote:
> On Sat, Aug 5, 2017 at 12:21 PM, Ivan Kalvachev 
> wrote:
>> On 8/5/17, Rostislav Pehlivanov  wrote:
>>> On 4 August 2017 at 23:58, Ivan Kalvachev  wrote:
>>>
>>>>
>>>> >> I had it mixed before, but I changed it to be consistent.
>>>> >> Some new avx instruction are not overloaded or
>>>> >> available without their "v-" prefix.
>>>> >> e.g. x86util uses vpbroadcastw in SPLATW.
>>>> >
>>>> > Every instruction that has both a legacy encoding and a VEX-encoding
>>>> > will be automatically converted to VEX in AVX functions when written
>>>> > without the v-prefix. There is no legacy encoding for newer
>>>> > instructions so there's no reason for x86inc to overload those.
>>>>
>>>> That's great, but it doesn't address the original problem.
>>>> Do you insist on removing the "v-" prefix from AVX only instructions?
>>>>
>>>>
>>> I insist.
>>
>> If you insist, then you should have a good technical reason for it.
>> Something that is not based on looks, cosmetics or "it has always been
>> like this".
>>
>> For example, slow compilation because of unnecessary macro expansion is
>> a technical reason to keep the "v-" prefix.
>>
>
> Consistency with the typical style of our codebase is also a reason.

No, it is not. This fall in category of doing things wrong,
because we have always done them wrong.
Also you brought up the matter of compilation speed,
now you don't care about it? ;)

Don't worry, I will do the requested change,
only because in future it might help with
avx1 & ymm integer test.

Please answer my other question(s).
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] Add macros used in opus_pvq_search to x86util.asm

2017-08-05 Thread Ivan Kalvachev

Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
From cf4dc8fcd974a845b91aaa8685c06fa145b01786 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Sat, 5 Aug 2017 20:18:50 +0300
Subject: [PATCH 1/6] Add macros to x86util.asm .

Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

Signed-off-by: Ivan Kalvachev 
---
 libavutil/x86/x86util.asm | 108 ++
 1 file changed, 100 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index cc7d272cad..d460ee5193 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -832,14 +832,25 @@
 pmaxsd  %1, %2
 %endmacro
 
-%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32
-%if cpuflag(avx)
-vbroadcastss %1, %2
-%else ; sse
-%ifnidn %1, %2
-movss%1, %2
-%endif
-shufps   %1, %1, 0
+%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+vbroadcastss  %1, %2; ymm, xmm
+%elif cpuflag(avx)
+%ifnum sizeof%2 ; avx1 register
+vpermilps  xmm%1, xmm%2, q  ; xmm, xmm, imm || ymm, ymm, imm
+%if sizeof%1 >= 32  ; mmsize>=32
+vinsertf128  %1, %1, xmm%1, 1   ; ymm, ymm, xmm, im
+%endif
+%else   ; avx1 memory
+vbroadcastss  %1, %2; ymm, mm32 || xmm, m32
+%endif
+%else
+%ifnum sizeof%2 ; sse register
+shufps  %1, %2, %2, q
+%else   ; sse memory
+movss   %1, %2
+shufps  %1, %1, 0
+%endif
 %endif
 %endmacro
 
@@ -854,6 +865,21 @@
 %endif
 %endmacro
 
+%macro VPBROADCASTD 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+vpbroadcastd  %1, %2
+%elif cpuflag(avx) && sizeof%1 >= 32
+%error vpbroadcastd not possible with ymm on avx1. try vbroadcastss
+%else
+%ifnum sizeof%2 ; sse2 register
+pshufd  %1, %2, q
+%else   ; sse memory
+movd%1, %2
+pshufd  %1, %1, 0
+%endif
+%endif
+%endmacro
+
 %macro SHUFFLE_MASK_W 8
 %rep 8
 %if %1>=0x80
@@ -918,3 +944,69 @@
 movhlps%1, %2; may cause an int/float domain transition and has a dependency on dst
 %endif
 %endmacro
+
+; Horizontal Sum of Packed Single precision floats
+; The resulting sum is in all elements.
+%macro HSUMPS 2 ; dst/src, tmp
+%if cpuflag(avx)
+%if sizeof%1>=32  ; avx
+vperm2f128  %2, %1, %1, (0)*16+(1)
+addps   %1, %2
+%endif
+shufps  %2, %1, %1, q1032
+addps   %1, %2
+shufps  %2, %1, %1, q0321
+addps   %1, %2
+%else  ; this form is a bit faster than the short avx-like emulation.
+movaps  %2, %1
+shufps  %1, %1, q1032
+addps   %1, %2
+movaps  %2, %1
+shufps  %1, %1, q0321
+addps   %1, %2
+; all %1 members should be equal for as long as float a+b==b+a
+%endif
+%endmacro
+
+; Emulate blendvps if not available
+;
+; src_b is destroyed when using emulation with logical operands
+; SSE41 blendv instruction is hard coded to use xmm0 as mask
+%macro BLENDVPS 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+blendvps  %1, %1, %2, %3
+%elif cpuflag(sse4)
+%if notcpuflag(avx)
+%ifnidn %3,xmm0
+%error sse41 blendvps uses xmm0 as default 3d operand, you used %3
+%endif
+%endif
+blendvps  %1, %2, %3
+%else
+xorps  %2, %1
+andps  %2, %3
+xorps  %1, %2
+%endif
+%endmacro
+
+; Emulate pblendvb if not available
+;
+; src_b is destroyed when using emulation with logical operands
+; SSE41 blendv instruction is hard coded to use xmm0 as mask
+%macro PBLENDVB 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+%if cpuflag(avx) && notcpuflag(avx2) && sizeof%1 >= 32
+%error pblendb not possible with ymm on avx1, try blendvps.
+%endif
+pblendvb  %1, %1, %2, %3
+%elif cpuflag(sse4)
+%ifnidn %3,xmm0
+%error sse41 pblendvd uses xmm0 as default 3d operand, you used %3
+%endif
+pblendvb  %1, %2, %3
+%else
+pxor  %2, %1
+pand  %2, %3
+pxor  %1, %2
+%endif
+%endmacro
-- 
2.13.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] libavcodec/exr : add SSE2 simd for reorder pixels (WIP)

2017-08-05 Thread Ivan Kalvachev

On 8/5/17, Martin Vignali  wrote:
> Hello,
>
> Based on pull request in the official openexr library :
> https://github.com/openexr/openexr/pull/229/commits/4198128397c033d4f69e5cc0833195da500c31cf
>
> i try to port the reorder part (used in zip and rle decompression), to asm
> Tested on OS X
> pass fate test for me
>
> I test with this command :
> ./ffmpeg -i ./fate-suite/exr/rgba_slice_zip16.exr -f null -;./ffmpeg -i
> ./fate-suite/exr/rgba_slice_zip16.exr -f null - -cpuflags 0
>
> the result of the timer are :
> scalar : 960355 decicycles in reorder_pixels_zip,  64 runs,  0
> skips
> sse :   84763 decicycles in reorder_pixels_zip,  64 runs,  0
> skips
>
> Comments Welcome
>
> Martin
> Jokyo Images

64 runs seems way too few runs for reliable result.
Please try to run ffmpeg encoding for about a minute.

About the patch:

> +%include "libavutil/x86/x86util.asm"
> +
> +SECTION .text

Please include "x86inc.asm" explicitly.

> +INIT_XMM sse2
> +cglobal reorder_pixels, 3,5,3, src, dst, size
> +
> +shrr2,1;half_size

It's better to use the labels you define in the cglobal,
If I count correctly, "r2" would be "sizeq".
("srcq" and "dstq" would be the correct labels for the pointers).

> +;calc loop count for simd reorder
> +movr3,r2;
> +shrr3,4;calc loop count simd

Align the instruction at 4 spaces.
Align the first operands so that the ending commas are vertically aligned.
For the follow up operands, just add one space after the comma.

Put some spaces before the comment, to separate it from the operands.
BTW, this is not C, you don't need to end every line with ";"
e.g.
> +mov  r3, r2
> +shr  r3, 4  ;calc loop count simd

> +;jump to afterloop2 if loop count simd is 0
> +cmpr3,0;
> +jleafterloop2;

If you only check for size==0, then
the "shr" has already set the correct Z flag.

However, if r3/size==1, you'd still read
16 bytes at once in the first loop.
Aka, overread.

> +;simd loop
> +loop1:
> +movdqaxmm0,[r0];load first 16 bytes

Use "m0" instead.

> +lea r4, [r0 + r2];
> +
> +movdquxmm1, [r4]; unaligned load

r4 doesn't seem to be used for anything else.
Just move the address calculation directly into
the "movdqu", it can take it.

> +movdqa xmm2,xmm0;copy xmm0
> +
> +punpcklbw xmm2,   xmm1;
> +movdqa [r1],   xmm2;
> +add r1, 16;

use "mmsize" instead of 16.

> +movdqa xmm2,   xmm0;
> +punpckhbw xmm2,   xmm1;
> +movdqa[r1],   xmm2;
> +addr1,16;

You can actually avoid one of the "add"
if you use [r1+mmsize] and add 32 at the second
"add" (or 2*mmsize).

> +decr3;
> +addr0,16;

> +; test repeat
> +cmpr3,   0;
> +jgeloop1;

First, "dec" instructions are avoided,
because they do a partial update
on the flag register and
this causes interdependence.

Second, you can use the flags from
the "sub" to check if it has reached
zero or has gone negative.

> +afterloop2:
> +;scalar part
> +
> +mov r3, r2;
> +and r3, 15 ;half_size % 16
> +lea r4, [r0 + r2];
> +
> +;initial condition loop
> +cmpr3,0;
> +jleend;
> +
> +loop2:
> +mov r1, r0;
> +inc r1;
> +mov r1, r4;
> +inc r1;
> +inc r0;
> +inc r4;
> +dec r3;
> +; test repeat
> +cmpr3,   0;
> +jgloop2;

O_o
This loop does not read or write to memory.

You can replace the second "cmp" in this
loop with unconditional jump to the
"initial condition loop" compare.
(aka move loop2: two instruction above).

This is how "for(;;)" is usually implemented in C compilers.

Be sure to test your function with different sizes,
to salt your output buffers and check for underwrites, overwrites
etc...

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH 1/2]v2 Add macros used in opus_pvq_search to x86util.asm

2017-08-06 Thread Ivan Kalvachev

On 8/6/17, Henrik Gramner  wrote:
> On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev  wrote:
>> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
>> +%if cpuflag(avx2)
>> +vbroadcastss  %1, %2; ymm, xmm
>> +%elif cpuflag(avx)
>> +%ifnum sizeof%2 ; avx1 register
>> +vpermilps  xmm%1, xmm%2, q  ; xmm, xmm, imm || ymm, ymm,
>> imm
>
> Nit: Use shufps instead of vpermilps, it's one byte shorter but
> otherwise identical in this case.
>
> c5 e8 c6 ca 00vshufps xmm1,xmm2,xmm2,0x0
> c4 e3 79 04 ca 00 vpermilps xmm1,xmm2,0x0

It's also 1 latency cycle less on some old AMD cpu's.

Done.


>> +%macro BLENDVPS 3 ; dst/src_a, src_b, mask
>> +%if cpuflag(avx)
>> +blendvps  %1, %1, %2, %3
>> +%elif cpuflag(sse4)
>> +%if notcpuflag(avx)
>> +%ifnidn %3,xmm0
>> +%error sse41 blendvps uses xmm0 as default 3d operand, you
>> used %3
>> +%endif
>> +%endif
>
> notcpuflag(avx) is redundant (it's always true since AVX uses the first
> branch).

Done.

This is a remnant from the time I had label to turn on and off
different implementations.


Best Regards

 ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
From a43da9061c08dcf4cb6ecd7c8eaad074cdb551d1 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Sat, 5 Aug 2017 20:18:50 +0300
Subject: [PATCH 1/6] Add macros to x86util.asm .

Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.

Signed-off-by: Ivan Kalvachev 
---
 libavutil/x86/x86util.asm | 106 ++
 1 file changed, 98 insertions(+), 8 deletions(-)

diff --git a/libavutil/x86/x86util.asm b/libavutil/x86/x86util.asm
index cc7d272cad..e1220dfc1a 100644
--- a/libavutil/x86/x86util.asm
+++ b/libavutil/x86/x86util.asm
@@ -832,14 +832,25 @@
 pmaxsd  %1, %2
 %endmacro
 
-%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32
-%if cpuflag(avx)
-vbroadcastss %1, %2
-%else ; sse
-%ifnidn %1, %2
-movss%1, %2
-%endif
-shufps   %1, %1, 0
+%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+vbroadcastss  %1, %2
+%elif cpuflag(avx)
+%ifnum sizeof%2 ; avx1 register
+shufps  xmm%1, xmm%2, xmm%2, q
+%if sizeof%1 >= 32  ; mmsize>=32
+vinsertf128  %1, %1, xmm%1, 1
+%endif
+%else   ; avx1 memory
+vbroadcastss  %1, %2
+%endif
+%else
+%ifnum sizeof%2 ; sse register
+shufps  %1, %2, %2, q
+%else   ; sse memory
+movss   %1, %2
+shufps  %1, %1, 0
+%endif
 %endif
 %endmacro
 
@@ -854,6 +865,21 @@
 %endif
 %endmacro
 
+%macro VPBROADCASTD 2 ; dst xmm/ymm, src m32/xmm
+%if cpuflag(avx2)
+vpbroadcastd  %1, %2
+%elif cpuflag(avx) && sizeof%1 >= 32
+%error vpbroadcastd not possible with ymm on avx1. try vbroadcastss
+%else
+%ifnum sizeof%2 ; sse2 register
+pshufd  %1, %2, q
+%else   ; sse memory
+movd%1, %2
+pshufd  %1, %1, 0
+%endif
+%endif
+%endmacro
+
 %macro SHUFFLE_MASK_W 8
 %rep 8
 %if %1>=0x80
@@ -918,3 +944,67 @@
 movhlps%1, %2; may cause an int/float domain transition and has a dependency on dst
 %endif
 %endmacro
+
+; Horizontal Sum of Packed Single precision floats
+; The resulting sum is in all elements.
+%macro HSUMPS 2 ; dst/src, tmp
+%if cpuflag(avx)
+%if sizeof%1>=32  ; avx
+vperm2f128  %2, %1, %1, (0)*16+(1)
+addps   %1, %2
+%endif
+shufps  %2, %1, %1, q1032
+addps   %1, %2
+shufps  %2, %1, %1, q0321
+addps   %1, %2
+%else  ; this form is a bit faster than the short avx-like emulation.
+movaps  %2, %1
+shufps  %1, %1, q1032
+addps   %1, %2
+movaps  %2, %1
+shufps  %1, %1, q0321
+addps   %1, %2
+; all %1 members should be equal for as long as float a+b==b+a
+%endif
+%endmacro
+
+; Emulate blendvps if not available
+;
+; src_b is destroyed when using emulation with logical operands
+; SSE41 blendv instruction is hard coded to use xmm0 as mask
+%macro BLENDVPS 3 ; dst/src_a, src_b, mask
+%if cpuflag(avx)
+blendvps  %1, %1, %2, %3
+%elif cpuflag(sse4)
+%ifnidn %3,xmm0
+%error sse41 blendvps uses xmm0 as default 3d operand, you used %3
+%endif
+blendvps  %1, %2, %3
+%else
+xorps  %2, %1
+andps  %2, %3
+xorps  %1, %2
+%endif
+%endmacro
+
+; Emulate pblendvb if not available
+;
+; src_b is destroyed when using emulation with logical opera

[FFmpeg-devel] [PATCH 2/2]v7 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-06 Thread Ivan Kalvachev

This patch requires "Add macros used in opus_pvq_search to x86util.asm"
as 4 of the macros are moved there.

1. Cosmetics is completely redone.

2. I've left the align code as it is.
I found a really old nasm-2.07 version (from 19 Jan 2010) and made a test build.
I got nasm-2.09.04 (from Jan 11 2011) too, just to be sure.
They all passed without issues.

The x264 x86inc.asm also uses smartalign without
checking version number.

Also I had to do a bit more extensive benchmarks,
because it's hard to tell which version is better
(with or without align).
So far it looks like the align might be faster
with 2-6 cycles at best.

So until somebody finds some concrete issue
I'd like to keep the code as it is.

(maybe try avx2 without align:)


I hope I haven't forgotten to do something.
And I do hope I haven't messed up something new.

Best Regards.
From ac21f1bbaa155090851e8b2b52d2302a0c17d6ab Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Thu, 8 Jun 2017 22:24:33 +0300
Subject: [PATCH 2/6] SIMD opus pvq_search implementation

Explanation on the workings and methods used by the
Pyramid Vector Quantization Search function
could be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/opus_pvq.c  |   3 +
 libavcodec/opus_pvq.h  |   5 +-
 libavcodec/x86/Makefile|   3 +
 libavcodec/x86/opus_dsp_init.c |  45 +
 libavcodec/x86/opus_pvq_search.asm | 395 +
 5 files changed, 449 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/x86/opus_dsp_init.c
 create mode 100644 libavcodec/x86/opus_pvq_search.asm

diff --git a/libavcodec/opus_pvq.c b/libavcodec/opus_pvq.c
index 2ac66a0ede..2fb276099b 100644
--- a/libavcodec/opus_pvq.c
+++ b/libavcodec/opus_pvq.c
@@ -947,6 +947,9 @@ int av_cold ff_celt_pvq_init(CeltPVQ **pvq)
 s->encode_band= pvq_encode_band;
 s->band_cost  = pvq_band_cost;
 
+if (ARCH_X86)
+ff_opus_dsp_init_x86(s);
+
 *pvq = s;
 
 return 0;
diff --git a/libavcodec/opus_pvq.h b/libavcodec/opus_pvq.h
index 6691494838..9246337360 100644
--- a/libavcodec/opus_pvq.h
+++ b/libavcodec/opus_pvq.h
@@ -33,8 +33,8 @@
float *lowband_scratch, int fill)
 
 struct CeltPVQ {
-DECLARE_ALIGNED(32, int,   qcoeff  )[176];
-DECLARE_ALIGNED(32, float, hadamard_tmp)[176];
+DECLARE_ALIGNED(32, int,   qcoeff  )[256];
+DECLARE_ALIGNED(32, float, hadamard_tmp)[256];
 
 float (*pvq_search)(float *X, int *y, int K, int N);
 
@@ -45,6 +45,7 @@ struct CeltPVQ {
 };
 
 int  ff_celt_pvq_init  (struct CeltPVQ **pvq);
+void ff_opus_dsp_init_x86(struct CeltPVQ *s);
 void ff_celt_pvq_uninit(struct CeltPVQ **pvq);
 
 #endif /* AVCODEC_OPUS_PVQ_H */
diff --git a/libavcodec/x86/Makefile b/libavcodec/x86/Makefile
index 0dbc46504e..e36644c72a 100644
--- a/libavcodec/x86/Makefile
+++ b/libavcodec/x86/Makefile
@@ -52,6 +52,8 @@ OBJS-$(CONFIG_APNG_DECODER)+= x86/pngdsp_init.o
 OBJS-$(CONFIG_CAVS_DECODER)+= x86/cavsdsp.o
 OBJS-$(CONFIG_DCA_DECODER) += x86/dcadsp_init.o x86/synth_filter_init.o
 OBJS-$(CONFIG_DNXHD_ENCODER)   += x86/dnxhdenc_init.o
+OBJS-$(CONFIG_OPUS_DECODER)+= x86/opus_dsp_init.o
+OBJS-$(CONFIG_OPUS_ENCODER)+= x86/opus_dsp_init.o
 OBJS-$(CONFIG_HEVC_DECODER)+= x86/hevcdsp_init.o
 OBJS-$(CONFIG_JPEG2000_DECODER)+= x86/jpeg2000dsp_init.o
 OBJS-$(CONFIG_MLP_DECODER) += x86/mlpdsp_init.o
@@ -123,6 +125,7 @@ X86ASM-OBJS-$(CONFIG_MDCT15)   += x86/mdct15.o
 X86ASM-OBJS-$(CONFIG_ME_CMP)   += x86/me_cmp.o
 X86ASM-OBJS-$(CONFIG_MPEGAUDIODSP) += x86/imdct36.o
 X86ASM-OBJS-$(CONFIG_MPEGVIDEOENC) += x86/mpegvideoencdsp.o
+X86ASM-OBJS-$(CONFIG_OPUS_ENCODER) += x86/opus_pvq_search.o
 X86ASM-OBJS-$(CONFIG_PIXBLOCKDSP)  += x86/pixblockdsp.o
 X86ASM-OBJS-$(CONFIG_QPELDSP)  += x86/qpeldsp.o \
   x86/fpel.o\
diff --git a/libavcodec/x86/opus_dsp_init.c b/libavcodec/x86/opus_dsp_init.c
new file mode 100644
index 00..c51f786ee8
--- /dev/null
+++ b/libavcodec/x86/opus_dsp_init.c
@@ -0,0 +1,45 @@
+/*
+ * Opus encoder assembly optimizations
+ * Copyright (C) 2017 Ivan Kalvachev 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in

Re: [FFmpeg-devel] [PATCH] JPEG2000: SSE optimisation of DWT decoding

2017-08-08 Thread Ivan Kalvachev

On 8/8/17, maxime taisant  wrote:
> From: Maxime Taisant 
>
> Hi,
>
> Here is some SSE optimisations for the dwt function used to decode JPEG2000.
> I tested this code by using the time command while reading a JPEG2000
> encoded video with ffmpeg and, on average, I observed a 4.05% general
> improvement, and a 12.67% improvement on the dwt decoding part alone.
> In the nasm code, you can notice that the SR1DFLOAT macro appear twice. One
> version is called in the nasm code by the HORSD macro and the other is
> called in the C code of the dwt function, I couldn't figure out a way to
> make only one macro.

You want to use the same macro at two locations or
you want to have 1 function and "call" it from 2 places?

For the former, I'd guess that you might have been getting
errors about duplicated labels, since you use the local to the file form
instead local to the macro form. aka: ".loop" vs "%%loop".

> I also couldn't figure out a good way to optimize the VER_SD part, so that
> is why I left it unchanged, with just a SSE-optimized version of the
> SR_1D_FLOAT function.

[...]
> +.extend:
> +shl i0d, 2
> +shl i1d, 2
> +mov j0q, i0q
> +mov j1q, i1q
> +movups m0, [lineq+j0q+4]
> +shufps m0, m0, 0x1B

The x86inc provides with readable method for the shuffle constant.
q where X is index in the source reg.
Using q3210 would generate constant that leaves all elements at their
original places.
The 0x1B is q0123 , that is swap, isn't it?.

Also, minor cosmetic nitpick.
 usually the first parameters are placed so their commas are vertically aligned.
This applies only when the parameter is register (so no jmp labels or
[] addresses ).

[...]
> +;line{2*i,2*(i+1),2*(i+2),2*(i+3)} -=
> F_LFTG_DELTA*(line{2*i-1,2*(i+1)-1,2*(i+2)-1,2*(i+3)-1}+line{2*i+1,2*(i+1)+1,2*(i+2)+1,2*(i+3)+1})
> +movups m0, [lineq+2*j0q-28]
> +movups m4, [lineq+2*j0q-12]
> +movups m1, m0
> +shufps m0, m4, 0xDD
> +shufps m1, m4, 0x88

The x86inc provides with a way to emulate 3 operand avx.
This means it hides one of the movaps (use 'a' for reg reg).
shufps m1, m0, m4, 0x88
shufps m0, m4, 0xDD

[...]
> +movups m2, [lineq+2*j0q-24]
> +movups m5, [lineq+2*j0q-8]
> +shufps m2, m5, 0xDD
> +addps m2, m1
> +mulps m2, m3
> +subps m0, m2
> +movups m4, m1
> +shufps m1, m0, 0x44 ; 0100'0100 q1010
Is that movlhps m1, m0 ?

> +shufps m1, m1, q3120
> +shufps m4, m0, 0xEE
> +shufps m4, m4, 0xD8

That's not comprehensive review, so other should still join in.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] JPEG2000: SSE optimisation of DWT decoding

2017-08-11 Thread Ivan Kalvachev

On 8/10/17, maxime taisant  wrote:
>> From: Ivan Kalvachev 
>> On 8/8/17, maxime taisant  wrote:
>> > From: Maxime Taisant 
>> >
>> > Hi,
>> >
>> > Here is some SSE optimisations for the dwt function used to decode 
>> > JPEG2000.
>> > I tested this code by using the time command while reading a JPEG2000
>> > encoded video with ffmpeg and, on average, I observed a 4.05% general
>> > improvement, and a 12.67% improvement on the dwt decoding part alone.

BTW, forgot to tell you that FFmpeg has its own benchmarking macros
that counts the cpu cycles that it takes for a execution of block of C code.
Use it (in .c files) like this

#include "libavutil/timer.h"
{
START_TIMER
function();
STOP_TIMER("function")
   }

The functions would output the results to stderr,
and they would use the name you provide to them.
(So you can benchmark more than one thing at a time).

Make sure the function(s) runs a lot per benchmark run.
The macro would show results at log2 measures.

Do more (3 or 5) separate benchmark runs, since
the final results always slightly differs. (function could
be interrupted, and if detected the measurement
would be discarded/skipped).

Also, try the macro without any function inside,
so you have the "NULL" function. The stop_timer
has an instruction fence opcode, that blocks until
all prior microcodes are executed and this could
take a while. Your benchmarks could never get
faster than the NULL function.

>> > In the nasm code, you can notice that the SR1DFLOAT macro appear
>> > twice. One version is called in the nasm code by the HORSD macro and
>> > the other is called in the C code of the dwt function, I couldn't
>> > figure out a way to make only one macro.
>>
>> You want to use the same macro at two locations or you want to have
>> 1 function and "call" it from 2 places?
>>
>> For the former, I'd guess that you might have been getting errors
>> about duplicated labels, since you use the local to the file form instead
>> local to the macro form. aka: ".loop" vs "%%loop".
>
> Currently I have one function declared with "cglobal" and called in the C
> code, and one macro with exactly the same behavior used in the nasm code.
> So I guess I would like to keep only one of the two and call it from both
> places. (Sorry if it's still not clear, English is not my native language).

Now I'm even more confused... ;)
Don't worry, I'm also not native English speaker.

>> > I also couldn't figure out a good way to optimize the VER_SD part, so
>> > that is why I left it unchanged, with just a SSE-optimized version of
>> > the SR_1D_FLOAT function.
>>
>> [...]
>> > +.extend:
>> > +shl i0d, 2
>> > +shl i1d, 2
>> > +mov j0q, i0q
>> > +mov j1q, i1q
>> > +movups m0, [lineq+j0q+4]
>> > +shufps m0, m0, 0x1B
>>
>> The x86inc provides with readable method for the shuffle constant.
>> q where X is index in the source reg.
>> Using q3210 would generate constant that leaves all elements at their
>> original places.
>> The 0x1B is q0123 , that is swap, isn't it?.
>>
>> Also, minor cosmetic nitpick.
>>  usually the first parameters are placed so their commas are vertically
>> aligned.
>> This applies only when the parameter is register (so no jmp labels or []
>> addresses ).
>>
>
> Ok, I will change all that.
>
>> [...]
>> > +;line{2*i,2*(i+1),2*(i+2),2*(i+3)} -=
>> > F_LFTG_DELTA*(line{2*i-1,2*(i+1)-1,2*(i+2)-1,2*(i+3)-
>> 1}+line{2*i+1,2*(
>> > i+1)+1,2*(i+2)+1,2*(i+3)+1})
>> > +movups m0, [lineq+2*j0q-28]
>> > +movups m4, [lineq+2*j0q-12]
>> > +movups m1, m0
>> > +shufps m0, m4, 0xDD
>> > +shufps m1, m4, 0x88
>>
>> The x86inc provides with a way to emulate 3 operand avx.
>> This means it hides one of the movaps (use 'a' for reg reg).
>> shufps m1, m0, m4, 0x88
>> shufps m0, m4, 0xDD
>
> I know, but I figured that I would do a sse version first and add avx
> support afterwards.


On 8/11/17, maxime taisant  wrote:
>
>> From: Maxime Taisant 
>>
>> > From: Ivan Kalvachev 
>> >
>> > On 8/8/17, maxime taisant  wrote:
>> > > From: Maxime Taisant 
>> > >
>> > > +movups m2, [lineq+2*j0q-24]
>> > > +movups m5, [lineq+2*j0q-8]
>> > > +shufps m2, m5, 0xDD
>> > > +addps m2, m1
>> > > +mulps m2, m3
&g

[FFmpeg-devel] [PATCH] opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling

2017-08-19 Thread Ivan Kalvachev

Using named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.

The suffix handling is done by concatenation, like in
other asm functions and avoid having two separate
"cglobal" defines.

---
I have to point few things out.

commit f386dd70acdc81d42d6bcb885d2889634cdf45b7
"opus_pvq_search: only use rsqrtps approximation on CPUs with avx"
was rushed hack job:

1. It broke compilation with yasm.
2. Its subject says the opposite of what the patch does.
3. The changes were numerous and intrusive, obfuscating code,
while the same result could have been achieved
with a single new line.

I did request revert of this commit on irc and in ffmpeg-cvs-log mail,
I did request a patch to be sent to ffmpeg-dev-eng, to discuss the
best way to make the change.

I did oppose the follow up commit 3c99523a2864af729a8576c3fffe81fb884fa0d5,
that tried to fix the compilation, by redoing the intrusive changes once again,
and making new changes that affect the C code too. (Thus making revert
a lot more messy.)

I find it extremely offensive that I as author of the code and FFmpeg developer
was totally ignored, without providing any technical reasons and with
something that sums up to "I like my code more".

This breaks good faith, COC and the written rules.

I'm not going to request punishment, reverts, public lynching...
I don't want drama.

I'm sending this patch to make the code as
I think it should have been done from the start and
I do ask you to push it without further bikeshedding.

That would be enough of an apology.
From 18324e5535dd0c928a3ec2e8f25babc583b031d5 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Sat, 19 Aug 2017 14:29:40 +0300
Subject: [PATCH] opus_pvq_search: Restore the proper use of conditional define
 and simplify the function name suffix handling.

Using named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.

The suffix handling is done by concatenation, like in
other asm functions and avoid having two separate
"cglobal" defines.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/x86/opus_pvq_search.asm | 37 ++---
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm
index 8cf040465d..5c1e6d6174 100644
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -82,7 +82,7 @@ SECTION .text
 %endif
 %endmacro
 
-%macro PULSES_SEARCH 2 ; %1 - add or sub, %2 - use approximation
+%macro PULSES_SEARCH 1
 ; m6 Syy_norm
 ; m7 Sxy_norm
 addps  m6, mm_const_float_0_5   ; Syy_norm += 1.0/2
@@ -96,17 +96,17 @@ align 16
 movaps m4, [tmpY + r4]  ; y[i]
 movaps m5, [tmpX + r4]  ; X[i]
 
-%if %2
+  %if USE_APPROXIMATION == 1
 xorps  m0, m0
 cmpps  m0, m0, m5, 4; m0 = (X[i] != 0.0)
-%endif
+  %endif
 
 addps  m4, m6   ; m4 = Syy_new = y[i] + Syy_norm
 addps  m5, m7   ; m5 = Sxy_new = X[i] + Sxy_norm
 
-%if %2
+  %if USE_APPROXIMATION == 1
 andps  m5, m0   ; if(X[i] == 0) Sxy_new = 0; Prevent aproximation error from setting pulses in array padding.
-%endif
+  %endif
 
 %else
 movaps m5, [tmpY + r4]  ; m5 = y[i]
@@ -119,7 +119,7 @@ align 16
 andps  m5, m0   ; (0 0
 %%add_pulses_loop:
 
-PULSES_SEARCH add, %1   ; m6 Syy_norm ; m7 Sxy_norm
+PULSES_SEARCH add   ; m6 Syy_norm ; m7 Sxy_norm
 
 subKd, 1
 jnz  %%add_pulses_loop
@@ -325,7 +320,7 @@ align 16; K - pulses > 0
 align 16
 %%remove_pulses_loop:
 
-PULSES_SEARCH sub, %1   ; m6 Syy_norm ; m7 Sxy_norm
+PULSES_SEARCH sub   ; m6 Syy_norm ; m7 Sxy_norm
 
 addKd, 1
 jnz  %%remove_pulses_loop
@@ -376,11 +371,15 @@ align 16
 ; On Skylake & Ryzen the division is much faster (around 11c/3),
 ; that makes the full precision code about 2% slower.
 ; Opus also does use rsqrt approximation in their intrinsics code.
+%define USE_APPROXIMATION   1
+
 INIT_XMM sse2
-PVQ_FAST_SEARCH 1
+PVQ_FAST_SEARCH _approx
 
 INIT_XMM sse4
-PVQ_FAST_SEARCH 1
+PVQ_FAST_SEARCH _approx
+
+%define USE_APPROXIMATION   0
 
 INIT_XMM avx
-PVQ_FAST_SEARCH 0
+PVQ_FAST_SEARCH _exact
-- 
2.14.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling

2017-08-19 Thread Ivan Kalvachev

The issue has been resolved.
The patch has been pushed.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]lavc/put_bits: Remove usage of BITSTREAM_WRITER_LE.

2017-08-22 Thread Ivan Kalvachev

On 8/22/17, Ronald S. Bultje  wrote:
> Hi,
>
> On Mon, Aug 21, 2017 at 11:16 AM, Carl Eugen Hoyos 
> wrote:
>
>> Hi!
>>
>> Attached patch tries to slightly simplify and clean up the usage of
>> put_bits* in libavcodec: put_bits_le() functions exist for the
>> little-endian G.726 encoder, so the define makes less sense now.
>>
>> Fate passes here, please review, Carl Eugen
>
>
> I have to agree with Paul here, I can't say I'm a big fan of this...

People,
As developers you are required
not only to give ultimate final verdicts,
but also give (nice technical) reasoning for them.


Here is my list of pro and cons:

- If it ain't broken, don't change it.
+ Bytesteam already uses explicit _le/be and it looks good.

+ Makes the code more clear. It is obvious that the given encoder
writes BE stream. Something that could easily be missed with the
single define.
- The type of bitstream however is not really important for the codec
working. Aka, there is no confusing, because no codec writes BE and LE
at the same time.(yet)

+ Removes messy defines that obfuscate put_bits code, by separating
the big and little ending code.
- Duplicates put_bits.h code. It would probably make reworking harder,
as changes have to be synced in 2 places.

While the last negative point is most important to me,
there is a workaround for it.
The new bitstream functions could be created by a template.

This however would make the code use more messy defines,
that also negates the last positive point.


Well, maybe other developers have better points.
Let's see them.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] all: avoid data imports across DLL boundaries

2017-08-24 Thread Ivan Kalvachev

On 8/24/17, James Almer  wrote:
> On 8/24/2017 12:01 PM, wm4 wrote:
>> On Thu, 24 Aug 2017 11:20:17 +0200
>> Michael Niedermayer  wrote:
>>
>>> On Thu, Aug 24, 2017 at 10:52:55AM +0200, wm4 wrote:
 On Wed, 23 Aug 2017 19:23:12 -0300
 James Almer  wrote:

> On 8/21/2017 2:51 PM, wm4 wrote:
>> From: Pedro Pombeiro 
>>
>> DLL data imports cause problems on Windows. They normally require
>> each variable to be correctly marked with dllimport/dllexport
>> declspecs. For MSVC, we define av_export to dllimport declspec. This
>> is not entirely correct - depending on which sub-lib is built, they
>> should be marked to dllexport instead. It happens to work with MSVC,
>> because it supports exports incorrectly marked as imports. Trying to
>> use this breaks on MinGW and results in linker errors.
>>
>> On MinGW, not using any import/export specifiers happens to work,
>> because binutils and the MinGW runtime provide "pseudo relocations",
>> which manually fix these imports at load time. This does not work with
>> MinGW WinRT builds: the relocations cannot be performed because this
>> would require writing to the code section, which is not allowed.
>>
>> Get rid of all these issues by not using data exports/imports. The
>> public API is already free of them, but avpriv_ symbols make extensive
>> use of them. Replace them all with getters.
>
> Should be good i think, but it can't be applied as is until the next
> major bump as it breaks ABI (Tons of avpriv_ symbols are removed).

 Well, I call bullshit. We've never taken ABI compatibility between
 FFmpeg libs seriously, especially not if it was about avpriv functions.

>>>
>>> We did take ABI compatibility between FFmpeg libs serious.
>>> And we must to be shiped by distributions that package the libs based
>>> on our ABI versions.
>>>
>>>

 And guess what? You didn't either. Commit 7c9d2ad45f4e46ad2c3b2 is
 authored and committed by you, and changes an avpriv function in an
 incompatible way, without even minor version bumps.
>>>
>>> This commit did not break ABI
>>> no release contains the removed symbol:
>>>
>>>  git grep avpriv_dca_parse_core_frame_header release/3.3
>>>  git grep avpriv_dca_parse_core_frame_header release/3.2
>>>  git grep avpriv_dca_parse_core_frame_header release/3.1
>>>  git grep avpriv_dca_parse_core_frame_header release/3.0
>>>  git grep avpriv_dca_parse_core_frame_header release/2.4
>>>  git grep avpriv_dca_parse_core_frame_header release/2.8
>>>
>>> The symbol was added 9 days before it was removed, it was in no
>>> release
>>>
>>> commit 7c9d2ad45f4e46ad2c3b2e93051efbe1e0d0529e
>>> Author: James Almer 
>>> Date:   Wed Jul 19 01:53:22 2017 -0300
>>>
>>> avcodec/dca: remove GetBitContext usage from
>>> avpriv_dca_parse_core_frame_header()
>>>
>>> commit 2123ddb4251bf39bde8b38a1307a0f6154d260e6
>>> Author: foo86 
>>> Date:   Mon Jul 10 17:11:33 2017 +0300
>>>
>>> avcodec: add avpriv_dca_parse_core_frame_header()
>>> [...]
>>
>> We keep ABI stability even within git master, not just within releases.
>> Otherwise, our lives would be so much easier whenever ABI problems come
>> up.
>
> Yes, that's why the fix was committed two or three days after the symbol
> was introduced, and not weeks after (Ignore the above dates, those are
> authoring dates).
>
> It was in any case a change that could have waited until the major bump.
> I hurried to replace it because i thought it was becoming the only
> GetBitContext usage in a libavcodec exported function, thus effectively
> making it part of the ABI, something that plays against any attempt to
> replace it with the new bitstream reader.
> Turns out that no, there were other avpriv_ symbols alredy using it, so
> a major bump is nonetheless required to fix GetBitContext's ABI
> dependency in other existing avpriv_ symbols.

Digging the past won't help with the current issues.

Can we keep the avpriv_* export of constants, (aka not remove them)
but still apply the portion where avpriv_get_*() are used?

This should keep the ABI and should not cause problems
unless libraries from different builds are mixed.

Probably a minor version bump for the getters
and major for removing the exports.

Since the symbols to constants would no longer be used
by the ffmpeg libraries, the mingw linking problem should
be solved.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] all: avoid data imports across DLL boundaries

2017-08-24 Thread Ivan Kalvachev

On 8/25/17, James Almer  wrote:
> On 8/24/2017 8:26 PM, Ivan Kalvachev wrote:
>> On 8/24/17, James Almer  wrote:
>>> On 8/24/2017 12:01 PM, wm4 wrote:
>>>> On Thu, 24 Aug 2017 11:20:17 +0200
>>>> Michael Niedermayer  wrote:
>>>>
>>>>> On Thu, Aug 24, 2017 at 10:52:55AM +0200, wm4 wrote:
>>>>>> On Wed, 23 Aug 2017 19:23:12 -0300
>>>>>> James Almer  wrote:
>>>>>>
>>>>>>> On 8/21/2017 2:51 PM, wm4 wrote:
>>>>>>>> From: Pedro Pombeiro 
>>>>>>>>
>>>>>>>> DLL data imports cause problems on Windows. They normally require
>>>>>>>> each variable to be correctly marked with dllimport/dllexport
>>>>>>>> declspecs. For MSVC, we define av_export to dllimport declspec. This
>>>>>>>> is not entirely correct - depending on which sub-lib is built, they
>>>>>>>> should be marked to dllexport instead. It happens to work with MSVC,
>>>>>>>> because it supports exports incorrectly marked as imports. Trying to
>>>>>>>> use this breaks on MinGW and results in linker errors.
>>>>>>>>
>>>>>>>> On MinGW, not using any import/export specifiers happens to work,
>>>>>>>> because binutils and the MinGW runtime provide "pseudo relocations",
>>>>>>>> which manually fix these imports at load time. This does not work
>>>>>>>> with
>>>>>>>> MinGW WinRT builds: the relocations cannot be performed because this
>>>>>>>> would require writing to the code section, which is not allowed.
>>>>>>>>
>>>>>>>> Get rid of all these issues by not using data exports/imports. The
>>>>>>>> public API is already free of them, but avpriv_ symbols make
>>>>>>>> extensive
>>>>>>>> use of them. Replace them all with getters.
>>>>>>>
>>>>>>> Should be good i think, but it can't be applied as is until the next
>>>>>>> major bump as it breaks ABI (Tons of avpriv_ symbols are removed).
>>>>>>
>>>>>> Well, I call bullshit. We've never taken ABI compatibility between
>>>>>> FFmpeg libs seriously, especially not if it was about avpriv
>>>>>> functions.
>>>>>>
>>>>>
>>>>> We did take ABI compatibility between FFmpeg libs serious.
>>>>> And we must to be shiped by distributions that package the libs based
>>>>> on our ABI versions.
>>>>>
>>>>>
>>>>>>
>>>>>> And guess what? You didn't either. Commit 7c9d2ad45f4e46ad2c3b2 is
>>>>>> authored and committed by you, and changes an avpriv function in an
>>>>>> incompatible way, without even minor version bumps.
>>>>>
>>>>> This commit did not break ABI
>>>>> no release contains the removed symbol:
>>>>>
>>>>>  git grep avpriv_dca_parse_core_frame_header release/3.3
>>>>>  git grep avpriv_dca_parse_core_frame_header release/3.2
>>>>>  git grep avpriv_dca_parse_core_frame_header release/3.1
>>>>>  git grep avpriv_dca_parse_core_frame_header release/3.0
>>>>>  git grep avpriv_dca_parse_core_frame_header release/2.4
>>>>>  git grep avpriv_dca_parse_core_frame_header release/2.8
>>>>>
>>>>> The symbol was added 9 days before it was removed, it was in no
>>>>> release
>>>>>
>>>>> commit 7c9d2ad45f4e46ad2c3b2e93051efbe1e0d0529e
>>>>> Author: James Almer 
>>>>> Date:   Wed Jul 19 01:53:22 2017 -0300
>>>>>
>>>>> avcodec/dca: remove GetBitContext usage from
>>>>> avpriv_dca_parse_core_frame_header()
>>>>>
>>>>> commit 2123ddb4251bf39bde8b38a1307a0f6154d260e6
>>>>> Author: foo86 
>>>>> Date:   Mon Jul 10 17:11:33 2017 +0300
>>>>>
>>>>> avcodec: add avpriv_dca_parse_core_frame_header()
>>>>> [...]
>>>>
>>>> We keep ABI stability even within git master, not just within releases.
>>>> Otherwise, our lives would be so much easier whenever ABI problems come
>>>> up.
>>>
>>> Yes, that's why the fix

Re: [FFmpeg-devel] (no subject)

2017-08-25 Thread Ivan Kalvachev

On 8/22/17, Rodrigo Severo  wrote:
> Hi,
>
>
> My company does some video recording and internal streaming.
>
> Our current solution (on Ubuntu 16.10 servers) uses ffmpeg and
> ffserver (versions 3.0.7). It works great.
>
> Unfortunately, on Ubuntu 17.04, it stopped working. I believe the
> problems I'm facing are related to the deprecation of ffserver. On
> Ubuntu 17.04 the ffmpeg/ffserver available is version 3.2.4.
>
> I would like to know if there is any developer(s) interested in
> assuming a paid job to maintain ffserver making it compatible with
> present and future ffmpeg versions (most important part) and
> eventually implementing new features.
>
> Are any of you interested?
>
> Please contact me privately if interested so we can further detail
> what should be done.
>
>
> Regards,
>
> Rodrigo Severo

Would you please resend your offer with properly filled email subject.
Something like "offer paid job to maintain ffserver".

It's quite easy to miss email with "no subject", especially with all
the spam around.

I also would like ffserver to be maintained.
Unfortunately streaming, format handling etc
are not my area of expertise.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH] opus_pvq_search.asm: Handle zero vector input differently.

2017-08-25 Thread Ivan Kalvachev

Instead of returning all zeroes as result and Syy=1.0,
place all the K pulses in the first element y[0]
and return Syy=K*K.

This is how the original opus function handles the case.
This is how the existing pvq_search_c handles the case.

Also, according to Rostislav, the encoded all zeros vector
would be decoded as such y[0]=K vector, before dequantization.
So it is better to do that explicitly and calculate
the proper gain in the encoder.
--

I must point out that ppp_pvq_search_c() does generate
y[0]=-K vector, not +K.
This is because FFSIGN(0.0) returns -1.
I do consider this bug, however I'm not quite sure what
is the best way to handle it.

1. Fix localy
#undef FFSIGN
#define FFSIGN(a) ((a) >= 0 ? 1 : -1)

2. Use different name for that macro
#define OPUS_SIGN(a) ...

3. Fix by special case in ppp_pvq_search_c():
if( !(res > 0.0) ) { y[0]=K; for(i=1;iFrom 33d29a33dcf5ad8c4850edf9ed8d83292f10b03f Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Fri, 25 Aug 2017 17:14:28 +0300
Subject: [PATCH] opus_pvq_search.asm: Handle zero vector input differently.

Instead of returning all zeroes as result and Syy=1.0,
place all the K pulses in the first element y[0]
and return Syy=K*K.

This is how the original opus function handles the case.
This is how the existing pvq_search_c handles the case.

Also, according to Rostislav, the encoded all zeros vector
would be decoded as such y[0]=K vector, before dequantization.
So it is better to do that explicitly and calculate
the proper gain in the encoder.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/x86/opus_pvq_search.asm | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm
index 5c1e6d6174..adf3e6f87c 100644
--- a/libavcodec/x86/opus_pvq_search.asm
+++ b/libavcodec/x86/opus_pvq_search.asm
@@ -252,9 +252,9 @@ align 16
 
 xorps  m0, m0
 comissxm0, xm1  ;
+cvtsi2ss  xm0, dword Kd ; m0 = K
 jz   %%zero_input   ; if (Sx==0) goto zero_input
 
-cvtsi2ss  xm0, dword Kd ; m0 = K
 %if USE_APPROXIMATION == 1
 rcpss xm1, xm1  ; m1 = approx(1/Sx)
 mulss xm0, xm1  ; m0 = K*(1/Sx)
@@ -355,7 +355,8 @@ align 16
 RET
 
 align 16
-%%zero_input:
+%%zero_input:   ; expected m0 = K
+mulss xm6, xm0, xm0 ; Syy_norm = K*K
 lea   r4d, [Nd - mmsize]
 xorps  m0, m0
 %%zero_loop:
@@ -363,7 +364,7 @@ align 16
 sub   r4d, mmsize
 jnc  %%zero_loop
 
-movaps m6, [const_float_1]
+mov   [outYq], Kd
 jmp  %%return
 %endmacro
 
-- 
2.14.1

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] opus_pvq_search.asm: Handle zero vector input differently.

2017-08-25 Thread Ivan Kalvachev

On 8/25/17, Rostislav Pehlivanov  wrote:
> On 25 August 2017 at 16:38, Ivan Kalvachev  wrote:
>
>> Instead of returning all zeroes as result and Syy=1.0,
>> place all the K pulses in the first element y[0]
>> and return Syy=K*K.
>>
>> This is how the original opus function handles the case.
>> This is how the existing pvq_search_c handles the case.
>>
>> Also, according to Rostislav, the encoded all zeros vector
>> would be decoded as such y[0]=K vector, before dequantization.
>> So it is better to do that explicitly and calculate
>> the proper gain in the encoder.
>> --
>>
>> I must point out that ppp_pvq_search_c() does generate
>> y[0]=-K vector, not +K.
>> This is because FFSIGN(0.0) returns -1.
>> I do consider this bug, however I'm not quite sure what
>> is the best way to handle it.
>>
>>
> I don't think its a bug,

It is most definitely a bug.

e.g. Float point has positive and negative zeros,
and this FFSIGN() turns both into negatives.

>celt_cwrsi() also has the same behavior (and
> doesn't use FFSIGN) so FF_SIGN returning -1 in the encoder matches what
> celt_cwrsi() would see.

I suspect that this might be celt special case value,
so K + (-K) would decode to vector that is
all zeroes, including y[0].
Could you check that.

The bitstream encoding scheme is really hard to figure out,
with all these self referencing pre-calculated tables.

Is there a description how this stuff works?

> Could you send a V2 which outputs a -K in y[0] instead?

I would really prefer if you commit this patch as it is.
It handles the special case in the same way as celt opus
does, so it is more likely to be correct.

To get the bug-for-bug behavior of ppp_pvq_search_c()
just add
  "negKd"
right before the
  "mov   [outYq], Kd"
(around line 367).
and commit it as your own change.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] opus_pvq_search.asm: Handle zero vector input differently.

2017-08-25 Thread Ivan Kalvachev

On 8/25/17, Rostislav Pehlivanov  wrote:
> On 25 August 2017 at 16:38, Ivan Kalvachev  wrote:
>
>> Instead of returning all zeroes as result and Syy=1.0,
>> place all the K pulses in the first element y[0]
>> and return Syy=K*K.
>>
>> This is how the original opus function handles the case.
>> This is how the existing pvq_search_c handles the case.
>>
>> Also, according to Rostislav, the encoded all zeros vector
>> would be decoded as such y[0]=K vector, before dequantization.
>> So it is better to do that explicitly and calculate
>> the proper gain in the encoder.
>> --
>>
>> I must point out that ppp_pvq_search_c() does generate
>> y[0]=-K vector, not +K.
>> This is because FFSIGN(0.0) returns -1.
>> I do consider this bug, however I'm not quite sure what
>> is the best way to handle it.
>>
>>
> I don't think its a bug,

It is most definitely a bug.

e.g. Float point has positive and negative zeros,
and this FFSIGN() turns both into negatives.

>celt_cwrsi() also has the same behavior (and
> doesn't use FFSIGN) so FF_SIGN returning -1 in the encoder matches what
> celt_cwrsi() would see.

I suspect that this might be celt special case value,
so K + (-K) would decode to vector that is
all zeroes, including y[0].
Could you check that.

The bitstream encoding scheme is really hard to figure out,
with all these self referencing pre-calculated tables.

Is there a description how this stuff works?

> Could you send a V2 which outputs a -K in y[0] instead?

I would really prefer if you commit this patch as it is.
It handles the special case in the same way as celt opus
does, so it is more likely to be correct.

To get the bug-for-bug behavior of ppp_pvq_search_c()
just add
  "negKd"
right before the
  "mov   [outYq], Kd"
(around line 367).
and commit it as your own change.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]lavc/put_bits: Remove usage of BITSTREAM_WRITER_LE.

2017-08-25 Thread Ivan Kalvachev

On 8/24/17, Carl Eugen Hoyos  wrote:
> 2017-08-23 0:56 GMT+02:00 Ivan Kalvachev :
>> On 8/22/17, Ronald S. Bultje  wrote:
>>> Hi,
>>>
>>> On Mon, Aug 21, 2017 at 11:16 AM, Carl Eugen Hoyos 
>>> wrote:
>>>
>>>> Hi!
>>>>
>>>> Attached patch tries to slightly simplify and clean up the usage of
>>>> put_bits* in libavcodec: put_bits_le() functions exist for the
>>>> little-endian G.726 encoder, so the define makes less sense now.
>>>>
>>>> Fate passes here, please review, Carl Eugen
>>>
>>> I have to agree with Paul here, I can't say I'm a big fan of this...
>>
>> People,
>> As developers you are required
>> not only to give ultimate final verdicts,
>> but also give (nice technical) reasoning for them.
>>
>> Here is my list of pro and cons:
>>
>> - If it ain't broken, don't change it.
>
> No objection here - on the contrary, I wish this argument
> would count here more often...

Well, this too could be taken to an extreme.

>> + Bytesteam already uses explicit _le/be and it looks good.
>>
>> + Makes the code more clear. It is obvious that the given encoder
>> writes BE stream. Something that could easily be missed with the
>> single define.
>
>> - The type of bitstream however is not really important for the codec
>> working. Aka, there is no confusing, because no codec writes BE and LE
>> at the same time.(yet)
>
> Not at the same time but in the same encoder file (G.726).

It's at the same time...
since the endian-ness is decided at runtime.

There are two codecs g726le and g726, they execute same
code, with just few different conditions.

>> + Removes messy defines that obfuscate put_bits code, by separating
>> the big and little ending code.
>
>> - Duplicates put_bits.h code. It would probably make reworking harder,
>> as changes have to be synced in 2 places.
>
> I don't think this is correct (and not what the diffstats show afair) but
> it doesn't matter: I was under the impression this patch was a
> requirement after a previous commit, I am not particularly keen
> on it, see above.

Oh, there is already put_bits_le() that is always present
and put_bits() that could be le/be depending on the define.

Yeh, that needs a cleanup.

However it seems that the second g726 codecs is added by you
and that the introduction of put_bits_le duplication/unwinding
is also done by you, isn't it?

Maybe there is still more elegant way to do all that,
though I can't think of one now.

Best Regards.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] libavcodec/exr : add SSE SIMD for reorder_pixels v2 (WIP)

2017-08-29 Thread Ivan Kalvachev

On 8/26/17, Martin Vignali  wrote:
> Hello,
>
> in attach new patch for SSE simd of reorder pixels in exr decoder (use by
> zip and rle uncompress),
> after comments on the previous patch by Ivan Kalvachev.
>
> After testing only on a small buffer, i fix the overread problem of the
> previous patch (who hid the last loop trouble)
>
> pass fate test for me (on Mac Os X)
>
>
> Tested with the decoding of a sequence of 150 HD Exr images (CGI render
> with 17 layers per file in float pixel, ZIP16 compression)
>
> SSE :
> 349190 decicycles in reorder_pixels_zip,  130716 runs,356 skips
> bench: utime=109.222s
> bench: maxrss=607002624kB
>
> Scalar :
> 3039686 decicycles in reorder_pixels_zip,  130395 runs,677 skips
> bench: utime=123.042s
> bench: maxrss=607019008kB

That's impressive. :)

> Comments Welcome

Since you asked ;)
---
> [...]
> +;**
> +
> +%include "libavutil/x86/x86util.asm"

Still missing explicit x86inc.asm

> +SECTION .text
> +
> +;--
> +; void ff_reorder_pixels_sse2(uint8_t *src, uint8_t *dst, int size)
> +;--
> +
> +
> +INIT_XMM sse2
> +cglobal reorder_pixels, 3,6,3, src, dst, size

You do use r5, but not r4. Change r5 to r4 and decrease the usage above.

> +
> +shrsizeq, 1;   sizeq = half_size
> +mov   r3, sizeq
> +shr   r3, 4;   r3 = half_size/16 -> loop_simd count
> +
> +loop_simd:
> +;initial condition loop
> +jle  after_loop_simd;  jump to scalar part if loop_simd 
> count(r3) is 0
> +
> +movdqam0, [srcq];   load first part
> +movdqum1, [srcq + sizeq];   load second part

Would you test if moving the movdqu first makes any difference in speed?
I had a similar case and I think that makes it faster,
since movdqu has bigger latency.
Might not matter on newer cpu.

(If you can't tell the difference, leave it as it is.)

> +movdqam2, m0;   copy m0
> +
> +punpcklbw m2, m1;   interleaved part 1

Both instructions could be combined as
 punpcklbw m2, m0, m1
And the movdqa would be inserted by the x86inc macro.

> +movdqa[dstq], m2;   copy to dst array
> +
> +punpckhbw m0, m1;   interleaved part 2
> +movdqa [dstq+mmsize], m0;   copy to dst array
> +add dstq, 2*mmsize; inc dst
> +add srcq, mmsize;   inc src
> +sub   r3, 1
> +jmploop_simd

I think there is a way to remove some of these arithmetics.
x86 allows a bit more complicated addressing in the form of
[const+r1+r2*x] where x could be power of two -> 1, 2, 4, 8.

If you use r3 as address offset (aka r3+=mmsize)
you can do:
mov  [dsrq+r3*2+mmsize], m3

This however would mean that you'd need
src2 as separate pointer/register, because
[src1q+sizeq+r3] uses 3 different registers.

You'll also need 2 instruction "add" and "cmp"
at the end of the loop.

Well, there are (at least) 2 ways to combine the add+cmp.

1. If you start with r3=aligned(sizeq)-mmsize ,
and you make descending loop with single "sub r3, mmsize",
until you go negative.

2. A trick used in x264.
You start with r3=-1*aligned(sizeq) and you use "add r3,mmsize"
until you reach zero.
Since r3 is used as offset, you'd have to correct all the pointers too
(aka, add aligned sizeq to src1, src2 and double to dst).

In my tests, on my machine #2 was always slower
(and that's without second scalar loop,
that has to deal with "corrected" pointers).

I also found that accessing memory in descending order
is no worse than ascending, so I've been using
descending loops unless I needed increasing
positive index for some other reason.

I do not request that you use any of these methods,
but you might want to test and see
if any of them is faster for you.

> +after_loop_simd:
> +;scalar part
> +mov   r3, sizeq;r3 = half_size
> +and   r3, 15;   r3 = half_size % 16

mmsize-1 , instead of 15.

> +loop_scalar:
> +;initial condition loop
> +cmp   r3, 0
> +jleend
> +
> +mov  r5b, [srcq];   load byte first part
> +mov   [dstq], r5b;  copy first byte to dst
> +
> +mov  r5b, [srcq + sizeq];   load byte s

[FFmpeg-devel] [RFC][PATCH] Allow include files to be installed into subdirectory

2014-08-10 Thread Ivan Kalvachev

The patch is inspired by something I read in the Debian discussion.
Libav and FFmpeg could be installed side by side without conflicts in
the libraries, thanks to using additional suffixes.

However development/include files are still conflicting, so I thought
of a simple configure hack to give more control to FFmpeg.

With this patch you can do `./configure --extra-incdir="/ffmpeg" ` ,
as result the header files would be installed in
"/usr/local/include/ffmpeg"
e.g.
"/usr/local/include/ffmpeg/libavcodec/avcodec.h"
...

The change is reflected in the pkgconfig files too. Since applications
that link to libraries with addition suffixes should be using
pkgconfig, they would be using the correct header includes
automatically.


I did just a rudimentary test and I couldn't spot anything wrong.
Please test it before inclusion.


As for the future, it might be good idea next major release (e.g. 3.0)
to have the $prefix/include/ffmpeg as default include path (for
non-comapt build).


Best Regards
   Ivan Kalvachev
  iive
From 5760ef5c29c02f5e137d4af3199eb71371b02ce1 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Mon, 4 Aug 2014 21:40:48 +0300
Subject: [PATCH] Implement --extra-incdir=SUFFIX

Signed-off-by: Ivan Kalvachev 
---
 configure | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 0ac6132..8c29f52 100755
--- a/configure
+++ b/configure
@@ -302,6 +302,7 @@ Toolchain options:
   --extra-ldflags=ELDFLAGS add ELDFLAGS to LDFLAGS [$LDFLAGS]
   --extra-ldexeflags=ELDFLAGS add ELDFLAGS to LDEXEFLAGS [$LDEXEFLAGS]
   --extra-libs=ELIBS   add ELIBS [$ELIBS]
+  --extra-incdir=SUFFIXadd suffix to include directory []
   --extra-version=STRING   version string suffix []
   --optflags=OPTFLAGS  override optimization-related compiler flags
   --build-suffix=SUFFIXlibrary name suffix []
@@ -1901,6 +1902,7 @@ CMDLINE_SET="
 cxx
 dep_cc
 doxygen
+extra_incdir
 extra_version
 gas
 host_cc
@@ -5434,7 +5436,7 @@ FFMPEG_CONFIGURATION=$FFMPEG_CONFIGURATION
 prefix=$prefix
 LIBDIR=\$(DESTDIR)$libdir
 SHLIBDIR=\$(DESTDIR)$shlibdir
-INCDIR=\$(DESTDIR)$incdir
+INCDIR=\$(DESTDIR)$incdir${extra_incdir}
 BINDIR=\$(DESTDIR)$bindir
 DATADIR=\$(DESTDIR)$datadir
 DOCDIR=\$(DESTDIR)$docdir
@@ -5653,7 +5655,7 @@ pkgconfig_generate(){
 prefix=$prefix
 exec_prefix=\${prefix}
 libdir=$libdir
-includedir=$incdir
+includedir=$incdir$extra_incdir
 
 Name: $name
 Description: $comment
-- 
1.8.4

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC][PATCH] Allow include files to be installed into subdirectory

2014-08-11 Thread Ivan Kalvachev

On 8/11/14, Andreas Cadhalpun  wrote:
> Hi Ivan,
>
> On 11.08.2014 01:23, Ivan Kalvachev wrote:
>> The patch is inspired by something I read in the Debian discussion.
>> Libav and FFmpeg could be installed side by side without conflicts in
>> the libraries, thanks to using additional suffixes.
>
> Thanks for your interest in the matter.
>
>> However development/include files are still conflicting, so I thought
>> of a simple configure hack to give more control to FFmpeg.
>>
>> With this patch you can do `./configure --extra-incdir="/ffmpeg" 
>> --prefix="/usr"` ,
>> as result the header files would be installed in
>> "/usr/include/ffmpeg"
>> e.g.
>> "/usr/include/ffmpeg/libavcodec/avcodec.h"
>> ...
>
> Assuming it would be possible to install development packages for both
> at the same time, which one should be used, when building a program?

It would be decided by the gcc -I inclusion option.
It modifies the search paths for header files, so that it checks these
paths before the system/default ones (When using `#include
"libavcodec/avcodec.h" `)

Libav headers are probably going to remain in the default /usr/include/ .
Usually this should not be a problem, unless a program uses `#include
 ` that should be used for system headers (aka
checks the system/default paths first).

There might still be problems if there are explicit -I/usr/include
inserted by other libraries that mix the inclusion order.

To avoid hard to debug compiling problems, that the -dev packages may
still be marked as conflicting, even though no files are overwritten.

This final conflict however cannot be resolved on FFmpeg side.

>> The change is reflected in the pkgconfig files too. Since applications
>> that link to libraries with addition suffixes should be using
>> pkgconfig, they would be using the correct header includes
>> automatically.
>
> If that's to be determined via pkg-config files, then the obvious
> problem is that both install these files in the same location, and if
> one doesn't, programs can't find the library.

The problem is not that they are installed in the same location, but
that they have same names.

However I just checked and when you define library suffix (e.g.
--build-suffix="_ffmpeg"), it also adds the same suffix to the .pc
files. So you have
/usr/lib/share/pkgconfig/libavcodec_ffmpeg.pc
/usr/lib/libavcodec_ffmpeg.so

In short, assuming if Libav doesn't change their library names and
pkgconfig files,
the programs that are looking for 'libavcodec.pc'  would get the Libav one,
the programs that are looking for 'libavcodec_ffmpeg.pc' would get FFmpeg.

I believe that you should already be using --build-suffix, so that
should already be in the package.

Is there something else I'm missing?

>> I did just a rudimentary test and I couldn't spot anything wrong.
>> Please test it before inclusion.
>
> The patch seems to be alright, but it doesn't make it possible to make
> the development packages of FFmpeg and Libav co-installable.

Unfortunately not.
But without the patch the problem is entirely in how FFmpeg installs its header.
With it, it is entirely Libav problem for using system/common
directory. Then, if Debian wants to resolve the conflict, Libav header
files could be moved in their own directory. The patch should be
trivial to port.

>> As for the future, it might be good idea next major release (e.g. 3.0)
>> to have the $prefix/include/ffmpeg as default include path (for
>> non-comapt build).
>
> So this would also not help.

At the moment FFmpeg is trying to mimic 100% Libav API/ABI, and to be
perfect drop-in replacement. This means using the same library names,
locations and header files.
This is what I call compat build. It is 100% compatible with Libav
from application/program point of view.

When using --build-suffix, you obviously want to differentiate from
Libav, so I call that non-compat build.

Sorry if this is caused confusion.

I hope that the way Debian packages FFmpeg would actually turn out to
be the new default.

Best Regards
   Ivan Kalvachev
  iive
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC][PATCH] Allow include files to be installed into subdirectory

2014-08-13 Thread Ivan Kalvachev

On 8/12/14, Andreas Cadhalpun  wrote:
> Hi,
>
> On 12.08.2014 02:21, Ivan Kalvachev wrote:
>> On 8/11/14, Andreas Cadhalpun  wrote:
>>> Assuming it would be possible to install development packages for both
>>> at the same time, which one should be used, when building a program?
>>
>> It would be decided by the gcc -I inclusion option.
>
> But this path has to come from the pkg-config files.

Yes, pkg-config or programs specificly checking for that path.

>> It modifies the search paths for header files, so that it checks these
>> paths before the system/default ones (When using `#include
>> "libavcodec/avcodec.h" `)
>>
>> Libav headers are probably going to remain in the default /usr/include/ .
>> Usually this should not be a problem, unless a program uses `#include
>>  ` that should be used for system headers (aka
>> checks the system/default paths first).
>
> But a lot of programs use '#include '.

That's problem.
Fortunately we don't have to solve this one right now.

>> There might still be problems if there are explicit -I/usr/include
>> inserted by other libraries that mix the inclusion order.
>>
>> To avoid hard to debug compiling problems, that the -dev packages may
>> still be marked as conflicting, even though no files are overwritten.
>>
>> This final conflict however cannot be resolved on FFmpeg side.
>
> Yes, they still have to conflict.

IMHO, we should do what we can on our side.

>>>> The change is reflected in the pkgconfig files too. Since applications
>>>> that link to libraries with addition suffixes should be using
>>>> pkgconfig, they would be using the correct header includes
>>>> automatically.
>>>
>>> If that's to be determined via pkg-config files, then the obvious
>>> problem is that both install these files in the same location, and if
>>> one doesn't, programs can't find the library.
>>
>> The problem is not that they are installed in the same location, but
>> that they have same names.
>>
>> However I just checked and when you define library suffix (e.g.
>> --build-suffix="_ffmpeg"), it also adds the same suffix to the .pc
>> files. So you have
>> /usr/lib/share/pkgconfig/libavcodec_ffmpeg.pc
>> /usr/lib/libavcodec_ffmpeg.so
>
> Yes, but I patched that out in the Debian package [1], because otherwise
> the programs wouldn't find the pkg-config files.

Oh, this is problem.
IMHO, having the same names is wrong.

Isn't it simpler to have the libavcodec_ffmpeg.pc and a symlink
libavcodec.pc pointing to it?

The idea here is the future.

If program prefers to build with ffmpeg
it should be able to explicitly specify that it wants the ffmpeg
(libavcodec-ffmpeg.pc).

If program doesn't care,
then it uses the generic symlink (libavcodec.pc).

Is there some long debian documentation about symlinks, alernatives,
etc, that you want to avoid? ;)

>> In short, assuming if Libav doesn't change their library names and
>> pkgconfig files,
>> the programs that are looking for 'libavcodec.pc'  would get the Libav
>> one,
>> the programs that are looking for 'libavcodec_ffmpeg.pc' would get
>> FFmpeg.
>
> The problem is that programs only look for libavcodec.pc.

Since Libav breaks API/ABI regularly, all applications require manual
maintenance.
They could change if there is correct alternative.

>> I believe that you should already be using --build-suffix, so that
>> should already be in the package.
>>
>> Is there something else I'm missing?
>
> Probably, that the pkg-config files in the FFmpeg package for Debian are
> still libavcodec.pc etc..

Covered above.

>>>> I did just a rudimentary test and I couldn't spot anything wrong.
>>>> Please test it before inclusion.
>>>
>>> The patch seems to be alright, but it doesn't make it possible to make
>>> the development packages of FFmpeg and Libav co-installable.
>>
>> Unfortunately not.
>> But without the patch the problem is entirely in how FFmpeg installs its
>> header.
>> With it, it is entirely Libav problem for using system/common
>> directory. Then, if Debian wants to resolve the conflict, Libav header
>> files could be moved in their own directory. The patch should be
>> trivial to port.
>
> I still don't see what the benefit would be, as the *-dev packages still
> have to conflict.

It is the future. If both projects separated headers, libraries and
pkgconfigs, there would be no conflicts.
At the moment we can only change FFmpeg to do the right thi

Re: [FFmpeg-devel] [RFC][PATCH] Allow include files to be installed into subdirectory

2014-08-13 Thread Ivan Kalvachev

On 8/13/14, Reimar Döffinger  wrote:
> On 12.08.2014, at 02:21, Ivan Kalvachev  wrote:
>> It would be decided by the gcc -I inclusion option.
>> It modifies the search paths for header files, so that it checks these
>> paths before the system/default ones (When using `#include
>> "libavcodec/avcodec.h" `)
>>
>> Libav headers are probably going to remain in the default /usr/include/ .
>> Usually this should not be a problem, unless a program uses `#include
>>  ` that should be used for system headers (aka
>> checks the system/default paths first).
>
> Huh? The only difference between "" and <> is that "" also searches relative
> to the file's directory (and that first).
> -I applies equally to all.

My bad, I must have combined wrongly two separate peaces of information.

Reading this http://gcc.gnu.org/onlinedocs/cpp/Search-Path.html also
says I'm wrong about the -I/usr/include . Since it is a system
directory the option would be removed/ignored, to ensure the system
headers are included last.

So basically there is no problem at all with the header files.
We could have non-conflicting -dev packages.

> Anyone adding -I/usr/include explicitly deserves some beating.

I think that FFmpeg/Libav pkgconfig files do this, if installed in
--prefix=/usr .
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] Reintroducing FFmpeg to Debian

2014-08-16 Thread Ivan Kalvachev

On 8/16/14, Pau Garcia i Quiles  wrote:
> On Sat, Aug 16, 2014 at 5:30 PM, Nicolas George  wrote:
>
>
>> The only option is to make sure the users do not suffer from the fork, by
>> making sure they can easily use the version that is most suited for their
>> need without being sucked into the developers' disagreements.
>>
>
> Can we get back on topic?
>
> With or without libav in Debian, there are solid technical reasons to have
> ffmpeg in Debian. We have both GraphicsMagick and ImageMagick (although
> they parted ways in a civilized way: different library names), and we
> certainly have a ton of librarys which provide very similar features.
>
> Since before the fork, the libav developers have been sabotaging ffmpeg as
> much as possible, in every "combat field": library names, library versions,
> taking distributions hostage (ffmpeg package that installs libav!?), etc.
> This is not the way to fork anything. This is a fact. I don't care whether
> Michael Nidermayer was a dictator or not. I don't care whether the
> code-review rules in libav are better or worse. I don't care what the Linux
> kernel does. The only thing I care about is Debian is shipping a
> less-capable (i. e. less multimedia formats supported) distribution due to
> this conflict.
>
> This has to stop.
>
> ffmpeg is not yet in Debian due to the filename clashing, which will most
> certainly cause binary problems.
>
> If libav and ffmpeg maintainers cannot reach an agreement regarding library
> names and it's not possible to simply use either ffmpeg or libav
> indistinctly due missing features binary compatibility, etc, the obvious
> solution is that BOTH libav and ffmpeg rename their libraries in Debian. E.
> g. libavcodec-ffmpeg.so and libavcodec-libav.so, etc. Maybe even use
> alternatives to provide the binaries (ffmpeg, ffplay, etc). It's been done
> in the past.

AFAIK, Andreas' package uses libavcodec-ffmpeg.so .

FFmpeg configure does have option --build-suffix="_ffmpeg" that would
append that suffix to library names and pkg-config files. Since
applications might have problem finding the ffmpeg libraries, the
pkg-config files should be with the old "common" names and this
creates a conflict in the -dev packages.

Libav and FFmpeg can coexist side by side.
There are no conflicts or overlap for binary users.

The current goal of FFmpeg is not replacing Libav.
The current goal is establishing a native presence in the most popular
distribution(s).

I'm quite sure the Security team is full of capable people who can
handle one more package.
FFmpeg takes security seriously.

The best scenario for everybody is:
1. Libav stays and all QA tested programs are not touched.
2. FFmpeg is included in a way that does not obstruct the rest of the ecosystem.
3. Optionally, programs that use _only_ FFmpeg could be included back
in Debian. Optionally.

The inclusion would allow for a real-life estimate to be done of the
FFmpeg performance, security, bug and feature wise.

Only after assessing real-life data, a final decision could be
reached, if there is still demand for such thing.

Best Regards
   Ivan Kalvachev
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC] removing libmpcodecs for reuniting purposes.

2014-08-17 Thread Ivan Kalvachev

On 8/18/14, compn  wrote:
> libav brought up the point again that it doesnt like libmpcodecs.
>
> is there anyone using the remaining libmpcodecs filters ?
> most of the filters have been ported to libavfilter already.

They doesn't like Michael too.
They doesn't like that we merge their code.
They doesn't like that we use FFmpeg name and domain.
They doesn't like that we have more features.
They doesn't like that we have less bugs.

I don't think we should try to please them.

Anyway, the filters remains seems to be:

eq,eq2 - image equalizers, changes gamma, contrast, brightness, saturation

ilpack - interlaced yuv420-> yuv422 converter. Scale should be able to
do that too.

softpulldown - turns soft into hard telecine.

fspp,pp7,uspp - spp variants, faster, simpler and ultra
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC] removing libmpcodecs for reuniting purposes.

2014-08-18 Thread Ivan Kalvachev

On 8/18/14, Carl Eugen Hoyos  wrote:
> Ivan Kalvachev  gmail.com> writes:
>
>> softpulldown - turns soft into hard telecine.
>
> Do you remember how this filter was used with MEncoder?
> I have a suspicion that it cannot work with FFmpeg.

I have written about it before:

On 5/4/13, Ivan Kalvachev  wrote:
> On 4/29/13, Paul B Mahol  wrote:
>> The filter funcionality is same as interlace/tinterlace.
>> Required flag: MP_IMGFIELD_REPEAT_FIRST is never set,
>> and can not be set as its not exported from lavf/lavc.
>
> That flag is derivided from AVFrame->repeat_pict .
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC] removing libmpcodecs for reuniting purposes.

2014-08-18 Thread Ivan Kalvachev

On 8/18/14, Kieran Kunhya  wrote:
> On 18 August 2014 02:26, Ivan Kalvachev  wrote:
>
>> ilpack - interlaced yuv420-> yuv422 converter. Scale should be able to
>> do that too.
>
> Scale doesn't have much (any?) knowledge of interlaced chroma.
> That said I could probably port this filter to lavfi because it would
> be quite useful to me. I wasn't actually aware it existed.

Hum?
"
‘interl’
Set the interlacing mode. It accepts the following values:
‘1’Force interlaced aware scaling.
‘0’Do not apply interlaced scaling.
‘-1’Select interlaced aware scaling depending on whether
the source frames are flagged as interlaced or not.
Default value is ‘0’.
"

Maybe the default value should be -1.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] Reintroducing FFmpeg to Debian

2014-08-19 Thread Ivan Kalvachev

On 8/18/14, Moritz Mühlenhoff  wrote:
> Andreas Cadhalpun  schrieb:
>> Hi Thomas,
>>
>> On 18.08.2014 08:36, Thomas Goirand wrote:
>>> There's been a very well commented technical reason stated here: the
>>> release team don't want to deal with 2 of the same library that are
>>> doing (nearly) the same things, with potentially the same security
>>> issues that we'd have to fix twice rather than once.
>>
>> Why is it a security problem to have FFmpeg and Libav, but apparently no
>> problem to have MySQL, MariaDB and PerconaDB?
>
> Raphael Geissert already wrote that mysql/mariadb/percona will be
> addressed as well; we haven't come around to since since we need to
> deal with a lot of stuf and being dragged into endless discussions
> on -devel is certainly not helpful.
>
> Cheers,
> Moritz

Excuse my interruption, but I intend to be a little blunt.

I think there might be a little bit of miscommunication.

You have said that security team cannot handle both FFmpeg and Libav.
Since Libav is already in Debian, this statement is assumed to mean
that you do not want to deal with FFmpeg. However this mail
http://lists.debian.org/debian-devel/2014/08/msg00060.html kind of
hints the opposite - Libav security handling is horrible and burden to
you, while FFmpeg so far is responsive and responsible.

So I would like to get a little bit more details on your priorities
and preferences. The options I could think of are:
1. Drop both Libav and FFmpeg.
2. Leave Libav in stable, keep FFmpeg out.
3. Get FFmpeg in stable, drop Libav.
4. Get both Libav and FFmpeg, under the condition that Michael is
helping with FFmpeg patching.
5. Get both Libav and FFmpeg, under the condition that Michael is
helping with FFmpeg AND Libav patching (only for jessie).
6. Something else...

Other people have said that FFmpeg should provide help and resources
to the security team. Please elaborate what more can FFmpeg do to
please you.

Best Regards
   Ivan Kalvachev
  iive

P.S.
I hope ftp masters are not deliberately prolonging the FFmpeg
inclusion, thinking they are doing favor to their peers from other
teams.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

[FFmpeg-devel] [PATCH][RFC] Improve and fix put_vc2_ue_uint() function.

2018-02-28 Thread Ivan Kalvachev

Replace two bit handling loops and internal conditional branch
with simple formula using few logical operations.

The old function would generate wrong output
if the input does not fit into 15 bits.

Fix this by using 64 bit math and put_bits64().
This case should be quite rare, since the bug
has not asserted itself.

---
It's attempt for speed optimization, but in the
process it turned out it needs also bugfixing.

I only tested the old case of the code,
to confirm i've implemented the correct function.

Haven't done any benchmarks or run fate.

It should be faster, especially because currently coefficients bellow
2048 are written using lookup table and bypass this function.

If you like it, use it.

Best Regards
   Ivan Kalvachev.
From 1f7fd38fcb6c64281bc458c09c711fc567b3ef0f Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev 
Date: Wed, 28 Feb 2018 17:48:40 +0200
Subject: [PATCH] Improve and fix put_vc2_ue_uint() function.

Replace two bit handling loops and internal conditional branch
with simple formula using few logical operations.

The old function would generate wrong output
if the input does not fit into 15 bits.

Fix this by using 64 bit math and put_bits64().
This case should be quite rare, since the bug
has not asserted itself.

Signed-off-by: Ivan Kalvachev 
---
 libavcodec/vc2enc.c | 31 ++-
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/libavcodec/vc2enc.c b/libavcodec/vc2enc.c
index b7adcd3d36..b2f1611ea3 100644
--- a/libavcodec/vc2enc.c
+++ b/libavcodec/vc2enc.c
@@ -187,28 +187,33 @@ typedef struct VC2EncContext {
 
 static av_always_inline void put_vc2_ue_uint(PutBitContext *pb, uint32_t val)
 {
-int i;
-int pbits = 0, bits = 0, topbit = 1, maxval = 1;
+int bits = 0;
+uint64_t pbits = 0;
 
 if (!val++) {
 put_bits(pb, 1, 1);
 return;
 }
 
-while (val > maxval) {
-topbit <<= 1;
-maxval <<= 1;
-maxval |=  1;
-}
+bits = ff_log2(val);
 
-bits = ff_log2(topbit);
+if (bits > 15) {
+pbits = val;
 
-for (i = 0; i < bits; i++) {
-topbit >>= 1;
-pbits <<= 2;
-if (val & topbit)
-pbits |= 0x1;
+pbits = ((pbits<<16)|pbits)&0xULL;
+pbits = ((pbits<< 8)|pbits)&0x00FF00FF00FF00FFULL;
+pbits = ((pbits<< 4)|pbits)&0x0F0F0F0F0F0F0F0FULL;
+pbits = ((pbits<< 2)|pbits)&0xULL;
+pbits = ((pbits<< 1)|pbits)&0xULL;
+
+put_bits64(pb, bits*2 + 1, (pbits << 1) | 1);
+return;
 }
+ // ' ' ponm'lkji hgfe'dcba
+val = ( (val << 8) | val ) & 0x00FF00FF; // ' ponm'lkji ' hgfe'dcba
+val = ( (val << 4) | val ) & 0x0F0F0F0F; // 'ponm 'lkji 'hgfe 'dcba
+val = ( (val << 2) | val ) & 0x; // __po'__nm __lk'__ji __hg'__fe __dc'__ba
+val = ( (val << 1) | val ) & 0x; // _p_o'_n_m _l_k'_j_i _h_g'_f_e _d_c'_b_a
 
 put_bits(pb, bits*2 + 1, (pbits << 1) | 1);
 }
-- 
2.16.2

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH][RFC] Improve and fix put_vc2_ue_uint() function.

2018-02-28 Thread Ivan Kalvachev

On 3/1/18, Michael Niedermayer  wrote:
> On Wed, Feb 28, 2018 at 10:14:15PM +0200, Ivan Kalvachev wrote:
>> Replace two bit handling loops and internal conditional branch
>> with simple formula using few logical operations.
>>
>> The old function would generate wrong output
>> if the input does not fit into 15 bits.
>>
>> Fix this by using 64 bit math and put_bits64().
>> This case should be quite rare, since the bug
>> has not asserted itself.
>>
>> ---
>> It's attempt for speed optimization, but in the
>> process it turned out it needs also bugfixing.
>>
>> I only tested the old case of the code,
>> to confirm i've implemented the correct function.
>>
>> Haven't done any benchmarks or run fate.
>
> fate fails:
>

Well, the good news is that
it compiles and doesn't crash.

The change of generated files is expected.
The old code would write up to 63 bits
with regular put_bits() that is limited to 31 bits.

Are av_assert2() enabled during fate runs?
(That's how put_bits() checks for bit length.)

I'd like to know if my code is correct/incorrect
before attempting to changing fate checksums.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]Set the default for --shlibdir to --libdir

2014-12-12 Thread Ivan Kalvachev

On 12/12/14, Carl Eugen Hoyos  wrote:
> Hi!
>
> Attached patch fixes ticket #4183.
>
> Please review, Carl Eugen

You should also update the configure help text, as the default is not
PREFIX/lib anymore.

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH]Set the default for --shlibdir to --libdir

2014-12-14 Thread Ivan Kalvachev

On 12/13/14, Carl Eugen Hoyos  wrote:
> Clément Bœsch  pkh.me> writes:
>
>> > > > Attached patch fixes ticket #4183.
>
>> >--libdir=DIR install libs in DIR [PREFIX/lib]
>> > -  --shlibdir=DIR   install shared libs in DIR [PREFIX/lib]
>> > +  --shlibdir=DIR   install shared libs in DIR [LIBDIR]
>
>> What if LIBDIR is not defined? (is it possible?)

It is supposed to be set by the previous option (--libdir).
Just like there is no PREFIX env var, but it is the value set by the
--prefix option.

Probably "[same as --libdir]" would be easier to understand, but it
would not be consistent with the rest of how [] is used in configure
help. And if you go that way, how would you describe the --libdir
default?

"[ --prefix/lib]"
"[ ${--prefix}/lib]"
"[ ${prefix}/lib]"
"[$prefix/lib]"
"[$PREFIX/lib]"
"[PREFIX/lib]"

Hum, actually, I kind of like the 3'd  one.

> I believe you misunderstand (there are no shell
> variables involved afaict).
>
> I am not claiming that I am sure this patch is a
> good idea (I am against behaviour changes) but
> this was reported by a user and I believe the
> patch does what the user wants so a decision will
> have to be made: Either close as wont-fix or
> apply this (or a similar) patch.

autoconf handles a single --libdir and people do expect that other
configures work in a similar way.

Please, commit your current patch.

BTW, another possible solution is to remove --shlibdir entirely. Is
that option even used/useful?

Best Regards
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] avcodec/xvmc: apply attribute_deprecated correctly

2015-10-09 Thread Ivan Kalvachev

On 10/10/15, Ganesh Ajjanagadde  wrote:
> This fixes a warning observed on Clang 3.7:
> "warning: attribute 'deprecated' is ignored, place it after "struct" to
> apply attribute to type declaration [-Wignored-attributes]"
> and thus enables deprecation warning for the relevant struct.
>
> Signed-off-by: Ganesh Ajjanagadde 
> ---
>  libavcodec/xvmc.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/libavcodec/xvmc.h b/libavcodec/xvmc.h
> index c2e187c..465ee78 100644
> --- a/libavcodec/xvmc.h
> +++ b/libavcodec/xvmc.h
> @@ -43,7 +43,7 @@
>  #define AV_XVMC_ID0x1DC711C0  /**< special value to
> ensure that regular pixel routines haven't corrupted the struct
> the number is 1337
> speak for the letters IDCT MCo (motion compensation) */
>
> -attribute_deprecated struct xvmc_pix_fmt {
> +struct attribute_deprecated xvmc_pix_fmt {
>  /** The field contains the special constant value AV_XVMC_ID.
>  It is used as a test that the application correctly uses the API,
>  and that there is no corruption caused by pixel routines.

That struct should not be deprecated at all in FFmpeg.

Should I send a patch?
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] swscale: set sane alpha blending defaults

2015-08-30 Thread Ivan Kalvachev

On 8/30/15, wm4  wrote:
> On Sun, 30 Aug 2015 18:09:05 + (UTC)
> Carl Eugen Hoyos  wrote:
>
>> wm4  googlemail.com> writes:
>>
>> > Change the default to blend with black, which
>> > gives generally expected results.
>>
>> Given that this introduces a speed regression, is
>> rarely needed and it is immediately visible that
>> the function is needed, I don't think this is a
>> sensible change.
>>
>> Note that contrary to visual issue, the performance
>> regression is not immediately visible.
>>
>> And "black" may be sensible for prores, in all
>> other cases, it is either suboptimal or bad.
>>
>> Carl Eugen
>
> PS: there is a 3rd way: make libswscale fail initialization if an alpha
> conversion happens, but the user didn't specify the alpha mode.

That or blending with checkerboard pattern.

One of the strongest points in Carl objections is that there is no
universal optimal and correct behavior.

- In one case you would want to ignore the alpha channel, as it might
contain only garbage.
- In second case, you'd want to blend it with white color, because the
flv is encoded with it in mind.
- In this case, the proress encoding assumes blending with black color.

Since there is no single optimal solution, we should force the user in
picking the correct mode himself.

I think the default should be blending with checkerboard pattern.

It is immediately recognized as alpha artifact, so the user would know
what the reason for the issue. Even better - print a hint about
'alphablend' and its modes to console.

Of course, ideally there should be some metadata gimmick to set
blend-to-white with .flv and blend-to-black for proress.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] FFmpeg 2.8

2015-08-30 Thread Ivan Kalvachev

On 8/29/15, wm4  wrote:
> On Sat, 29 Aug 2015 04:28:41 +0200
> Michael Niedermayer  wrote:
>
>> Hi all
>>
>> Its about 2 and a half month since 2.7
>> if there are no objections then ill branch of release/2.8 from
>> some revission prior to the next API bump and will then make a 2.8
>> release from that, after testing
>
> The release should be after the bump.

I think it is good idea API breaking changes to be done on major release.
Meaning the release containing the API bump should be 3.0
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] H264 cleanup performance effects

2015-03-22 Thread Ivan Kalvachev

On 3/22/15, wm4  wrote:
> On Sun, 22 Mar 2015 15:12:04 +0100
> Michael Niedermayer  wrote:
>
>> Hi anton, everyone else
>
> I appreciate that you contacted the author of these patches as well.
>
>> the "cosmetic" commits from yesterday
>> (665e0c10a63d31dd6c7b1fba14db074625d54614..fa7c08d5e192aea77fdfb7f52c44c196a3ba4452)
>> /
>> (d8a45d2d49f54fde042b195f9d5859251252493d..c28ed1d743443e783537d279ae721be3bbdf7646)
>>
>> cause a ~1% speed loss in the H.264 decoder which is probably the
>> most important decoder in the codebase
>>
>> after the patchset:
>> -threads 1 -benchmark -i cathedral-beta2-400extra-crop-avc.mp4 -an -f null
>> -
>> utime=5.448s
>> utime=5.424s
>> utime=5.436s
>> utime=5.428s
>> utime=5.448s
>>
>>
>> before the patchset:
>> -threads 1 -benchmark -i cathedral-beta2-400extra-crop-avc.mp4 -an -f null
>> -
>> utime=5.360s
>> utime=5.328s
>> utime=5.356s
>> utime=5.376s
>> utime=5.360s
>>
>> Testing has been done with a single thread as the results with
>> multiple threads where inconsistent/unstable
>>
>> The speedloss is reproduceable with ffmpeg as well as libav
>
> At this point I'd argue that 1% speedloss in exchange for much cleaner
> and saner code is not the world's end. Especially because ffmpeg is
> probably already the fastest software decoder on x86, and cleaner code
> may in fact help improving the decoder further. (But then I don't know
> the h264 decoder, and may be talking out of my ass.)

No, It won't help improve the decoder further. This code would never
be touched by anybody who can't navigate in it already.

Libav is the project that values pretty code.
I hope FFmpeg still values speed and durability.

Also, the speed loss is more like 2% and that's A LOT.

If anybody wants this cleanup included, then I recommend him to find a
clean way to compensate for this slowdown.

Good Luck.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [PATCH] lavf: add zip protocol

2015-03-28 Thread Ivan Kalvachev

On 3/28/15, Peter Ross  wrote:
> On Sat, Mar 28, 2015 at 10:24:55PM +0100, wm4 wrote:
>> On Sun, 29 Mar 2015 08:10:29 +1100
>> Peter Ross  wrote:
>>
>> > On Sat, Mar 28, 2015 at 08:38:40PM +0100, Lukasz Marek wrote:
>> > > On 28.03.2015 20:13, Nicolas George wrote:
>> > > >L'octidi 8 germinal, an CCXXIII, Lukasz Marek a écrit :
>> > > >>I will try to use this libarchive first and do some tests. Your
>> > > >> approach may
>> > > >>collapse in case compression libraries doesn't support parallel
>> > > >>compression/decompression (I mean that you write or read several
>> > > >> files from
>> > > >>single archive file) I would be much surprised if at least writing
>> > > >> will not
>> > > >>work.
>> > > >
>> > > >This is a likely issue, but fortunately it would not prevent all use
>> > > > cases.
>> > > >
>> > > >>I wonder if there is other solution: zip could be protocol as it is
>> > > >> now, it
>> > > >>allows to benefit from list API and gives flexibility other demuxers
>> > > >> to
>> > > >>benefit from it. There could also be a "directory" demuxer which
>> > > >> would also
>> > > >>use that API and possibly could serve streams in your way. That
>> > > >> demuxer
>> > > >>could also handle directories over any protocol that supports that
>> > > >> API.
>> > > >
>> > > >That was the kind of idea that I had. But I believe that to get that
>> > > > working
>> > > >a bit reliably, we will need to extend the directory listing
>> > > > callbacks to
>> > > >allow a URL context to create new URL contexts, to open remote files
>> > > > without
>> > > >establishing a new connection (and it will also be necessary for
>> > > > network
>> > > >servers). Some kind of VFS API, then.
>> >
>> > Agree.
>> >
>> > > This can be even harder as opening archive file require stat or other
>> > > smart
>> > > way to check some candidates that ought to be a archive file. See
>> > > below.
>> >
>> > > >>>ffplay zip:///tmp/outer.zip/tmp/inner.zip/tmp/data.bin
>> > > >>libzip can't handle it (the same way it cannot handle files via
>> > > >> protocols),
>> > > >>maybe libarchive will be better
>> > > >
>> > > >I think you misunderstood the question. I was not asking whether it
>> > > > would be
>> > > >able to decode nested files, but how your code did split nested
>> > > > paths: would
>> > > >it try to find /tmp/inner.zip/tmp/data.bin inside /tmp/outer.zip, or
>> > > >/tmp/data.bin inside /tmp/outer.zip/tmp/inner.zip (assuming someone
>> > > > was
>> > > >stupid enough to name a directory dot-zip)?
>> > >
>> > > I assumed it is local file (no other option so far). So I stat full
>> > > path
>> > > (/tmp/outer.zip/tmp/inner.zip/tmp/data.bin) for being a file, if so
>> > > then I
>> > > opened it as zip file and used fallback to open first file.
>> > > If not then I stat by path components: /tmp/, /tmp/outer.zip, and so
>> > > on...
>> > > /tmp/outer.zip is a file so I open it and treat rest of the uri as a
>> > > path
>> > > inside zip.
>> >
>> > walking the path means that the archive protocol must know about the
>> > syntax of the underlying protocol (file, http, ftp, etc.). that seems
>> > messy.
>> > also inefficient if you have got to walk a long ftp path.
>> >
>> > wouldn't we be better off defining a special character that seperates
>> > the zip
>> > path from the inner path. obviously we'd need some way of escaping the
>> > character
>> > if it is legitimately part of either path.
>> >
>> > ffplay /tmp/amovie.zip
>> > ffplay http://subtitles.org/amovie.zip#amovie.srt
>> >
>> > the syntax should support nested archives (even if the libzip/archive
>> > cannot handle
>> > that yet). e.g.
>> >
>> > ffplay /tmp/amovie.rar#/tmp/amovie.zip#amovie.srt
>> >
>> > -- Peter
>> > (A907 E02F A6E5 0CD2 34CD 20D2 6760 79C5 AC40 DD6B)
>>
>> No. '#' is perfectly allowed in URLs and local filenames.
>
> of course it is, hence my qualification above:
>>> obviously we'd need some way of escaping the character
>
> so if '##' reduces to '#', then:
> ffplay /tmp/amovie##1.zip#amovie##1.srt  would open 'amovie#1.srt' inside
> '/tmp/amovie#1.zip'
>
> '#' was only given as an example.
> pick a character (or character sequence) that is easy to type, but
> infrequently used in
> uris/filenames, such to avoid having to escape to often.

How about:
  ffplay /tmp/amovie1.zip//amovie.srt

Aka, using double directory separator to indicate diving into an archive.
This way "ffplay /tmp/amovie1.zip/amovie" would imply amovie1.zip is directory.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Re: [FFmpeg-devel] [RFC][PATCH][Type 2] Revert "doc/developer.texi: Add a code of conduct"

2018-06-11 Thread Ivan Kalvachev

On 6/10/18, Josh de Kock  wrote:
> On 2018/05/14 17:50, Derek Buitenhuis wrote:
>> It was never enforced, and there is no documented way to enforce it,
>> rendering it useless.
>>
>> [...]
>
> I think this is the best thing to do first. We could always re-add a
> more 'proper' CoC later, but for now there's no point creating more
> confusion. I stated this on IRC but saying this on the ML so it is
> documented more publicly.
>
> Also, it has been more than a month since this was posted with no
> resolution, when do you intend to take, or expect others to take, actual
> action with regards to this?

If I remember correctly, the CoC has been voted, so it would need
another vote to remove it.

I do not see any immediate problem that compels messing with CoC and
the politics associated with it.

The less people are involved in politics, the more work on code could be done.

Keep Calm and Carry On.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

97 matches

Mail list logo