Re: [FFmpeg-devel] [PATCH 0/2] RFC: Generic hwaccel for cuvid v2

2017-06-25 Thread Mark Thompson
On 25/06/17 00:40, Philip Langdale wrote:
> Second try.
> 
> Feedback on first proposal was lack of interest in having default behaviour
> vary between hwaccels, despite their other differences.
> 
> In this patch series, I instead force the user to confront the change in
> command line semantics when doing cuvid/nvenc transcoding. They will only
> get full transcoding with an additional command line argument, so force
> them to specify that argument.
> 
> Philip Langdale (2):
>   ffmpeg: Switch cuvid to generic hwaccel
>   ffmpeg: Require output format when using cuvid hwaccel
> 
>  Makefile   |  1 -
>  ffmpeg.h   |  2 +-
>  ffmpeg_cuvid.c | 73 --
>  ffmpeg_opt.c   | 31 ++---
>  4 files changed, 24 insertions(+), 83 deletions(-)
>  delete mode 100644 ffmpeg_cuvid.c

Can we take a step back here and consider what we actually want the result to 
be?


I think what we want to achieve is:

1)  CUDA devices passed to CUVID decoders as AVCodecContext.hw_device_ctx:
  a)  If we have exactly one device, always pass it.
  b)  If we have multiple devices, allow -hwaccel_device to select which one to 
pass (fail or make an arbitrary choice if not present?).
  c)  If none, either make a default one or pass nothing (depends what works).
* (Note that this decision logic is completely unaffected by whether the dummy 
hwaccel is specified or not.)

2)  Output / get_format
  a)  If the dummy hwaccel is specified, output in hwaccel_output_format or 
default to CUDA frames if not specified.
  b)  Otherwise, always output in normal memory.
* Alternatively, actually change the semantics and be clear about what the new 
ones are.

3)  Remove all of the CUVID-specific code from ffmpeg.
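
Points (1) and (2) can be read as two independent decisions. A minimal sketch in C follows; the names (pick_device, pick_output, the enum) are hypothetical and not ffmpeg identifiers, -hwaccel_device is modelled as a plain index for brevity, and the "multiple devices, no selection" case is treated as a failure, which is only one of the two options left open in (1b):

```c
#include <assert.h>

/* Hypothetical output kinds; not ffmpeg identifiers. */
enum out_kind { OUT_SOFTWARE, OUT_CUDA_FRAMES, OUT_USER_FORMAT };

/*
 * Point (1): choose which device to pass as hw_device_ctx.
 * Returns a device index, -1 for "pass nothing", -2 for "ambiguous".
 * requested is the -hwaccel_device selection, or -1 if not given.
 * The dummy hwaccel flag is deliberately not an input here.
 */
static int pick_device(int nb_devices, int requested)
{
    assert(nb_devices >= 0);
    if (nb_devices == 0)
        return -1;            /* (c) none: let the decoder cope */
    if (nb_devices == 1)
        return 0;             /* (a) exactly one: always pass it */
    if (requested >= 0 && requested < nb_devices)
        return requested;     /* (b) explicit selection */
    return -2;                /* (b) no selection: fail (assumption) */
}

/* Point (2): choose what the decoder should output. */
static enum out_kind pick_output(int dummy_hwaccel, int has_output_format)
{
    if (!dummy_hwaccel)
        return OUT_SOFTWARE;  /* (b) always normal memory */
    /* (a) hwaccel_output_format if given, else CUDA frames */
    return has_output_format ? OUT_USER_FORMAT : OUT_CUDA_FRAMES;
}
```

Keeping the two functions separate mirrors the note under (1): the device choice never looks at whether the dummy hwaccel was requested.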


Point (1) pretty much already works without any change - see 
hw_device_setup_for_decode().  The only missing feature here is that the name 
matching doesn't actually work for CUVID (strstr("h264_cuvid", "cuda") -> 
NULL).  I'm not sure that's actually a problem, though - the CUVID decoder is 
happy to pull a new device instance out of nothing, so it isn't actually needed 
in the standalone device case.  A hack there could give slightly more 
consistent behaviour, though.
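
The name-matching miss is easy to reproduce in isolation. The helper below is a hypothetical stand-in, not the code in hw_device_setup_for_decode(); it only shows why a plain substring test cannot associate a "*_cuvid" decoder with a "cuda" device:

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the device type name occurs inside the decoder name. */
static int name_matches(const char *codec_name, const char *device_type)
{
    assert(codec_name && device_type);
    return strstr(codec_name, device_type) != NULL;
}
```

With this helper, name_matches("h264_cuvid", "cuda") is 0 because "cuda" is not a substring of "cuvid", whereas a decoder whose name embeds its device type (for example "h264_qsv" against "qsv") gives 1.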

(An imagined solution to avoid the string matching completely was to add a 
field to AVCodec which contains the set of devices that it can use, but that's 
a more invasive library change that no one has pursued yet.)

Point (2) is somewhat more subtle.  The default hwaccel behaviour is made for 
the real hwaccels attached to the normal decoder, and won't do the right thing 
for the dummy ones.  The specific case that we strongly want to avoid is some 
normal setting where the output is downloaded to normal memory from a hardware 
frame inside ffmpeg, because that is almost certainly done more efficiently in 
the decoder itself (for the CUVID case, it actually has two explicit copies if 
you do this).  It's rare that you ever want to specify anything other than the 
hardware format or the corresponding software format for decode (and in fact I 
think only VAAPI supports such convert-on-download cases anyway), so the single 
toggle is usually sufficient.

Therefore, I think we should just clearly distinguish between the two general 
behaviours for the hwaccel case - real and dummy.  That's essentially what your 
patch at 
 did, 
but in a slightly implicit way - I would put it in hwaccel_decode_init() rather 
than in the option parsing code.

There was some question of whether all hwaccels (real and dummy) should behave 
identically here, but given the nasty default case if we do that I don't think 
it's justified (though feel free to argue further on this point if you feel 
strongly, I'm not that attached to it).

TL;DR - I preferred the mechanism of the previous version, with some changes to 
make it clearer what the distinction is.

Thanks,

- Mark


PS:  The libmfx behaviour unfortunately isn't really a very useful comparison, 
because the implementation is incomplete.  The important change with the 
generic code was that the string matching does the right thing to create a 
device for the non-hwaccel case.  The hwaccel case didn't change at all because 
hw_device_ctx + hardware frame output isn't supported in the decoder.  I don't 
have any plans to implement it soon because I regard the libmfx decoders as 
mostly deprecated at this point (use real hwaccel for the platform + hwmap for 
encode instead), but I wouldn't mind it being done by someone else if they care.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] avfilter: remove usage of empty header

2017-06-25 Thread Paul B Mahol
Signed-off-by: Paul B Mahol 
---
 libavfilter/f_sendcmd.c | 1 -
 libavfilter/f_zmq.c | 1 -
 libavfilter/graphdump.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/libavfilter/f_sendcmd.c b/libavfilter/f_sendcmd.c
index 1e53465..b8740e8 100644
--- a/libavfilter/f_sendcmd.c
+++ b/libavfilter/f_sendcmd.c
@@ -30,7 +30,6 @@
 #include "libavutil/parseutils.h"
 #include "avfilter.h"
 #include "internal.h"
-#include "avfiltergraph.h"
 #include "audio.h"
 #include "video.h"
 
diff --git a/libavfilter/f_zmq.c b/libavfilter/f_zmq.c
index 2666016..89da5be 100644
--- a/libavfilter/f_zmq.c
+++ b/libavfilter/f_zmq.c
@@ -29,7 +29,6 @@
 #include "libavutil/opt.h"
 #include "avfilter.h"
 #include "internal.h"
-#include "avfiltergraph.h"
 #include "audio.h"
 #include "video.h"
 
diff --git a/libavfilter/graphdump.c b/libavfilter/graphdump.c
index 531bb57..7377719 100644
--- a/libavfilter/graphdump.c
+++ b/libavfilter/graphdump.c
@@ -25,7 +25,6 @@
 #include "libavutil/bprint.h"
 #include "libavutil/pixdesc.h"
 #include "avfilter.h"
-#include "avfiltergraph.h"
 #include "internal.h"
 
 static int print_link_prop(AVBPrint *buf, AVFilterLink *link)
-- 
2.9.3



Re: [FFmpeg-devel] [PATCH 0/2] RFC: Generic hwaccel for cuvid v2

2017-06-25 Thread Philip Langdale
On Sun, 25 Jun 2017 12:43:12 +0100
Mark Thompson  wrote:

> Point (2) is somewhat more subtle.  The default hwaccel behaviour is
> made for the real hwaccels attached to the normal decoder, and won't
> do the right thing for the dummy ones.  The specific case that we
> strongly want to avoid is some normal setting where the output is
> downloaded to normal memory from a hardware frame inside ffmpeg,
> because that is almost certainly done more efficiently in the decoder
> itself (for the CUVID case, it actually has two explicit copies if
> you do this).  It's rare that you ever want to specify anything other
> than the hardware format or the corresponding software format for
> decode (and in fact I think only VAAPI supports such
> convert-on-download cases anyway), so the single toggle is usually
> sufficient.
> 
> Therefore, I think we should just clearly distinguish between the two
> general behaviours for the hwaccel case - real and dummy.  That's
> essentially what your patch at
> 
> did, but in a slightly implicit way - I would put it in
> hwaccel_decode_init() rather than in the option parsing code.
> 
> There was some question of whether all hwaccels (real and dummy)
> should behave identically here, but given the nasty default case if
> we do that I don't think it's justified (though feel free to argue
> further on this point if you feel strongly, I'm not that attached to
> it).
> 
> TL;DR - I preferred the mechanism of the previous version, with some
> changes to make it clearer what the distinction is.

Makes sense. I'll post an updated diff later today. Thanks for
explaining all your thoughts on this.

--phil


Re: [FFmpeg-devel] [PATCH 0/2] RFC: Generic hwaccel for cuvid v2

2017-06-25 Thread Philip Langdale
On Sat, 24 Jun 2017 21:42:23 -0400
"Ronald S. Bultje"  wrote:

> Hi Philip,
> 
> On Sat, Jun 24, 2017 at 7:40 PM, Philip Langdale 
> wrote:
> 
> > Feedback on first proposal was lack of interest in having default
> > behaviour vary between hwaccels, despite their other differences.  
> 
> 
> I think that's because many of us - including me - don't really
> understand this.
> 
> I, for one, am totally happy to delegate this decision to you and
> trust you to come up with behaviour that improves the end user
> experience. Not out of disinterest, but out of ignorance.

Hi Ronald,

I phrased that a bit wrong; I meant that the feedback stated a lack of
interest, rather than the lack of interest (i.e. silence) being the
feedback. I know that this isn't an area that most people have opinions
about. :-)

There was some discussion on IRC as well, so I got more active feedback
than might at first appear.

Thanks,

--phil


[FFmpeg-devel] [PATCH] avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation

2017-06-25 Thread Ilia Valiakhmetov
vp9_diag_downright_32x32_12bpp_c: 429.7
vp9_diag_downright_32x32_12bpp_sse2: 158.9
vp9_diag_downright_32x32_12bpp_ssse3: 144.6
vp9_diag_downright_32x32_12bpp_avx: 141.0
vp9_diag_downright_32x32_12bpp_avx2: 73.8

Takes almost 50% less time than the avx implementation (73.8 vs. 141.0)
---
 libavcodec/x86/vp9dsp_init_16bpp.c|   6 +-
 libavcodec/x86/vp9intrapred_16bpp.asm | 103 +-
 2 files changed, 106 insertions(+), 3 deletions(-)

diff --git a/libavcodec/x86/vp9dsp_init_16bpp.c 
b/libavcodec/x86/vp9dsp_init_16bpp.c
index 8d1aa13..54216f0 100644
--- a/libavcodec/x86/vp9dsp_init_16bpp.c
+++ b/libavcodec/x86/vp9dsp_init_16bpp.c
@@ -52,8 +52,9 @@ decl_ipred_fns(dc,  16, mmxext, sse2);
 decl_ipred_fns(dc_top,  16, mmxext, sse2);
 decl_ipred_fns(dc_left, 16, mmxext, sse2);
 decl_ipred_fn(dl,   16, 16, avx2);
-decl_ipred_fn(dr,   16, 16, avx2);
 decl_ipred_fn(dl,   32, 16, avx2);
+decl_ipred_fn(dr,   16, 16, avx2);
+decl_ipred_fn(dr,   32, 16, avx2);
 
 #define decl_ipred_dir_funcs(type) \
 decl_ipred_fns(type, 16, sse2,  sse2); \
@@ -137,8 +138,9 @@ av_cold void ff_vp9dsp_init_16bpp_x86(VP9DSPContext *dsp)
 init_fpel_func(1, 1,  64, avg, _16, avx2);
 init_fpel_func(0, 1, 128, avg, _16, avx2);
 init_ipred_func(dl, DIAG_DOWN_LEFT, 16, 16, avx2);
-init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
 init_ipred_func(dl, DIAG_DOWN_LEFT, 32, 16, avx2);
+init_ipred_func(dr, DIAG_DOWN_RIGHT, 16, 16, avx2);
+init_ipred_func(dr, DIAG_DOWN_RIGHT, 32, 16, avx2);
 }
 
 #endif /* HAVE_YASM */
diff --git a/libavcodec/x86/vp9intrapred_16bpp.asm 
b/libavcodec/x86/vp9intrapred_16bpp.asm
index 6d4400b..32b6982 100644
--- a/libavcodec/x86/vp9intrapred_16bpp.asm
+++ b/libavcodec/x86/vp9intrapred_16bpp.asm
@@ -1221,8 +1221,109 @@ cglobal vp9_ipred_dr_16x16_16, 4, 5, 6, dst, stride, l, 
a
 mova  [dstq+strideq*0], m4 ; 0
 mova [dst3q+strideq*4], m5 ; 7
 RET
-%endif
 
+%if ARCH_X86_64
+cglobal vp9_ipred_dr_32x32_16, 4, 7, 10, dst, stride, l, a
+    mova        m0, [lq+mmsize*0+0]          ; l[0-15]
+    mova        m1, [lq+mmsize*1+0]          ; l[16-31]
+    movu        m2, [aq+mmsize*0-2]          ; *abcdefghijklmno
+    mova        m3, [aq+mmsize*0+0]          ; abcdefghijklmnop
+    mova        m4, [aq+mmsize*1+0]          ; qrstuvwxyz012345
+    vperm2i128  m5, m0, m1, q0201            ; lmnopqrstuvwxyz0
+    vpalignr    m6, m5, m0, 2                ; mnopqrstuvwxyz01
+    vpalignr    m7, m5, m0, 4                ; nopqrstuvwxyz012
+    LOWPASS      0,  6,  7                   ; L[0-15]
+    vperm2i128  m7, m1, m2, q0201            ; stuvwxyz*abcdefg
+    vpalignr    m5, m7, m1, 2                ; lmnopqrstuvwxyz*
+    vpalignr    m6, m7, m1, 4                ; mnopqrstuvwxyz*a
+    LOWPASS      1,  5,  6                   ; L[16-31]#
+    vperm2i128  m5, m3, m4, q0201            ; ijklmnopqrstuvwx
+    vpalignr    m6, m5, m3, 2                ; bcdefghijklmnopq
+    LOWPASS      2,  3,  6                   ; A[0-15]
+    movu        m3, [aq+mmsize*1-2]          ; pqrstuvwxyz01234
+    vperm2i128  m6, m4, m4, q2001            ; yz012345
+    vpalignr    m7, m6, m4, 2                ; rstuvwxyz012345.
+    LOWPASS      3,  4,  7                   ; A[16-31].
+    vperm2i128  m4, m1, m2, q0201            ; TUVWXYZ#ABCDEFGH
+    vperm2i128  m5, m0, m1, q0201            ; L[7-15]L[16-23]
+    vperm2i128  m8, m2, m3, q0201            ; IJKLMNOPQRSTUVWX
+    DEFINE_ARGS dst8, stride, stride3, stride7, stride5, dst24, cnt
+    lea         stride3q, [strideq*3]
+    lea         stride5q, [stride3q+strideq*2]
+    lea         stride7q, [strideq*4+stride3q]
+    lea         dst24q, [dst8q+stride3q*8]
+    lea         dst8q, [dst8q+strideq*8]
+    mov         cntd, 2
+
+.loop:
+    mova        [dst24q+stride7q+0 ], m0     ; 31 23 15 7
+    mova        [dst24q+stride7q+32], m1
+    mova        [dst8q+stride7q+0], m1
+    mova        [dst8q+stride7q+32], m2
+    vpalignr    m6, m4, m1, 2
+    vpalignr    m7, m5, m0, 2
+    vpalignr    m9, m8, m2, 2
+    mova        [dst24q+stride3q*2+0], m7    ; 30 22 14 6
+    mova        [dst24q+stride3q*2+32], m6
+    mova        [dst8q+stride3q*2+0], m6
+    mova        [dst8q+stride3q*2+32], m9
+    vpalignr    m6, m4, m1, 4
+    vpalignr    m7, m5, m0, 4
+    vpalignr    m9, m8, m2, 4
+    mova        [dst24q+stride5q+0], m7      ; 29 21 13 5
+    mova        [dst24q+stride5q+32], m6
+    mova        [dst8q+stride5q+0], m6
+    mova        [dst8q+stride5q+32], m9
+    vpalignr    m6, m4, m1, 6
+v

[FFmpeg-devel] [PATCH] avcodec/prores_kostya: increase bits usage when alpha is used

2017-06-25 Thread Paul B Mahol
Also fix undefined left shift of negative variable.
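
For context on the shift fix: in C, left-shifting a negative signed value is undefined behaviour, while multiplying by 2 yields the same result without it as long as the product stays in range. The sketch below restates the two macros as functions with hypothetical names (get_sign, make_code); it assumes 32-bit values and an arithmetic right shift for negative inputs, which is implementation-defined but is what mainstream compilers provide:

```c
#include <assert.h>
#include <stdint.h>

/* Sign mask: 0 for non-negative x, -1 (all ones) for negative x.
 * Assumes arithmetic right shift of negative values. */
static int32_t get_sign(int32_t x)
{
    return x >> 31;
}

/* Interleave signed values into non-negative codes:
 * 0, -1, 1, -2, 2 map to 0, 1, 2, 3, 4.
 * x * 2 is used where x << 1 would be undefined for negative x. */
static int32_t make_code(int32_t x)
{
    assert(x > INT32_MIN / 2 && x < INT32_MAX / 2); /* keep x * 2 in range */
    return (x * 2) ^ get_sign(x);
}
```

For a negative input the XOR with the all-ones mask flips every bit of the doubled value, so -1 becomes 1 and -2 becomes 3; non-negative inputs are simply doubled.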

Signed-off-by: Paul B Mahol 
---
 libavcodec/proresenc_kostya.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libavcodec/proresenc_kostya.c b/libavcodec/proresenc_kostya.c
index 090dfa5..25f7fcb 100644
--- a/libavcodec/proresenc_kostya.c
+++ b/libavcodec/proresenc_kostya.c
@@ -358,7 +358,7 @@ static inline void encode_vlc_codeword(PutBitContext *pb, 
unsigned codebook, int
 }
 
 #define GET_SIGN(x)  ((x) >> 31)
-#define MAKE_CODE(x) (((x) << 1) ^ GET_SIGN(x))
+#define MAKE_CODE(x) ((((x)) * 2) ^ GET_SIGN(x))
 
 static void encode_dcs(PutBitContext *pb, int16_t *blocks,
int blocks_per_slice, int scale)
@@ -1206,6 +1206,8 @@ FF_ENABLE_DEPRECATION_WARNINGS
ctx->pictures_per_frame)
 break;
 ctx->bits_per_mb   = ctx->profile_info->br_tab[i];
+if (ctx->alpha_bits)
+ctx->bits_per_mb *= 20;
 } else if (ctx->bits_per_mb < 128) {
 av_log(avctx, AV_LOG_ERROR, "too few bits per MB, please set at 
least 128\n");
 return AVERROR_INVALIDDATA;
-- 
2.9.3



Re: [FFmpeg-devel] [PATCH] avcodec/prores_kostya: increase bits usage when alpha is used

2017-06-25 Thread Rostislav Pehlivanov
On 25 June 2017 at 16:38, Paul B Mahol  wrote:

> Also fix undefined left shift of negative variable.
>
> Signed-off-by: Paul B Mahol 
> ---
>  libavcodec/proresenc_kostya.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/libavcodec/proresenc_kostya.c b/libavcodec/proresenc_kostya.c
> index 090dfa5..25f7fcb 100644
> --- a/libavcodec/proresenc_kostya.c
> +++ b/libavcodec/proresenc_kostya.c
> @@ -358,7 +358,7 @@ static inline void encode_vlc_codeword(PutBitContext
> *pb, unsigned codebook, int
>  }
>
>  #define GET_SIGN(x)  ((x) >> 31)
> -#define MAKE_CODE(x) (((x) << 1) ^ GET_SIGN(x))
> +#define MAKE_CODE(x) ((((x)) * 2) ^ GET_SIGN(x))
>
>  static void encode_dcs(PutBitContext *pb, int16_t *blocks,
> int blocks_per_slice, int scale)
> @@ -1206,6 +1206,8 @@ FF_ENABLE_DEPRECATION_WARNINGS
> ctx->pictures_per_frame)
>  break;
>  ctx->bits_per_mb   = ctx->profile_info->br_tab[i];
> +if (ctx->alpha_bits)
> +ctx->bits_per_mb *= 20;
>  } else if (ctx->bits_per_mb < 128) {
>  av_log(avctx, AV_LOG_ERROR, "too few bits per MB, please set
> at least 128\n");
>  return AVERROR_INVALIDDATA;
> --
> 2.9.3
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>

Makes sense, LGTM


Re: [FFmpeg-devel] [PATCH 10/11] avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions

2017-06-25 Thread Michael Niedermayer
On Sat, Jun 24, 2017 at 06:30:26PM -0400, Ronald S. Bultje wrote:
> Hi,
> 
> On Sat, Jun 24, 2017 at 3:27 PM, Michael Niedermayer  > wrote:
> 
> > On Mon, Jun 19, 2017 at 05:11:03PM +0200, James Darnley wrote:
> > > Includes add/put functions
> > >
> > > Rounding contributed by Ronald S. Bultje
> > > ---
> > >  libavcodec/tests/x86/dct.c|  2 +
> > >  libavcodec/x86/idctdsp_init.c | 23 
> > >  libavcodec/x86/simple_idct.h  |  9 +++
> > >  libavcodec/x86/simple_idct10.asm  | 92
> > +++
> > >  libavcodec/x86/simple_idct10_template.asm |  6 +-
> > >  5 files changed, 130 insertions(+), 2 deletions(-)
> >
> > Sorry for the delay, testing this took me a bit longer than expected
> > as many files change slightly and looking at differences manually
> > is manual work ...
> >
> 
> Understood, and thanks for taking the time to do the testing.
> 
> 
> > This patch changes the default IDCT on x86(64), which is intended IIUC.
> > It also changes the IDCT when simplemmx is set,
> >
> > but on x86-32, simplemmx does not produce the same result after this
> > patch as simplemmx on x86-64.
> >
> > I am not sure, but
> > maybe the changed code should be enabled on FF_IDCT_SIMPLE instead of
> > FF_IDCT_SIMPLEMMX?
> > What's your opinion on this?
> > The next patch would add FF_IDCT_SIMPLE, but it also leaves
> > FF_IDCT_SIMPLEMMX.
> 
> 
>  That's a good point, I also considered that question (not so much the
> 32bit vs. 64bit, but the mmx vs. sse2). The question is basically what
> simplemmx means. Is it the exact binary result of the mmx function? Or is
> it a way of saying "almost simple, but with some rounding diffs because
> mmx"?
> 
> If the second, then simple is a superset of simplemmx. If the first, then
> we should remove simplemmx from the list of "supported" idcts for the
> sse2/avx functions. I have no preference (I assumed it meant the first),
> but if you'd prefer to use the second meaning, then that's an easy
> modification to make and it won't practically have any impact for most use
> cases I think...

I didn't think about meaning so much as about practice.
If someone reports an issue using "simplemmx" and bitexact, and
it fails to reproduce, that could be confusing.
This is especially plausible when the bug is not in the idct rounding but
in a later stage, merely triggered by specific output from the idct.

Also, potential future fate tests of simplemmx or other simd idcts
require a way to select a specific idct output.

No strong opinion on this ...

[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you fake or manipulate statistics in a paper in physics you will never
get a job again.
If you fake or manipulate statistics in a paper in medicin you will get
a job for life at the pharma industry.




[FFmpeg-devel] [PATCH v3 0/5] Interplay MVE: Implement additional frame formats

2017-06-25 Thread Hein-Pieter van Braam
Changes since V1:
 * Implemented fixes suggested by Moritz Barsnick (if/else whitespace)
 * Strict checking on overread of the IP packet data
 * Fixed checking the size of the IP packet header size (6 vs 8)

Changes since V2:
 * Correct the header size check found by Paul B Mahol

This patch series implements the previously unknown opcodes 0x06, 0x0E,
and 0x10 for Interplay MVE movies.

With this series every MVE sample[0] plays correctly, but I have some
doubts as to the implementation of the extra two AVFrames necessary for
format 0x10. If someone has a better idea on how to achieve this I'll
happily rework it as necessary.

I've tried to split up the patchset logically; I hope this makes sense.

I'm working with Multimedia Mike to get detailed information on the frame
formats added to multimedia.cx

If preferred, the patchset is also available on github[1]

- HP

[0] http://samples.mplayerhq.hu/game-formats/interplay-mve/
[1] https://github.com/hpvb/FFmpeg/tree/interplay-mve-submit

[PATCH v2 1/5] Interplay MVE: Implement MVE SEND_BUFFER operation
 This patch implements a feature that wasn't used for 0x11 formatted movies.
 See the commit message for details.

[PATCH v2 2/5] Interplay MVE: Refactor IP packet format
 This patch changes the IP packet format for interplaymovie. The reason for
 this is that frame format 0x10 requires an extra stream of data for a total
 of three to work correctly. This generalizes that for all frame formats.

[PATCH v2 3/5] Interplay MVE: Implement frame format 0x06
[PATCH v2 4/5] Interplay MVE: Implement frame format 0x10
[PATCH v2 5/5] Interplay MVE: Changelog entry for changes



[FFmpeg-devel] [PATCH v3 1/5] Interplay MVE: Implement MVE SEND_BUFFER operation

2017-06-25 Thread Hein-Pieter van Braam
Interplay MVE movies have a SEND_BUFFER operation. Only after this
command does the current decoding buffer get displayed. This is required
for the other frame formats. They are fixed-size and can't always encode
a full frame worth of pixeldata.

This code prevents half-finished frames from being emitted.
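
The handshake between demuxer and decoder can be sketched as follows. This is an illustration only: struct mve_state, write_header and got_frame_from_header are hypothetical names, and the 3-byte layout (flag byte, then a little-endian 16-bit decoding map size) is read off the AV_WL8/AV_WL16 calls in the diff below:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical demuxer-side state; only the flag matters here. */
struct mve_state {
    uint8_t send_buffer;   /* set when a SEND_BUFFER opcode was seen */
};

/*
 * Write the 3-byte packet header:
 *   byte 0     send_buffer flag
 *   bytes 1-2  decoding map size, little-endian
 * The flag is consumed (reset to 0) once it has been written.
 */
static void write_header(struct mve_state *s, uint8_t *dst, uint16_t map_size)
{
    assert(s && dst);
    dst[0] = s->send_buffer;
    dst[1] = map_size & 0xff;
    dst[2] = map_size >> 8;
    s->send_buffer = 0;
}

/* Decoder side: a frame is only emitted when the flag byte is set. */
static int got_frame_from_header(const uint8_t *src)
{
    assert(src);
    return src[0] != 0;
}
```

Resetting the flag after each packet means a second video chunk that arrives without another SEND_BUFFER opcode is decoded into the buffer but not displayed.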

Signed-off-by: Hein-Pieter van Braam 
---
 libavcodec/interplayvideo.c | 15 +--
 libavformat/ipmovie.c   | 16 +++-
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/libavcodec/interplayvideo.c b/libavcodec/interplayvideo.c
index df3314d..7c69926 100644
--- a/libavcodec/interplayvideo.c
+++ b/libavcodec/interplayvideo.c
@@ -990,17 +990,20 @@ static int ipvideo_decode_frame(AVCodecContext *avctx,
 IpvideoContext *s = avctx->priv_data;
 AVFrame *frame = data;
 int ret;
+int send_buffer;
 
 if (av_packet_get_side_data(avpkt, AV_PKT_DATA_PARAM_CHANGE, NULL)) {
 av_frame_unref(s->last_frame);
 av_frame_unref(s->second_last_frame);
 }
 
-if (buf_size < 2)
+if (buf_size < 3)
 return AVERROR_INVALIDDATA;
 
+send_buffer = AV_RL8(avpkt->data);
+
 /* decoding map contains 4 bits of information per 8x8 block */
-s->decoding_map_size = AV_RL16(avpkt->data);
+s->decoding_map_size = AV_RL16(avpkt->data + 1);
 
 /* compressed buffer needs to be large enough to at least hold an entire
  * decoding map */
@@ -1008,9 +1011,9 @@ static int ipvideo_decode_frame(AVCodecContext *avctx,
 return buf_size;
 
 
-s->decoding_map = buf + 2;
-bytestream2_init(&s->stream_ptr, buf + 2 + s->decoding_map_size,
- buf_size - s->decoding_map_size);
+s->decoding_map = buf + 3;
+bytestream2_init(&s->stream_ptr, buf + 3 + s->decoding_map_size,
+ buf_size - s->decoding_map_size - 3);
 
 if ((ret = ff_get_buffer(avctx, frame, AV_GET_BUFFER_FLAG_REF)) < 0)
 return ret;
@@ -1028,7 +1031,7 @@ static int ipvideo_decode_frame(AVCodecContext *avctx,
 
 ipvideo_decode_opcodes(s, frame);
 
-*got_frame = 1;
+*got_frame = send_buffer;
 
 /* shuffle frames */
 av_frame_unref(s->second_last_frame);
diff --git a/libavformat/ipmovie.c b/libavformat/ipmovie.c
index 29eeaf6..0705d33 100644
--- a/libavformat/ipmovie.c
+++ b/libavformat/ipmovie.c
@@ -91,6 +91,7 @@ typedef struct IPMVEContext {
 uint32_t palette[256];
 int  has_palette;
 int  changed;
+uint8_t  send_buffer;
 
 unsigned int audio_bits;
 unsigned int audio_channels;
@@ -154,9 +155,9 @@ static int load_ipmovie_packet(IPMVEContext *s, AVIOContext 
*pb,
 
 } else if (s->decode_map_chunk_offset) {
 
-/* send both the decode map and the video data together */
+/* send the decode map, the video data, and the send_buffer flag 
together */
 
-if (av_new_packet(pkt, 2 + s->decode_map_chunk_size + 
s->video_chunk_size))
+if (av_new_packet(pkt, 3 + s->decode_map_chunk_size + 
s->video_chunk_size))
 return CHUNK_NOMEM;
 
 if (s->has_palette) {
@@ -178,8 +179,11 @@ static int load_ipmovie_packet(IPMVEContext *s, 
AVIOContext *pb,
 avio_seek(pb, s->decode_map_chunk_offset, SEEK_SET);
 s->decode_map_chunk_offset = 0;
 
-AV_WL16(pkt->data, s->decode_map_chunk_size);
-if (avio_read(pb, pkt->data + 2, s->decode_map_chunk_size) !=
+AV_WL8(pkt->data, s->send_buffer);
+s->send_buffer = 0;
+
+AV_WL16(pkt->data + 1, s->decode_map_chunk_size);
+if (avio_read(pb, pkt->data + 3, s->decode_map_chunk_size) !=
 s->decode_map_chunk_size) {
 av_packet_unref(pkt);
 return CHUNK_EOF;
@@ -188,7 +192,7 @@ static int load_ipmovie_packet(IPMVEContext *s, AVIOContext 
*pb,
 avio_seek(pb, s->video_chunk_offset, SEEK_SET);
 s->video_chunk_offset = 0;
 
-if (avio_read(pb, pkt->data + 2 + s->decode_map_chunk_size,
+if (avio_read(pb, pkt->data + 3 + s->decode_map_chunk_size,
 s->video_chunk_size) != s->video_chunk_size) {
 av_packet_unref(pkt);
 return CHUNK_EOF;
@@ -444,6 +448,7 @@ static int process_ipmovie_chunk(IPMVEContext *s, 
AVIOContext *pb,
 case OPCODE_SEND_BUFFER:
 av_log(s->avf, AV_LOG_TRACE, "send buffer\n");
 avio_skip(pb, opcode_size);
+s->send_buffer = 1;
 break;
 
 case OPCODE_AUDIO_FRAME:
@@ -590,6 +595,7 @@ static int ipmovie_read_header(AVFormatContext *s)
 ipmovie->video_pts = ipmovie->audio_frame_count = 0;
 ipmovie->audio_chunk_offset = ipmovie->video_chunk_offset =
 ipmovie->decode_map_chunk_offset = 0;
+ipmovie->send_buffer = 0;
 
 /* on the first read, this will position the stream at the first chunk */
 ipmovie->next_chunk_offset = avio_tell(pb) + 4;
-- 
2.9.4


[FFmpeg-devel] [PATCH v3 2/5] Interplay MVE: Refactor IP packet format

2017-06-25 Thread Hein-Pieter van Braam
Interplay MVE can contain up to three different frame formats. They
require different streams of information to render a frame. This patch
changes the IP packet format to prepare for the extra frame formats.
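
As a reading aid, the new 6-byte header and the overread check can be sketched like this. The struct and function names are hypothetical; the field order (frame format, send_buffer flag, video data size, decoding map size, followed by the video data and then the decoding map) is taken from the diff below:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the 6-byte IP packet header. */
struct ip_header {
    uint8_t  frame_format;       /* byte 0 */
    uint8_t  send_buffer;        /* byte 1 */
    uint16_t video_data_size;    /* bytes 2-3, little-endian */
    uint16_t decoding_map_size;  /* bytes 4-5, little-endian */
};

static uint16_t rl16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the packet cannot hold what it claims. */
static int parse_ip_header(const uint8_t *buf, size_t size,
                           struct ip_header *h)
{
    assert(buf && h);
    if (size < 6)
        return -1;
    h->frame_format      = buf[0];
    h->send_buffer       = buf[1];
    h->video_data_size   = rl16(buf + 2);
    h->decoding_map_size = rl16(buf + 4);
    /* ensure the video data and decoding map fit inside the packet */
    if (size < 6 + (size_t)h->video_data_size + h->decoding_map_size)
        return -1;
    return 0;
}
```

Doing the size check before any payload pointer is used keeps both the video data (at offset 6) and the decoding map (at offset 6 + video_data_size) inside the buffer.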

Signed-off-by: Hein-Pieter van Braam 
---
 libavcodec/interplayvideo.c | 33 +---
 libavformat/ipmovie.c   | 46 -
 2 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/libavcodec/interplayvideo.c b/libavcodec/interplayvideo.c
index 7c69926..5dfb0d6 100644
--- a/libavcodec/interplayvideo.c
+++ b/libavcodec/interplayvideo.c
@@ -991,29 +991,40 @@ static int ipvideo_decode_frame(AVCodecContext *avctx,
 AVFrame *frame = data;
 int ret;
 int send_buffer;
+int frame_format;
+int video_data_size;
 
 if (av_packet_get_side_data(avpkt, AV_PKT_DATA_PARAM_CHANGE, NULL)) {
 av_frame_unref(s->last_frame);
 av_frame_unref(s->second_last_frame);
 }
 
-if (buf_size < 3)
+if (buf_size < 6)
 return AVERROR_INVALIDDATA;
 
-send_buffer = AV_RL8(avpkt->data);
+frame_format = AV_RL8(buf);
+send_buffer  = AV_RL8(buf + 1);
+video_data_size  = AV_RL16(buf + 2);
+s->decoding_map_size = AV_RL16(buf + 4);
 
-/* decoding map contains 4 bits of information per 8x8 block */
-s->decoding_map_size = AV_RL16(avpkt->data + 1);
+if (frame_format != 0x11)
+av_log(avctx, AV_LOG_ERROR, "Frame type 0x%02X unsupported\n", 
frame_format);
 
-/* compressed buffer needs to be large enough to at least hold an entire
- * decoding map */
-if (buf_size < s->decoding_map_size + 2)
-return buf_size;
+if (! s->decoding_map_size) {
+av_log(avctx, AV_LOG_ERROR, "Empty decoding map\n");
+return AVERROR_INVALIDDATA;
+}
 
+bytestream2_init(&s->stream_ptr, buf + 6, video_data_size);
 
-s->decoding_map = buf + 3;
-bytestream2_init(&s->stream_ptr, buf + 3 + s->decoding_map_size,
- buf_size - s->decoding_map_size - 3);
+/* decoding map contains 4 bits of information per 8x8 block */
+s->decoding_map = buf + 6 + video_data_size;
+
+/* ensure we can't overread the packet */
+if (buf_size < 6 + s->decoding_map_size + video_data_size) {
+av_log(avctx, AV_LOG_ERROR, "Invalid IP packet size\n");
+return AVERROR_INVALIDDATA;
+}
 
 if ((ret = ff_get_buffer(avctx, frame, AV_GET_BUFFER_FLAG_REF)) < 0)
 return ret;
diff --git a/libavformat/ipmovie.c b/libavformat/ipmovie.c
index 0705d33..a9ffca4 100644
--- a/libavformat/ipmovie.c
+++ b/libavformat/ipmovie.c
@@ -69,7 +69,7 @@
 #define OPCODE_UNKNOWN_0E  0x0E
 #define OPCODE_SET_DECODING_MAP0x0F
 #define OPCODE_UNKNOWN_10  0x10
-#define OPCODE_VIDEO_DATA  0x11
+#define OPCODE_VIDEO_DATA_11   0x11
 #define OPCODE_UNKNOWN_12  0x12
 #define OPCODE_UNKNOWN_13  0x13
 #define OPCODE_UNKNOWN_14  0x14
@@ -92,6 +92,7 @@ typedef struct IPMVEContext {
 int  has_palette;
 int  changed;
 uint8_t  send_buffer;
+uint8_t  frame_format;
 
 unsigned int audio_bits;
 unsigned int audio_channels;
@@ -153,11 +154,11 @@ static int load_ipmovie_packet(IPMVEContext *s, 
AVIOContext *pb,
 
 chunk_type = CHUNK_VIDEO;
 
-} else if (s->decode_map_chunk_offset) {
+} else if (s->frame_format) {
 
-/* send the decode map, the video data, and the send_buffer flag 
together */
+/* send the frame format, decode map, the video data, and the 
send_buffer flag together */
 
-if (av_new_packet(pkt, 3 + s->decode_map_chunk_size + 
s->video_chunk_size))
+if (av_new_packet(pkt, 6 + s->decode_map_chunk_size + 
s->video_chunk_size))
 return CHUNK_NOMEM;
 
 if (s->has_palette) {
@@ -175,29 +176,38 @@ static int load_ipmovie_packet(IPMVEContext *s, 
AVIOContext *pb,
 ff_add_param_change(pkt, 0, 0, 0, s->video_width, s->video_height);
 s->changed = 0;
 }
-pkt->pos= s->decode_map_chunk_offset;
-avio_seek(pb, s->decode_map_chunk_offset, SEEK_SET);
-s->decode_map_chunk_offset = 0;
 
-AV_WL8(pkt->data, s->send_buffer);
+AV_WL8(pkt->data, s->frame_format);
+AV_WL8(pkt->data + 1, s->send_buffer);
+AV_WL16(pkt->data + 2, s->video_chunk_size);
+AV_WL16(pkt->data + 4, s->decode_map_chunk_size);
+
+s->frame_format = 0;
 s->send_buffer = 0;
 
-AV_WL16(pkt->data + 1, s->decode_map_chunk_size);
-if (avio_read(pb, pkt->data + 3, s->decode_map_chunk_size) !=
-s->decode_map_chunk_size) {
+pkt->pos = s->video_chunk_offset;
+avio_seek(pb, s->video_chunk_offset, SEEK_SET);
+s->video_chunk_offset = 0;
+
+if (avio_read(pb, pkt->data + 6, s->video_chunk_size) !=
+s->video_chunk_size) {

[FFmpeg-devel] [PATCH v3 3/5] Interplay MVE: Implement frame format 0x06

2017-06-25 Thread Hein-Pieter van Braam
This implements the 0x06 frame format for Interplay MVE movies. The
format is relatively simple. The video data consists of two parts:

16 bits per 8x8 block movement data
a number of 8x8 blocks of pixel data

For each 8x8 block of pixel data the movement data is consulted. There
are 3 possible meanings of the movement data:
* zero : copy the 8x8 block from the pixel data
* negative : copy the 8x8 block from the previous frame from an offset
 determined by the actual value of the entry -0xC000.
* positive : copy the 8x8 block from the current frame from an offset
 determined by the actual value of the entry -0x4000
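
The three cases can be sketched as a small classifier. The names (block_src, block_move, decode_move) are hypothetical; the offset arithmetic mirrors the expressions used in ipvideo_format_06_secondpass in the diff below, and how the offsets are then applied to copy a block is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* Where an 8x8 block comes from; hypothetical names. */
enum block_src { SRC_PIXEL_DATA, SRC_PREV_FRAME, SRC_CUR_FRAME };

struct block_move {
    enum block_src src;
    int off_x, off_y;
};

/* Interpret one 16-bit movement entry for a given line size. */
static struct block_move decode_move(int16_t entry, int linesize)
{
    struct block_move m = { SRC_PIXEL_DATA, 0, 0 };
    int rel;

    assert(linesize > 0);
    if (entry == 0)
        return m;                         /* new pixels follow in the stream */
    if (entry < 0) {
        m.src = SRC_PREV_FRAME;
        rel   = (uint16_t)entry - 0xC000; /* relative to the previous frame */
    } else {
        m.src = SRC_CUR_FRAME;
        rel   = (uint16_t)entry - 0x4000; /* relative to the current frame */
    }
    m.off_x = rel % linesize;
    m.off_y = rel / linesize;
    return m;
}
```

For example, with a line size of 640 the entry 0x4000 + 1283 decodes to a copy from the current frame at offset (3, 2), since 1283 = 2 * 640 + 3.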

Decoding happens in two passes: in the first pass only new pixeldata is
copied; during the second pass data is copied from the previous and
current frames.

The codec expects the frame being decoded to still hold the data from
2 frames ago when decoding starts.

Signed-off-by: Hein-Pieter van Braam 
---
 libavcodec/interplayvideo.c | 125 
 libavformat/ipmovie.c   |  15 --
 2 files changed, 126 insertions(+), 14 deletions(-)

diff --git a/libavcodec/interplayvideo.c b/libavcodec/interplayvideo.c
index 5dfb0d6..431eeb1 100644
--- a/libavcodec/interplayvideo.c
+++ b/libavcodec/interplayvideo.c
@@ -903,7 +903,81 @@ static int (* const 
ipvideo_decode_block16[])(IpvideoContext *s, AVFrame *frame)
 ipvideo_decode_block_opcode_0xE_16, ipvideo_decode_block_opcode_0x1,
 };
 
-static void ipvideo_decode_opcodes(IpvideoContext *s, AVFrame *frame)
+static void ipvideo_format_06_firstpass(IpvideoContext *s, AVFrame *frame, 
short opcode)
+{
+int line;
+
+if (!opcode) {
+for (line = 0; line < 8; ++line) {
+bytestream2_get_buffer(&s->stream_ptr, s->pixel_ptr, 8);
+s->pixel_ptr += s->stride;
+}
+} else {
+/* Don't try to copy second_last_frame data on the first frames */
+if (s->avctx->frame_number > 2)
+copy_from(s, s->second_last_frame, frame, 0, 0);
+}
+}
+
+static void ipvideo_format_06_secondpass(IpvideoContext *s, AVFrame *frame, 
short opcode)
+{
+int off_x, off_y;
+
+if (opcode < 0) {
+off_x = ((unsigned short)opcode - 0xC000) % frame->linesize[0];
+off_y = ((unsigned short)opcode - 0xC000) / frame->linesize[0];
+copy_from(s, s->last_frame, frame, off_x, off_y);
+} else if (opcode > 0) {
+off_x = ((unsigned short)opcode - 0x4000) % frame->linesize[0];
+off_y = ((unsigned short)opcode - 0x4000) / frame->linesize[0];
+copy_from(s, frame, frame, off_x, off_y);
+}
+}
+
+static void (* const ipvideo_format_06_passes[])(IpvideoContext *s, AVFrame *frame, short op) = {
+ipvideo_format_06_firstpass, ipvideo_format_06_secondpass,
+};
+
+static void ipvideo_decode_format_06_opcodes(IpvideoContext *s, AVFrame *frame)
+{
+int pass, x, y;
+short opcode;
+GetByteContext decoding_map_ptr;
+
+/* this is PAL8, so make the palette available */
+memcpy(frame->data[1], s->pal, AVPALETTE_SIZE);
+s->stride = frame->linesize[0];
+
+s->line_inc = s->stride - 8;
+s->upper_motion_limit_offset = (s->avctx->height - 8) * frame->linesize[0]
+  + (s->avctx->width - 8) * (1 + s->is_16bpp);
+
+bytestream2_init(&decoding_map_ptr, s->decoding_map, s->decoding_map_size);
+
+for (pass = 0; pass < 2; ++pass) {
+bytestream2_seek(&decoding_map_ptr, 0, SEEK_SET);
+for (y = 0; y < s->avctx->height; y += 8) {
+for (x = 0; x < s->avctx->width; x += 8) {
+opcode = bytestream2_get_le16(&decoding_map_ptr);
+
+ff_tlog(s->avctx,
+"  block @ (%3d, %3d): opcode 0x%X, data ptr offset %d\n",
+x, y, opcode, bytestream2_tell(&s->stream_ptr));
+
+s->pixel_ptr = frame->data[0] + x + y * frame->linesize[0];
+ipvideo_format_06_passes[pass](s, frame, opcode);
+}
+}
+}
+
+if (bytestream2_get_bytes_left(&s->stream_ptr) > 1) {
+av_log(s->avctx, AV_LOG_DEBUG,
+   "decode finished with %d bytes left over\n",
+   bytestream2_get_bytes_left(&s->stream_ptr));
+}
+}
+
+static void ipvideo_decode_format_11_opcodes(IpvideoContext *s, AVFrame *frame)
 {
 int x, y;
 unsigned char opcode;
@@ -1007,18 +1081,40 @@ static int ipvideo_decode_frame(AVCodecContext *avctx,
 video_data_size  = AV_RL16(buf + 2);
 s->decoding_map_size = AV_RL16(buf + 4);
 
-if (frame_format != 0x11)
-av_log(avctx, AV_LOG_ERROR, "Frame type 0x%02X unsupported\n", frame_format);
+switch(frame_format) {
+case 0x06:
+if (s->decoding_map_size) {
+av_log(avctx, AV_LOG_ERROR, "Decoding map for format 0x06\n");
+return AVERROR_INVALIDDATA;
+}
 
-if (! s->decoding_map_size) {
-

[FFmpeg-devel] [PATCH v3 5/5] Interplay MVE: Changelog entry for changes

2017-06-25 Thread Hein-Pieter van Braam
Signed-off-by: Hein-Pieter van Braam 
---
 Changelog | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Changelog b/Changelog
index 4f46eda..24d2255 100644
--- a/Changelog
+++ b/Changelog
@@ -24,6 +24,7 @@ version :
 - roberts video filter
 - The x86 assembler default switched from yasm to nasm, pass
   --x86asmexe=yasm to configure to restore the old behavior.
+- additional frame format support for Interplay MVE movies
 
 version 3.3:
 - CrystalHD decoder moved to new decode API
-- 
2.9.4

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH v3 4/5] Interplay MVE: Implement frame format 0x10

2017-06-25 Thread Hein-Pieter van Braam
This implements the 0x10 frame format for Interplay MVE movies. The
format is a variation on the 0x06 format with some changes. In addition
to the decoding map there's also a skip map. This skip map is used to
determine which 8x8 blocks can change in a particular frame.

This format expects to be able to copy an 8x8 block from before the last
time it was changed. This can be an arbitrary time in the past. To
implement this, the decoder allocates two additional AVFrames where the
actual decoding happens. At the end of decoding a frame, changed blocks
are copied to a finished frame based on the skip map.

The skip map's encoding is a little convoluted; see the code
for details.

Values in the decoding map are the same as in format 0x06.
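
For orientation, here is a standalone sketch of how the skip map appears
to be consumed, based only on a reading of the loop in the patch below.
Each 16-bit word seems to carry up to 15 per-block "changed" flags, taken
from the most significant bit down, with the lowest set bit acting as an
end-of-word marker. The function name mve10_walk_skip_map is hypothetical,
the words are assumed to be already converted from little-endian, and
stopping when the map runs out is an addition of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fills changed[] with one flag per 8x8 block and
 * returns how many blocks could be classified before the map ran out. */
static size_t mve10_walk_skip_map(const uint16_t *words, size_t nb_words,
                                  uint8_t *changed, size_t nb_blocks)
{
    size_t pos = 0, b;
    uint16_t skip = 0;             /* 0 forces a reload before the first block */

    assert(words && changed);
    for (b = 0; b < nb_blocks; b++) {
        changed[b] = 0;
        /* Same condition as "skip <= 0" on a signed 16-bit value. */
        while (skip == 0 || (skip & 0x8000)) {
            if (skip != 0 && skip != 0x8000) {
                changed[b] = 1;    /* top bit set and more bits follow */
                break;
            }
            if (pos == nb_words)
                return b;          /* map exhausted (sketch-only guard) */
            skip = words[pos++];   /* empty or marker-only word: reload */
        }
        skip = (uint16_t)(skip << 1);  /* same effect as "skip *= 2" */
    }
    return b;
}
```

For example, the word 0xA001 would mark blocks 0 and 2 as changed and the
next twelve blocks as unchanged before the following word is read.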

Signed-off-by: Hein-Pieter van Braam 
---
 libavcodec/interplayvideo.c | 182 ++--
 libavformat/ipmovie.c   |  68 +
 2 files changed, 227 insertions(+), 23 deletions(-)

diff --git a/libavcodec/interplayvideo.c b/libavcodec/interplayvideo.c
index 431eeb1..421de26 100644
--- a/libavcodec/interplayvideo.c
+++ b/libavcodec/interplayvideo.c
@@ -55,8 +55,15 @@ typedef struct IpvideoContext {
 HpelDSPContext hdsp;
 AVFrame *second_last_frame;
 AVFrame *last_frame;
+
+/* For format 0x10 */
+AVFrame *cur_decode_frame;
+AVFrame *prev_decode_frame;
+
 const unsigned char *decoding_map;
 int decoding_map_size;
+const unsigned char *skip_map;
+int skip_map_size;
 
 int is_16bpp;
 GetByteContext stream_ptr, mv_ptr;
@@ -977,6 +984,114 @@ static void ipvideo_decode_format_06_opcodes(IpvideoContext *s, AVFrame *frame)
 }
 }
 
+static void ipvideo_format_10_firstpass(IpvideoContext *s, AVFrame *frame, short opcode)
+{
+int line;
+
+if (!opcode) {
+for (line = 0; line < 8; ++line) {
+bytestream2_get_buffer(&s->stream_ptr, s->pixel_ptr, 8);
+s->pixel_ptr += s->stride;
+}
+}
+}
+
+static void ipvideo_format_10_secondpass(IpvideoContext *s, AVFrame *frame, short opcode)
+{
+int off_x, off_y;
+
+if (opcode < 0) {
+off_x = ((unsigned short)opcode - 0xC000) % s->cur_decode_frame->linesize[0];
+off_y = ((unsigned short)opcode - 0xC000) / s->cur_decode_frame->linesize[0];
+copy_from(s, s->prev_decode_frame, s->cur_decode_frame, off_x, off_y);
+} else if (opcode > 0) {
+off_x = ((unsigned short)opcode - 0x4000) % s->cur_decode_frame->linesize[0];
+off_y = ((unsigned short)opcode - 0x4000) / s->cur_decode_frame->linesize[0];
+copy_from(s, s->cur_decode_frame, s->cur_decode_frame, off_x, off_y);
+}
+}
+
+static void (* const ipvideo_format_10_passes[])(IpvideoContext *s, AVFrame *frame, short op) = {
+ipvideo_format_10_firstpass, ipvideo_format_10_secondpass,
+};
+
+static void ipvideo_decode_format_10_opcodes(IpvideoContext *s, AVFrame *frame)
+{
+int pass, x, y, changed_block;
+short opcode, skip;
+GetByteContext decoding_map_ptr;
+GetByteContext skip_map_ptr;
+
+bytestream2_skip(&s->stream_ptr, 14); /* data starts 14 bytes in */
+
+/* this is PAL8, so make the palette available */
+memcpy(frame->data[1], s->pal, AVPALETTE_SIZE);
+s->stride = frame->linesize[0];
+
+s->line_inc = s->stride - 8;
+s->upper_motion_limit_offset = (s->avctx->height - 8) * frame->linesize[0]
+  + (s->avctx->width - 8) * (1 + s->is_16bpp);
+
+bytestream2_init(&decoding_map_ptr, s->decoding_map, s->decoding_map_size);
+bytestream2_init(&skip_map_ptr, s->skip_map, s->skip_map_size);
+
+for (pass = 0; pass < 2; ++pass) {
+bytestream2_seek(&decoding_map_ptr, 0, SEEK_SET);
+bytestream2_seek(&skip_map_ptr, 0, SEEK_SET);
+skip = bytestream2_get_le16(&skip_map_ptr);
+
+for (y = 0; y < s->avctx->height; y += 8) {
+for (x = 0; x < s->avctx->width; x += 8) {
+s->pixel_ptr = s->cur_decode_frame->data[0] + x + y * s->cur_decode_frame->linesize[0];
+
+while (skip <= 0)  {
+if (skip != -0x8000 && skip) {
+opcode = bytestream2_get_le16(&decoding_map_ptr);
+ipvideo_format_10_passes[pass](s, frame, opcode);
+break;
+}
+skip = bytestream2_get_le16(&skip_map_ptr);
+}
+skip *= 2;
+}
+}
+}
+
+bytestream2_seek(&skip_map_ptr, 0, SEEK_SET);
+skip = bytestream2_get_le16(&skip_map_ptr);
+for (y = 0; y < s->avctx->height; y += 8) {
+for (x = 0; x < s->avctx->width; x += 8) {
+changed_block = 0;
+s->pixel_ptr = frame->data[0] + x + y*frame->linesize[0];
+
+while (skip <= 0)  {
+if (skip != -0x8000 && skip) {
+changed_block = 1;
+break;
+}
+

[FFmpeg-devel] [PATCH 1/1] avformat: Fix Pro-MPEG non-square matrix

2017-06-25 Thread Vlad Tarca

The patch is correct, please apply.

Reviewed-by:vta...@mobibase.com 


--



Re: [FFmpeg-devel] [PATCH 1/1] avformat: Fix Pro-MPEG non-square matrix

2017-06-25 Thread Vlad Tarca

Sorry, gmail refused to send the In-Reply-To header.

The concerned patch is:

https://patchwork.ffmpeg.org/patch/2206/

This is a confirmed error; the patch can be applied. Thanks!


On 25/06/2017 22:47, Vlad Tarca wrote:

The patch is correct, please apply.

Reviewed-by:vta...@mobibase.com




--



[FFmpeg-devel] [PATCH 2/2] ffmpeg: Set default output format for dummy hwaccels

2017-06-25 Thread Philip Langdale
Dummy hwaccels, of which cuvid is the best example, behave differently
from real hwaccels. In the past, one of these behaviours was that
selecting the hwaccel would automatically cause the decoder (and
remember it's a dedicated decoder) to use the native hardware output
format.

This meant that transcoding command lines would pass frames through
device memory automatically once the right hwaccel and decoder/encoder
were selected.

With the generic hwaccel code path, dummy decoders end up following
the 'real' hwaccel path where the output format defaults to a software
format.

To avoid users facing an unexpected change in behaviour, we now
indicate whether an hwaccel is a dummy, and if it is, we set the
default output format appropriately.

To make this process easier, I updated ffmpeg to pass the HWAccel
struct to the init() call so that we know which decoder we are
dealing with.
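
The defaulting rule described above can be modelled in isolation. The
sketch below is a simplified stand-in, not the ffmpeg code: ModelPixFmt,
ModelHWAccel and model_output_format are hypothetical names, and the enum
values only imitate AVPixelFormat.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for AVPixelFormat values; not the real libavutil enum. */
enum ModelPixFmt {
    MODEL_PIX_FMT_NONE = -1,   /* user gave no -hwaccel_output_format */
    MODEL_PIX_FMT_NV12,
    MODEL_PIX_FMT_VAAPI,
    MODEL_PIX_FMT_CUDA
};

typedef struct ModelHWAccel {
    const char      *name;
    enum ModelPixFmt pix_fmt;  /* native hardware format of the hwaccel      */
    int              is_dummy; /* non-zero if it maps to a dedicated decoder */
} ModelHWAccel;

/* Returns the output format to use. For a dummy hwaccel with no explicit
 * request this is the native hardware format; otherwise the request is
 * returned unchanged, where NONE means "fall back to the software default". */
static enum ModelPixFmt model_output_format(const ModelHWAccel *hwaccel,
                                            enum ModelPixFmt requested)
{
    assert(hwaccel != NULL);
    if (hwaccel->is_dummy && requested == MODEL_PIX_FMT_NONE)
        return hwaccel->pix_fmt;
    return requested;
}
```

An explicit output format always wins in this model; only the unspecified
case differs between dummy and real hwaccels.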

Signed-off-by: Philip Langdale 
---
 ffmpeg.c  |  2 +-
 ffmpeg.h  | 15 +++
 ffmpeg_dxva2.c|  2 +-
 ffmpeg_hw.c   |  5 -
 ffmpeg_opt.c  | 21 ++---
 ffmpeg_qsv.c  |  2 +-
 ffmpeg_videotoolbox.c |  2 +-
 7 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/ffmpeg.c b/ffmpeg.c
index 6dae6e9078..32d8ba4895 100644
--- a/ffmpeg.c
+++ b/ffmpeg.c
@@ -2833,7 +2833,7 @@ static enum AVPixelFormat get_format(AVCodecContext *s, const enum AVPixelFormat
 (ist->hwaccel_id != HWACCEL_AUTO && ist->hwaccel_id != hwaccel->id))
 continue;
 
-ret = hwaccel->init(s);
+ret = hwaccel->init(hwaccel, s);
 if (ret < 0) {
 if (ist->hwaccel_id == hwaccel->id) {
 av_log(NULL, AV_LOG_FATAL,
diff --git a/ffmpeg.h b/ffmpeg.h
index fa81427471..946bcb75f2 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -70,13 +70,20 @@ enum HWAccelID {
 HWACCEL_CUVID,
 };
 
-typedef struct HWAccel {
+typedef struct HWAccel HWAccel;
+
+struct HWAccel {
 const char *name;
-int (*init)(AVCodecContext *s);
+int (*init)(const HWAccel *hwaccel, AVCodecContext *s);
 enum HWAccelID id;
 enum AVPixelFormat pix_fmt;
 enum AVHWDeviceType device_type;
-} HWAccel;
+/*
+ * A dummy hwaccel is one which maps to a separate decoder rather
+ * than plugging into a standard decoder
+ */
+int is_dummy;
+};
 
 typedef struct HWDevice {
 char *name;
@@ -673,6 +680,6 @@ void hw_device_free_all(void);
 int hw_device_setup_for_decode(InputStream *ist);
 int hw_device_setup_for_encode(OutputStream *ost);
 
-int hwaccel_decode_init(AVCodecContext *avctx);
+int hwaccel_decode_init(const HWAccel *hwaccel, AVCodecContext *avctx);
 
 #endif /* FFMPEG_H */
diff --git a/ffmpeg_dxva2.c b/ffmpeg_dxva2.c
index 1a391f82f3..8cbf5a9dff 100644
--- a/ffmpeg_dxva2.c
+++ b/ffmpeg_dxva2.c
@@ -406,7 +406,7 @@ fail:
 return AVERROR(EINVAL);
 }
 
-int dxva2_init(AVCodecContext *s)
+int dxva2_init(const HWAccel *hwaccel, AVCodecContext *s)
 {
 InputStream *ist = s->opaque;
 int loglevel = (ist->hwaccel_id == HWACCEL_AUTO) ? AV_LOG_VERBOSE : AV_LOG_ERROR;
diff --git a/ffmpeg_hw.c b/ffmpeg_hw.c
index a4d1cada59..8518f267be 100644
--- a/ffmpeg_hw.c
+++ b/ffmpeg_hw.c
@@ -375,11 +375,14 @@ fail:
 return err;
 }
 
-int hwaccel_decode_init(AVCodecContext *avctx)
+int hwaccel_decode_init(const HWAccel *hwaccel, AVCodecContext *avctx)
 {
 InputStream *ist = avctx->opaque;
 
 ist->hwaccel_retrieve_data = &hwaccel_retrieve_data;
+if (hwaccel->is_dummy && ist->hwaccel_output_format == AV_PIX_FMT_NONE) {
+ist->hwaccel_output_format = hwaccel->pix_fmt;
+}
 
 return 0;
 }
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index 6dc4ad43d2..ec89f50cf9 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -68,31 +68,38 @@
 const HWAccel hwaccels[] = {
 #if HAVE_VDPAU_X11
 { "vdpau", hwaccel_decode_init, HWACCEL_VDPAU, AV_PIX_FMT_VDPAU,
-  AV_HWDEVICE_TYPE_VDPAU },
+  AV_HWDEVICE_TYPE_VDPAU,
+  .is_dummy = 0 },
 #endif
 #if HAVE_DXVA2_LIB
 { "dxva2", dxva2_init, HWACCEL_DXVA2, AV_PIX_FMT_DXVA2_VLD,
-  AV_HWDEVICE_TYPE_NONE },
+  AV_HWDEVICE_TYPE_NONE,
+  .is_dummy = 0 },
 #endif
 #if CONFIG_VDA
 { "vda",   videotoolbox_init,   HWACCEL_VDA,   AV_PIX_FMT_VDA,
-  AV_HWDEVICE_TYPE_NONE },
+  AV_HWDEVICE_TYPE_NONE,
+  .is_dummy = 0 },
 #endif
 #if CONFIG_VIDEOTOOLBOX
 { "videotoolbox",   videotoolbox_init,   HWACCEL_VIDEOTOOLBOX,   AV_PIX_FMT_VIDEOTOOLBOX,
-  AV_HWDEVICE_TYPE_NONE },
+  AV_HWDEVICE_TYPE_NONE,
+  .is_dummy = 0 },
 #endif
 #if CONFIG_LIBMFX
 { "qsv",   qsv_init,   HWACCEL_QSV,   AV_PIX_FMT_QSV,
-  AV_HWDEVICE_TYPE_NONE },
+  AV_HWDEVICE_TYPE_NONE,
+  .is_dummy = 0 },
 #endif
 #if CONFIG_VAAPI
 { "vaapi", hwaccel_decode_init, HWACCEL_VAAPI, AV_PIX_FMT_VAAPI,
-  AV_HWDEVICE_TYPE_VAAPI },
+  AV_HWDEVICE_TYPE_VAAPI,
+  .is_dummy = 0 },
 #endif
 #if CONFIG_CUVID
 { "cuvid", hwaccel_decode_init, HW

[FFmpeg-devel] [PATCH 0/2] Generic hwaccel for cuvid v3

2017-06-25 Thread Philip Langdale
Third time's a charm.

Based on feedback from Mark, I've reworked my v1 change to explicitly reflect
that we are differentiating behaviour based on whether the hwaccel is real
or a dummy. In the dummy case, we want to set the default output format to
maintain the same semantics as we had before generic hwaccel.

Philip Langdale (2):
  ffmpeg: Switch cuvid to generic hwaccel
  ffmpeg: Set default output format for dummy hwaccels

 Makefile  |  1 -
 ffmpeg.c  |  2 +-
 ffmpeg.h  | 16 +++
 ffmpeg_cuvid.c| 73 ---
 ffmpeg_dxva2.c|  2 +-
 ffmpeg_hw.c   |  5 +++-
 ffmpeg_opt.c  | 23 ++--
 ffmpeg_qsv.c  |  2 +-
 ffmpeg_videotoolbox.c |  2 +-
 9 files changed, 34 insertions(+), 92 deletions(-)
 delete mode 100644 ffmpeg_cuvid.c

-- 
2.11.0


[FFmpeg-devel] [PATCH 1/2] ffmpeg: Switch cuvid to generic hwaccel

2017-06-25 Thread Philip Langdale
With generic hwaccel, it is additionally necessary to specify the
output format. If this is not done, we'll end up downloading the
frames back and then re-uploading them.

For example:

ffmpeg -y -hwaccel cuvid -hwaccel_output_format cuda \
   -c:v h264_cuvid -i sample.mp4 \
   -c:v h264_nvenc -f rawvideo /dev/null

Signed-off-by: Philip Langdale 
---
 Makefile   |  1 -
 ffmpeg.h   |  1 -
 ffmpeg_cuvid.c | 73 --
 ffmpeg_opt.c   |  4 ++--
 4 files changed, 2 insertions(+), 77 deletions(-)
 delete mode 100644 ffmpeg_cuvid.c

diff --git a/Makefile b/Makefile
index aef18185d4..0766c3b719 100644
--- a/Makefile
+++ b/Makefile
@@ -37,7 +37,6 @@ OBJS-ffmpeg-$(CONFIG_LIBMFX)  += ffmpeg_qsv.o
 ifndef CONFIG_VIDEOTOOLBOX
 OBJS-ffmpeg-$(CONFIG_VDA) += ffmpeg_videotoolbox.o
 endif
-OBJS-ffmpeg-$(CONFIG_CUVID)   += ffmpeg_cuvid.o
 OBJS-ffmpeg-$(HAVE_DXVA2_LIB) += ffmpeg_dxva2.o
 OBJS-ffserver += ffserver_config.o
 
diff --git a/ffmpeg.h b/ffmpeg.h
index c3854bcb4a..fa81427471 100644
--- a/ffmpeg.h
+++ b/ffmpeg.h
@@ -665,7 +665,6 @@ int dxva2_init(AVCodecContext *s);
 int vda_init(AVCodecContext *s);
 int videotoolbox_init(AVCodecContext *s);
 int qsv_init(AVCodecContext *s);
-int cuvid_init(AVCodecContext *s);
 
 HWDevice *hw_device_get_by_name(const char *name);
 int hw_device_init_from_string(const char *arg, HWDevice **dev);
diff --git a/ffmpeg_cuvid.c b/ffmpeg_cuvid.c
deleted file mode 100644
index 3ff3b40f17..00
--- a/ffmpeg_cuvid.c
+++ /dev/null
@@ -1,73 +0,0 @@
-/*
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#include "libavutil/hwcontext.h"
-#include "libavutil/pixdesc.h"
-
-#include "ffmpeg.h"
-
-static void cuvid_uninit(AVCodecContext *avctx)
-{
-InputStream *ist = avctx->opaque;
-av_buffer_unref(&ist->hw_frames_ctx);
-}
-
-int cuvid_init(AVCodecContext *avctx)
-{
-InputStream *ist = avctx->opaque;
-AVHWFramesContext *frames_ctx;
-int ret;
-
-av_log(avctx, AV_LOG_VERBOSE, "Initializing cuvid hwaccel\n");
-
-if (!hw_device_ctx) {
-ret = av_hwdevice_ctx_create(&hw_device_ctx, AV_HWDEVICE_TYPE_CUDA,
- ist->hwaccel_device, NULL, 0);
-if (ret < 0) {
-av_log(avctx, AV_LOG_ERROR, "Error creating a CUDA device\n");
-return ret;
-}
-}
-
-av_buffer_unref(&ist->hw_frames_ctx);
-ist->hw_frames_ctx = av_hwframe_ctx_alloc(hw_device_ctx);
-if (!ist->hw_frames_ctx) {
-av_log(avctx, AV_LOG_ERROR, "Error creating a CUDA frames context\n");
-return AVERROR(ENOMEM);
-}
-
-frames_ctx = (AVHWFramesContext*)ist->hw_frames_ctx->data;
-
-frames_ctx->format = AV_PIX_FMT_CUDA;
-frames_ctx->sw_format = avctx->sw_pix_fmt;
-frames_ctx->width = avctx->width;
-frames_ctx->height = avctx->height;
-
-av_log(avctx, AV_LOG_DEBUG, "Initializing CUDA frames context: sw_format = %s, width = %d, height = %d\n",
-   av_get_pix_fmt_name(frames_ctx->sw_format), frames_ctx->width, frames_ctx->height);
-
-ret = av_hwframe_ctx_init(ist->hw_frames_ctx);
-if (ret < 0) {
-av_log(avctx, AV_LOG_ERROR, "Error initializing a CUDA frame pool\n");
-return ret;
-}
-
-ist->hwaccel_uninit = cuvid_uninit;
-
-return 0;
-}
diff --git a/ffmpeg_opt.c b/ffmpeg_opt.c
index bb6001f534..6dc4ad43d2 100644
--- a/ffmpeg_opt.c
+++ b/ffmpeg_opt.c
@@ -91,8 +91,8 @@ const HWAccel hwaccels[] = {
   AV_HWDEVICE_TYPE_VAAPI },
 #endif
 #if CONFIG_CUVID
-{ "cuvid", cuvid_init, HWACCEL_CUVID, AV_PIX_FMT_CUDA,
-  AV_HWDEVICE_TYPE_NONE },
+{ "cuvid", hwaccel_decode_init, HWACCEL_CUVID, AV_PIX_FMT_CUDA,
+  AV_HWDEVICE_TYPE_CUDA },
 #endif
 { 0 },
 };
-- 
2.11.0


Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Michael Niedermayer
On Sat, Jun 24, 2017 at 11:39:03PM +0300, Ivan Kalvachev wrote:

[...]
> diff --git a/libavcodec/x86/opus_pvq_search.asm b/libavcodec/x86/opus_pvq_search.asm
> new file mode 100644
> index 00..36b679b75e
> --- /dev/null
> +++ b/libavcodec/x86/opus_pvq_search.asm
> @@ -0,0 +1,628 @@
> +; Opus encoder assembly optimizations
> +; Copyright (C) 2017 Ivan Kalvachev 

Missing (L)GPL header
this breaks fate-source

build and fate otherwise succeeds on linux x86-64 and 32, mingw32 and 64

if people with various cpus could benchmark this as ivan asked for
that would be good too

thx

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus




Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Henrik Gramner
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev  wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> +haddps  %1,   %1
> +haddps  %1,   %1
[...]
> +   phaddd   xmm%1,xmm%1
> +   phaddd   xmm%1,xmm%1

You can safely assume that those instructions are always slow and that
this is virtually never the correct way to use them, so just use the
shuffle + add method.
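
For readers unfamiliar with the shuffle + add method, here is a minimal
sketch of a 4-float horizontal sum. It uses C SSE intrinsics instead of
the x86inc asm under review, assumes an x86 target with SSE, and the
function name hsum4_shuffle_add is hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Sums p[0..3] with two shuffles and two adds, without haddps. */
static float hsum4_shuffle_add(const float *p)
{
    __m128 v, hi, sum2, lane1, sum1;

    assert(p);
    v     = _mm_loadu_ps(p);                 /* [p0, p1, p2, p3]           */
    hi    = _mm_movehl_ps(v, v);             /* [p2, p3, p2, p3]           */
    sum2  = _mm_add_ps(v, hi);               /* [p0+p2, p1+p3, ...]        */
    lane1 = _mm_shuffle_ps(sum2, sum2,
                           _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane 1   */
    sum1  = _mm_add_ss(sum2, lane1);         /* lane 0 = (p0+p2) + (p1+p3) */
    return _mm_cvtss_f32(sum1);
}
```

The same pattern applies to integer lanes with integer shuffles and adds
in place of phaddd.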

You can unconditionally use non-destructive 3-arg instructions
(without v-prefix) in non-AVX code to reduce ifdeffery. The x86inc
abstraction layer will automatically insert register-register moves as
needed.

I'm a bit doubtful if it's worth the complexity to emulate 256-bit
integer math using floating-point instruction hacks, especially since
that's only relevant on two 5+ year old Intel µarchs (SNB & IVB). It's
probably fine to simply require AVX2 if you need 256-bit integer SIMD.

Be aware that most SSE SIMD instructions are actually implemented as
x86inc macros and redefining them can have unexpected consequences and
is therefore discouraged.


[FFmpeg-devel] [PATCH] x86inc: don't use read-only data sections on COFF targets

2017-06-25 Thread James Almer
Yasm:
src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'

Nasm:
src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here

Signed-off-by: James Almer 
---
Untested.

 libavutil/x86/x86inc.asm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index fa826e6d85..c4ec29bd9d 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -88,6 +88,8 @@
 %macro SECTION_RODATA 0-1 16
 %ifidn __OUTPUT_FORMAT__,aout
 section .text
+%elifidn __OUTPUT_FORMAT__,coff
+section .text
 %else
 SECTION .rodata align=%1
 %endif
-- 
2.13.0
