Re: [FFmpeg-devel] [PATCH v2 01/11] avcodec/vvc: add shared header for vvc

2021-01-10 Thread Nuo Mi
How about we define it as 20, check the size, and return an error if it is > 20?
20 should be enough for most clips; HEVC used 20.
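
A minimal sketch of that check, assuming "it" is the tile-row bound questioned in the quoted review below. SKETCH_MAX_TILE_ROWS, sketch_check_tile_rows() and sketch_ctb_rows() are hypothetical names and are not part of the patch; the 32x32 CTU size and the 440 / 16888 / 135 figures are taken from the quoted discussion. Inside FFmpeg the error would presumably be an AVERROR code rather than a bare errno value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical cap proposed in this mail; not a value from the spec. */
enum { SKETCH_MAX_TILE_ROWS = 20 };

/* Reject streams that signal more tile rows than the fixed-size arrays hold. */
static int sketch_check_tile_rows(int num_tile_rows)
{
    if (num_tile_rows < 1 || num_tile_rows > SKETCH_MAX_TILE_ROWS)
        return -EINVAL;
    return 0;
}

/* Number of CTB rows needed to cover a picture height (rounds up). */
static int sketch_ctb_rows(int height, int ctb_size)
{
    return (height + ctb_size - 1) / ctb_size;
}
```

With 32x32 CTUs, sketch_ctb_rows(16888, 32) is 528, so 440 single-CTU tile rows fit geometrically under VVC_MAX_HEIGHT, while a height of 4320 gives the 135 rows mentioned in the thread.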

On Sun, Jan 10, 2021 at 9:39 AM Nuo Mi  wrote:

>
>
> On Sun, Jan 10, 2021 at 3:09 AM Mark Thompson  wrote:
>
>> On 09/01/2021 07:34, Nuo Mi wrote:
>> > ---
>> >   libavcodec/vvc.h | 124 +++
>> >   1 file changed, 124 insertions(+)
>> >   create mode 100644 libavcodec/vvc.h
>> >
>> > diff --git a/libavcodec/vvc.h b/libavcodec/vvc.h
>> > new file mode 100644
>> > index 00..0bd2acac1d
>> > --- /dev/null
>> > +++ b/libavcodec/vvc.h
>> > @@ -0,0 +1,124 @@
>> > ...
>> > +
>> > +enum {
>> > +VVC_MAX_PLANES = 3,
>>
>> MAX_SAMPLE_ARRAYS, with reference to 6.2?  The term "plane" is never used
>> in the specification at all.
>>
>> > +//7.4.3.3 The value of vps_max_sublayers_minus1 shall be in the range of 0 to 6, inclusive
>> > +VVC_MAX_SUBLAYERS = 7,
>> > +
>> > +// 7.3.2.3: vps_video_parameter_set_id is u(4).
>> > +VVC_MAX_VPS_COUNT = 16,
>> > +// 7.3.2.4: sps_seq_parameter_set_id is u(4)
>> > +VVC_MAX_SPS_COUNT = 16,
>> > +// 7.3.2.5: pps_pic_parameter_set_id is u(6)
>> > +VVC_MAX_PPS_COUNT = 64,
>> > +
>> > +// 7.4.4.1: ptl_num_sub_profiles is u(8)
>> > +VVC_MAX_SUB_PROFILES = 256,
>> > +
>> > +// A.4.2: according to (1577), MaxDpbSize is bounded above by 2 * maxDpbPicBuf(8)
>> > +VVC_MAX_DPB_SIZE = 16,
>> > +
>> > +//7.4.3.4 sps_num_ref_pic_lists in range [0, 64]
>> > +VVC_MAX_REF_PIC_LISTS = 64,
>> > +
>> > +//7.4.3.3 sps_num_points_in_qp_table_minus1[i] in range [0, 36 − sps_qp_table_start_minus26[i]],
>> > +//sps_qp_table_start_minus26[i] in range [−26 − QpBdOffset, 36]
>> > +//for 10 bits QpBdOffset is 12, so sps_num_points_in_qp_table_minus1[i] in range [0, 74]
>> > +VVC_MAX_POINTS_IN_QP_TABLE = 75,
>> > +
>> > +// 7.4.6.1: hrd_cpb_cnt_minus1 is in [0, 31].
>> > +VVC_MAX_CPB_CNT = 32,
>> > +
>> > +// A.4.1: the highest level allows a MaxLumaPs of 35 651 584.
>> > +VVC_MAX_LUMA_PS = 35651584,
>> > +// A.4.1: pic_width_in_luma_samples and pic_height_in_luma_samples are
>> > +// constrained to be not greater than sqrt(MaxLumaPs * 8).  Hence height/
>> > +// width are bounded above by sqrt(8 * 35651584) = 16888.2 samples.
>> > +VVC_MAX_WIDTH  = 16888,
>> > +VVC_MAX_HEIGHT = 16888,
>> > +
>> > +// A.4.1: table A.1 allows at most 440 tiles for any au.
>> > +VVC_MAX_TILE_ROWS= 440,
>>
>> Is this bound really the best we can do?
>>
>> That is, is it actually possible to construct a valid stream with 440
>> tile rows?  It must have a single tile column and a height of at least
>> 14080 (for 440 rows of 32x32 CTUs), which feels extreme enough that it
>> might hit some of the other level constraints.
>>
> VVC_MAX_HEIGHT is 16888, which is higher than 14080.
> If we limit VVC_MAX_HEIGHT to 4k, we can reduce it to 135.
>
>>
>> > +// A.4.1: table A.1 allows at most 20 tile columns for any level.
>> > +VVC_MAX_TILE_COLUMNS = 20,
>> > +
>> > +// A.4.1 table A.1 allows at most 600 slice for any level.
>> > +VVC_MAX_SLICES = 600,
>> > +
>> > +// 7.4.8: in the worst case (tiles_enabled_flag and
>> > +// entropy_coding_sync_enabled_flag are both set), entry points can be
>> > +// placed at the beginning of every Ctb row in every tile, giving an
>> > +// upper bound of (num_tile_columns_minus1 + 1) * PicHeightInCtbsY - 1.
>> > +// Only a stream with very high resolution and perverse parameters could
>> > +// get near that, though, so set a lower limit here with the maximum
>> > +// possible value for 8K video (at most 135 32x32 Ctb rows).
>> > +VVC_MAX_ENTRY_POINTS = VVC_MAX_TILE_COLUMNS * 135,
>> > +};
>> > +
>> > +#endif /* AVCODEC_VVC_H */
>>
>> - Mark
>> ___
>> ffmpeg-devel mailing list
>> ffmpeg-devel@ffmpeg.org
>> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>>
>> To unsubscribe, visit link above, or email
>> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
>
>

[FFmpeg-devel] [PATCH v2] avformat/utils: prevent ts out of [min_ts, max_ts] interval due to rounding

2021-01-10 Thread Zhao Zhili
Rounding min_ts towards +infinity and max_ts towards -infinity can
push ts outside the [min_ts, max_ts] interval, which then leads to seek
failure. Fix it by using the same simple rounding for min_ts and max_ts
as for ts.
---
 libavformat/utils.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavformat/utils.c b/libavformat/utils.c
index 503e583ad0..88221c5ac4 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -2500,10 +2500,10 @@ int avformat_seek_file(AVFormatContext *s, int stream_index, int64_t min_ts,
         ts = av_rescale_q(ts, AV_TIME_BASE_Q, time_base);
         min_ts = av_rescale_rnd(min_ts, time_base.den,
                                 time_base.num * (int64_t)AV_TIME_BASE,
-                                AV_ROUND_UP   | AV_ROUND_PASS_MINMAX);
+                                AV_ROUND_PASS_MINMAX);
         max_ts = av_rescale_rnd(max_ts, time_base.den,
                                 time_base.num * (int64_t)AV_TIME_BASE,
-                                AV_ROUND_DOWN | AV_ROUND_PASS_MINMAX);
+                                AV_ROUND_PASS_MINMAX);
         stream_index = 0;
     }
 
-- 
2.28.0
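
The failure described in the commit message can be reproduced with plain integer arithmetic. The sketch below does not call libavutil; sketch_rescale() and sketch_ts_in_window() are hypothetical stand-ins that only handle small non-negative values, and the 1/25 time base and the 1050000 microsecond timestamp are made-up inputs chosen so that the rescaled value (26.25) is not an integer.

```c
#include <assert.h>
#include <stdint.h>

enum SketchRound { SKETCH_ROUND_DOWN, SKETCH_ROUND_UP, SKETCH_ROUND_NEAR };

/* Compute a * b / c with the requested rounding.
 * Assumes a >= 0, b > 0, c > 0 and no overflow (illustration only). */
static int64_t sketch_rescale(int64_t a, int64_t b, int64_t c,
                              enum SketchRound rnd)
{
    int64_t prod = a * b;

    switch (rnd) {
    case SKETCH_ROUND_UP:   return (prod + c - 1) / c;
    case SKETCH_ROUND_NEAR: return (prod + c / 2) / c;
    default:                return prod / c;
    }
}

/* Rescale min/ts/max and report whether min <= ts <= max still holds.
 * ts is always rounded to nearest; min and max use the given modes. */
static int sketch_ts_in_window(int64_t min_us, int64_t ts_us, int64_t max_us,
                               int64_t b, int64_t c,
                               enum SketchRound min_rnd,
                               enum SketchRound max_rnd)
{
    int64_t lo = sketch_rescale(min_us, b, c, min_rnd);
    int64_t ts = sketch_rescale(ts_us,  b, c, SKETCH_ROUND_NEAR);
    int64_t hi = sketch_rescale(max_us, b, c, max_rnd);

    return lo <= ts && ts <= hi;
}
```

With min_ts = ts = max_ts = 1050000 and a 1/25 time base (b = 25, c = 1000000), rounding min_ts up gives 27 while ts becomes 26, so ts falls below min_ts; with one rounding mode for all three values the window stays consistent.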

Re: [FFmpeg-devel] [PATCH v2 06/11] avcodec: add cbs for h266/vvc

2021-01-10 Thread Nuo Mi
On Sun, Jan 10, 2021 at 5:34 AM Mark Thompson  wrote:

> On 09/01/2021 07:34, Nuo Mi wrote:
> > ---
> >   configure |2 +
> >   libavcodec/Makefile   |1 +
> >   libavcodec/cbs.c  |6 +
> >   libavcodec/cbs_h2645.c|  373 
> >   libavcodec/cbs_h266.h |  840 
> >   libavcodec/cbs_h266_syntax_template.c | 2761 +
> >   libavcodec/cbs_internal.h |3 +-
> >   7 files changed, 3985 insertions(+), 1 deletion(-)
> >   create mode 100644 libavcodec/cbs_h266.h
> >   create mode 100644 libavcodec/cbs_h266_syntax_template.c
> >
> > ...
> > @@ -920,6 +934,135 @@ static int
> cbs_h265_read_nal_unit(CodedBitstreamContext *ctx,
> >   return 0;
> >   }
> >
> > +static int cbs_h266_replace_ph(CodedBitstreamContext *ctx,
> > +   CodedBitstreamUnit *unit)
> > +{
> > +CodedBitstreamH266Context *priv = ctx->priv_data;
> > +int err;
> > +err = ff_cbs_make_unit_refcounted(ctx, unit);
> > +if (err < 0)
> > +return err;
> > +av_buffer_unref(&priv->ph_ref);
> > +av_assert0(unit->content_ref);
> > +priv->ph_ref = av_buffer_ref(unit->content_ref);
> > +if (!priv->ph_ref)
> > +return AVERROR(ENOMEM);
> > +priv->active_ph = priv->ph = (H266RawPH *)priv->ph_ref->data;
>
> Why are there two variables here?  They seem to always be the same.
>
  priv->active_ph is a read-only pointer, priv->ph is a writable one. I can
keep only priv->ph if you prefer.

>
> > +return 0;
> > +}
> > +
> > ...
> > +
> >   static int cbs_h2645_assemble_fragment(CodedBitstreamContext *ctx,
> >  CodedBitstreamFragment *frag)
> >   {
> > @@ -1248,6 +1494,11 @@ static int
> cbs_h2645_assemble_fragment(CodedBitstreamContext *ctx,
> >(unit->type == HEVC_NAL_VPS ||
> > unit->type == HEVC_NAL_SPS ||
> > unit->type == HEVC_NAL_PPS)) ||
> > +(ctx->codec->codec_id == AV_CODEC_ID_VVC &&
> > + (unit->type == VVC_VPS_NUT ||
> > +  unit->type == VVC_SPS_NUT ||
> > +  unit->type == VVC_PPS_NUT ||
> > +  unit->type == VVC_PREFIX_APS_NUT)) ||
>
> Also various other things, which might be here since passthrough does not
> require decomposition to be implemented.
>
> This test is getting unwieldy - maybe it should be moved to a new function
> cbs_h2645_unit_requires_zero_byte().
>
done
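
For readers following along, a sketch of what such a helper could look like. Only the HEVC and VVC cases visible in the quoted hunk are covered; the codec enum, the SKETCH_ constants and the function name are stand-ins, not the code that was actually committed, and the numeric NAL unit type values are my reading of the HEVC and VVC specifications.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for AV_CODEC_ID_HEVC / AV_CODEC_ID_VVC. */
enum SketchCodec { SKETCH_CODEC_HEVC, SKETCH_CODEC_VVC };

/* NAL unit type values as assumed from the respective specifications. */
enum {
    SKETCH_HEVC_NAL_VPS       = 32,
    SKETCH_HEVC_NAL_SPS       = 33,
    SKETCH_HEVC_NAL_PPS       = 34,
    SKETCH_VVC_VPS_NUT        = 14,
    SKETCH_VVC_SPS_NUT        = 15,
    SKETCH_VVC_PPS_NUT        = 16,
    SKETCH_VVC_PREFIX_APS_NUT = 17,
};

/* True if a zero_byte must precede the start code of this unit:
 * parameter sets, and the first unit of the fragment (assumed to be
 * the start of an access unit, as in the quoted code). */
static bool sketch_unit_requires_zero_byte(enum SketchCodec codec,
                                           int unit_type, int unit_index)
{
    if (unit_index == 0)
        return true;

    switch (codec) {
    case SKETCH_CODEC_HEVC:
        return unit_type == SKETCH_HEVC_NAL_VPS ||
               unit_type == SKETCH_HEVC_NAL_SPS ||
               unit_type == SKETCH_HEVC_NAL_PPS;
    case SKETCH_CODEC_VVC:
        return unit_type == SKETCH_VVC_VPS_NUT ||
               unit_type == SKETCH_VVC_SPS_NUT ||
               unit_type == SKETCH_VVC_PPS_NUT ||
               unit_type == SKETCH_VVC_PREFIX_APS_NUT;
    }
    return false;
}
```

The caller in the assemble loop then reduces to a single if around the zero_byte write.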

> >   i == 0 /* (Assume this is the start of an access unit.)
> */) {
> >   // zero_byte
> >   data[dp++] = 0;
> > @@ -1362,6 +1613,41 @@ static void cbs_h265_close(CodedBitstreamContext
> *ctx)
> >   av_buffer_unref(&h265->pps_ref[i]);
> >   }
> >
> > ...
> > @@ -1506,6 +1792,77 @@ static const CodedBitstreamUnitTypeDescriptor
> cbs_h265_unit_types[] = {
> >   CBS_UNIT_TYPE_END_OF_LIST
> >   };
> >
> > +static void cbs_h266_free_sei(void *opaque, uint8_t *content)
> > +{
> > +}
>
> So as implemented currently it is POD?
>
Yes, currently only the MD5 SEI is implemented. We can do more after your react
patch is merged.

>
> > +
> > +static const CodedBitstreamUnitTypeDescriptor cbs_h266_unit_types[] = {
> > ...
> > +
> > +typedef struct H266RawNALUnitHeader {
> > +uint8_t nuh_layer_id;
> > +uint8_t nal_unit_type;
> > +uint8_t nuh_temporal_id_plus1;
> > +} H266RawNALUnitHeader;
> > +
> > +typedef struct H266GeneralConstraintsInfo {
> > +uint8_t gci_present_flag;
> > +
> > ...
> > +
> > +/* loop filter */
> > +uint8_t gci_no_sao_constraint_flag;
> > +uint8_t gci_no_alf_constraint_flag;
> > +uint8_t gci_no_ccalf_constraint_flag;
> > +uint8_t gci_no_lmcs_constraint_flag;
> > +uint8_t gci_no_ladf_constraint_flag;
> > +uint8_t gci_no_virtual_boundaries_constraint_flag;
> > +uint8_t gci_num_reserved_bits;
>
> Also needs gci_reserved_zero_bit[], so that we can handle streams with
> future constraints rather than just rejecting them.
>
> "Although the value of gci_num_reserved_bits is required to be equal to 0
> in this version of this Specification, decoders conforming to this version
> of this Specification shall allow the value of gci_num_reserved_bits
> greater than 0 to appear in the syntax and shall ignore the values of all
> the gci_reserved_zero_bit[ i ] syntax elements when gci_num_reserved_bits
> is greater than 0."
>

This just follows the same pattern as h265.
How about we create a separate patch set for this that fixes h265 as well?
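
To make the requested behaviour concrete, here is a rough sketch of "accept a non-zero count and ignore the bits", written against a tiny stand-alone bit reader rather than the CBS macros. It assumes gci_num_reserved_bits is coded as u(8) followed by that many one-bit gci_reserved_zero_bit[i] elements; SketchBits and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (illustration only). */
typedef struct SketchBits {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos;
} SketchBits;

/* Read n (<= 32) bits into *out; returns 0 on success, -1 on overrun. */
static int sketch_read_bits(SketchBits *br, unsigned n, uint32_t *out)
{
    uint32_t v = 0;

    if (n > 32 || br->size_bits - br->pos < n)
        return -1;
    for (unsigned i = 0; i < n; i++) {
        size_t p = br->pos++;
        v = (v << 1) | ((br->buf[p >> 3] >> (7 - (p & 7))) & 1u);
    }
    *out = v;
    return 0;
}

/* Read the reserved-bit count, then consume the reserved bits without
 * checking their values, so streams using future constraints still parse. */
static int sketch_skip_gci_reserved(SketchBits *br, unsigned *num_reserved)
{
    uint32_t count, bit;

    if (sketch_read_bits(br, 8, &count) < 0)
        return -1;
    for (uint32_t i = 0; i < count; i++)
        if (sketch_read_bits(br, 1, &bit) < 0)
            return -1; /* truncated data is still an error */
    *num_reserved = count;
    return 0;
}
```

A CBS implementation would additionally have to store the bits somewhere (the gci_reserved_zero_bit[] array Mark asks for) so that they can be written back unchanged.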


> > +} H266GeneralConstraintsInfo;
> > +
> > ...
> > +
> > +typedef struct H266RawVPS {
> > +H266RawNALUnitHeader nal_unit_header;
> > +
> > +uint8_t vps_video_parameter_set_id;
> > +
> > +uint8_t vps_max_layers_minus1;
> > +uint8_t vps_max_sublayers_minus1;
> > +/*TODO add more*/
> > +H266RawExtensionData extension_data;
> > +} H266RawVPS;
>
> You don't actually use the VPS struc

[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger 

Speedup is fairly small, around 1.5%, but these are fairly simple.
---
 libavcodec/aarch64/hevcdsp_idct_neon.S| 190 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
 2 files changed, 214 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 9f67e45..edd03a0 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -36,6 +36,196 @@ const trans, align=4
 .short 31, 22, 13, 4
 endconst
 
+.macro clip10 in1, in2, c1, c2
+        smax    \in1, \in1, \c1
+        smax    \in2, \in2, \c1
+        smin    \in1, \in1, \c2
+        smin    \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+        ld1     {v0.8H-v1.8H}, [x1]
+        ld1     {v2.S}[0], [x0], x2
+        ld1     {v2.S}[1], [x0], x2
+        ld1     {v2.S}[2], [x0], x2
+        ld1     {v2.S}[3], [x0], x2
+        sub     x0, x0, x2, lsl #2
+        uxtl    v8.8H, v2.8B
+        uxtl2   v9.8H, v2.16B
+        sqadd   v0.8H, v0.8H, v8.8H
+        sqadd   v1.8H, v1.8H, v9.8H
+        sqxtun  v0.8B, v0.8H
+        sqxtun2 v0.16B, v1.8H
+        st1     {v0.S}[0], [x0], x2
+        st1     {v0.S}[1], [x0], x2
+        st1     {v0.S}[2], [x0], x2
+        st1     {v0.S}[3], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_4x4_10_neon, export=1
+        mov     x12, x0
+        ld1     {v0.8H-v1.8H}, [x1]
+        ld1     {v2.D}[0], [x12], x2
+        ld1     {v2.D}[1], [x12], x2
+        ld1     {v3.D}[0], [x12], x2
+        sqadd   v0.8H, v0.8H, v2.8H
+        ld1     {V3.D}[1], [x12], x2
+        movi    v4.8H, #0
+        sqadd   v1.8H, v1.8H, v3.8H
+        mvni    v5.8H, #0xFC, LSL #8 // movi #0x3FF
+        clip10  v0.8H, v1.8H, v4.8H, v5.8H
+        st1     {v0.D}[0], [x0], x2
+        st1     {v0.D}[1], [x0], x2
+        st1     {v1.D}[0], [x0], x2
+        st1     {v1.D}[1], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_8_neon, export=1
+        add     x12, x0, x2
+        add     x2,  x2, x2
+        mov     x3,   #8
+1:      subs    x3,   x3, #2
+        ld1     {v2.D}[0],   [x0]
+        ld1     {v2.D}[1],   [x12]
+        uxtl    v3.8H,   v2.8B
+        ld1     {v0.8H-v1.8H}, [x1], #32
+        uxtl2   v2.8H,   v2.16B
+        sqadd   v0.8H,   v0.8H,   v3.8H
+        sqadd   v1.8H,   v1.8H,   v2.8H
+        sqxtun  v0.8B,   v0.8H
+        sqxtun2 v0.16B,  v1.8H
+        st1     {v0.D}[0],   [x0], x2
+        st1     {v0.D}[1],   [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_10_neon, export=1
+        add     x12, x0, x2
+        add     x2,  x2, x2
+        mov     x3,  #8
+        movi    v4.8H, #0
+        mvni    v5.8H, #0xFC, LSL #8 // movi #0x3FF
+1:      subs    x3,  x3, #2
+        ld1     {v0.8H-v1.8H}, [x1], #32
+        ld1     {v2.8H},[x0]
+        sqadd   v0.8H, v0.8H, v2.8H
+        ld1     {v3.8H},[x12]
+        sqadd   v1.8H, v1.8H, v3.8H
+        clip10  v0.8H, v1.8H, v4.8H, v5.8H
+        st1     {v0.8H}, [x0], x2
+        st1     {v1.8H}, [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_8_neon, export=1
+        mov     x3,  #16
+        add     x12, x0, x2
+        add     x2,  x2, x2
+1:      subs    x3,  x3, #2
+        ld1     {v16.16B}, [x0]
+        ld1     {v0.8H-v3.8H}, [x1], #64
+        ld1     {v19.16B},[x12]
+        uxtl    v17.8H, v16.8B
+        uxtl2   v18.8H, v16.16B
+        uxtl    v20.8H, v19.8B
+        uxtl2   v21.8H, v19.16B
+        sqadd   v0.8H,  v0.8H, v17.8H
+        sqadd   v1.8H,  v1.8H, v18.8H
+        sqadd   v2.8H,  v2.8H, v20.8H
+        sqadd   v3.8H,  v3.8H, v21.8H
+        sqxtun  v0.8B,  v0.8H
+        sqxtun2 v0.16B, v1.8H
+        sqxtun  v1.8B,  v2.8H
+        sqxtun2 v1.16B, v3.8H
+        st1     {v0.16B}, [x0], x2
+        st1     {v1.16B}, [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_10_neon, export=1
+        mov     x3,  #16
+        movi    v20.8H, #0
+        mvni    v21.8H, #0xFC, LSL #8 // movi #0x3FF
+        add     x12, x0, x2
+        add     x2,  x2, x2
+1:  subs   
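
For reference, the scalar operation these NEON routines vectorise can be sketched in C as below: each 16-bit residual is added to the destination pixel and the sum is clipped to the pixel range (0..255 for 8 bit, 0..0x3FF for 10 bit, matching the clip10 constants above). The function names are hypothetical, and the 10-bit variant takes its stride in pixels rather than bytes to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp v to [0, max]. */
static int sketch_clip(int v, int max)
{
    return v < 0 ? 0 : v > max ? max : v;
}

/* 8-bit: dst is size x size pixels, stride in bytes; res is packed. */
static void sketch_add_residual_8(uint8_t *dst, const int16_t *res,
                                  ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint8_t)sketch_clip(dst[x] + res[x], 255);
        dst += stride;
        res += size;
    }
}

/* 10-bit: same operation on 16-bit pixels, stride counted in pixels. */
static void sketch_add_residual_10(uint16_t *dst, const int16_t *res,
                                   ptrdiff_t stride, int size)
{
    for (int y = 0; y < size; y++) {
        for (int x = 0; x < size; x++)
            dst[x] = (uint16_t)sketch_clip(dst[x] + res[x], 0x3FF);
        dst += stride;
        res += size;
    }
}
```

The sqadd + sqxtun pair in the 8-bit assembly gives the same result as the clip here, since the saturated 16-bit sum is narrowed with unsigned saturation.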

Re: [FFmpeg-devel] [PATCH] libavfilter/dnn: add batch mode for async execution

2021-01-10 Thread Guo, Yejun


> -Original Message-
> From: Guo, Yejun 
> Sent: January 8, 2021 16:37
> To: ffmpeg-devel@ffmpeg.org
> Cc: Guo, Yejun 
> Subject: [PATCH] libavfilter/dnn: add batch mode for async execution
> 
> the default number of batch_size is 1
> 
> Signed-off-by: Xie, Lin 
> Signed-off-by: Wu Zhiwen 
> Signed-off-by: Guo, Yejun 
> ---
>  libavfilter/dnn/dnn_backend_openvino.c | 157 +
>  libavfilter/dnn/dnn_backend_openvino.h |   1 +
>  libavfilter/dnn/dnn_interface.c|   1 +
>  libavfilter/dnn_interface.h|   2 +
>  libavfilter/vf_dnn_processing.c|  36 +-
>  5 files changed, 173 insertions(+), 24 deletions(-)
> 
> diff --git a/libavfilter/dnn/dnn_backend_openvino.c
> b/libavfilter/dnn/dnn_backend_openvino.c
> index d27e451eea..cb1bc3d22d 100644
> --- a/libavfilter/dnn/dnn_backend_openvino.c
> +++ b/libavfilter/dnn/dnn_backend_openvino.c

Please ignore this patch; it has some issues. I will send out V2 later, thanks.

[FFmpeg-devel] [PATCH V2] libavfilter/dnn: add batch mode for async execution

2021-01-10 Thread Guo, Yejun
The default batch_size is 1.

Signed-off-by: Xie, Lin 
Signed-off-by: Wu Zhiwen 
Signed-off-by: Guo, Yejun 
---
 libavfilter/dnn/dnn_backend_openvino.c | 187 -
 libavfilter/dnn/dnn_backend_openvino.h |   1 +
 libavfilter/dnn/dnn_interface.c|   1 +
 libavfilter/dnn_interface.h|   2 +
 libavfilter/vf_dnn_processing.c|  36 -
 5 files changed, 194 insertions(+), 33 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c 
b/libavfilter/dnn/dnn_backend_openvino.c
index d27e451eea..5271d1caa5 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -37,6 +37,7 @@
 typedef struct OVOptions{
 char *device_type;
 int nireq;
+int batch_size;
 } OVOptions;
 
 typedef struct OVContext {
@@ -70,7 +71,8 @@ typedef struct TaskItem {
 
 typedef struct RequestItem {
 ie_infer_request_t *infer_request;
-TaskItem *task;
+TaskItem **tasks;
+int task_count;
 ie_complete_call_back_t callback;
 } RequestItem;
 
@@ -83,6 +85,7 @@ typedef struct RequestItem {
 static const AVOption dnn_openvino_options[] = {
 { "device", "device to run model", OFFSET(options.device_type), 
AV_OPT_TYPE_STRING, { .str = "CPU" }, 0, 0, FLAGS },
 { "nireq",  "number of request",   OFFSET(options.nireq),   
AV_OPT_TYPE_INT,{ .i64 = 0 }, 0, INT_MAX, FLAGS },
+{ "batch_size",  "batch size per request", OFFSET(options.batch_size),  
AV_OPT_TYPE_INT,{ .i64 = 1 }, 1, 1000, FLAGS},
 { NULL }
 };
 
@@ -100,7 +103,19 @@ static DNNDataType precision_to_datatype(precision_e 
precision)
 }
 }
 
-static DNNReturnType fill_model_input_ov(OVModel *ov_model, TaskItem *task, 
RequestItem *request)
+static int get_datatype_size(DNNDataType dt)
+{
+switch (dt)
+{
+case DNN_FLOAT:
+return sizeof(float);
+default:
+av_assert0(!"not supported yet.");
+return 1;
+}
+}
+
+static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem 
*request)
 {
 dimensions_t dims;
 precision_e precision;
@@ -109,6 +124,7 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, 
TaskItem *task, Requ
 IEStatusCode status;
 DNNData input;
 ie_blob_t *input_blob = NULL;
+TaskItem *task = request->tasks[0];
 
 status = ie_infer_request_get_blob(request->infer_request, 
task->input_name, &input_blob);
 if (status != OK) {
@@ -134,12 +150,19 @@ static DNNReturnType fill_model_input_ov(OVModel 
*ov_model, TaskItem *task, Requ
 input.channels = dims.dims[1];
 input.data = blob_buffer.buffer;
 input.dt = precision_to_datatype(precision);
-if (task->do_ioproc) {
-if (ov_model->model->pre_proc != NULL) {
-ov_model->model->pre_proc(task->in_frame, &input, 
ov_model->model->filter_ctx);
-} else {
-proc_from_frame_to_dnn(task->in_frame, &input, ctx);
+
+av_assert0(request->task_count <= dims.dims[0]);
+for (int i = 0; i < request->task_count; ++i) {
+task = request->tasks[i];
+if (task->do_ioproc) {
+if (ov_model->model->pre_proc != NULL) {
+ov_model->model->pre_proc(task->in_frame, &input, 
ov_model->model->filter_ctx);
+} else {
+proc_from_frame_to_dnn(task->in_frame, &input, ctx);
+}
 }
+input.data = (uint8_t *)input.data
+ + input.width * input.height * input.channels * 
get_datatype_size(input.dt);
 }
 ie_blob_free(&input_blob);
 
@@ -152,7 +175,7 @@ static void infer_completion_callback(void *args)
 precision_e precision;
 IEStatusCode status;
 RequestItem *request = args;
-TaskItem *task = request->task;
+TaskItem *task = request->tasks[0];
 ie_blob_t *output_blob = NULL;
 ie_blob_buffer_t blob_buffer;
 DNNData output;
@@ -194,41 +217,56 @@ static void infer_completion_callback(void *args)
 output.width= dims.dims[3];
 output.dt   = precision_to_datatype(precision);
 output.data = blob_buffer.buffer;
-if (task->do_ioproc) {
-if (task->ov_model->model->post_proc != NULL) {
-task->ov_model->model->post_proc(task->out_frame, &output, 
task->ov_model->model->filter_ctx);
+
+av_assert0(request->task_count <= dims.dims[0]);
+av_assert0(request->task_count >= 1);
+for (int i = 0; i < request->task_count; ++i) {
+task = request->tasks[i];
+if (task->do_ioproc) {
+if (task->ov_model->model->post_proc != NULL) {
+task->ov_model->model->post_proc(task->out_frame, &output, 
task->ov_model->model->filter_ctx);
+} else {
+proc_from_dnn_to_frame(task->out_frame, &output, ctx);
+}
 } else {
-proc_from_dnn_to_frame(task->out_frame, &output, ctx);
+task->out_frame->width = output.width;
+task->out_frame->height = output.height;
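
The pointer advance in fill_model_input_ov() above treats the input blob as task_count equally sized frames stored back to back. A small sketch of that arithmetic, using hypothetical helpers and plain size_t arguments instead of DNNData:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by one frame inside the batched blob. */
static size_t sketch_frame_bytes(size_t width, size_t height,
                                 size_t channels, size_t elem_size)
{
    return width * height * channels * elem_size;
}

/* Start of frame `index` when frames are stored back to back. */
static uint8_t *sketch_frame_ptr(uint8_t *blob, size_t index,
                                 size_t frame_bytes)
{
    return blob + index * frame_bytes;
}
```

The av_assert0(request->task_count <= dims.dims[0]) check in the patch is what keeps such an offset inside the blob, since dims.dims[0] is the batch dimension.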
   

Re: [FFmpeg-devel] [PATCH 2/5] avcodec/fft_template: Remove unused fixed-point cosine tables

2021-01-10 Thread Michael Niedermayer
On Sun, Jan 10, 2021 at 01:56:21AM +0100, Andreas Rheinhardt wrote:
> Michael Niedermayer:
> > On Thu, Jan 07, 2021 at 12:13:05AM +0100, Andreas Rheinhardt wrote:
> >> There are three types of FFTs: floating-point, 32-bit fixed-point and
> >> 16-bit fixed-point. The latter has exactly one user: The fixed-point
> >> AC-3-encoder; the cosine tables used by it use up to seven bits. The
> >> tables corresponding to eight to seventeen bits are unused, as are the
> >> FFT functions for these bits.
> >>
> >> Therefore this commit removes these tables and functions. This is
> >> especially beneficial when using hardcoded tables as they take up more
> >> than 255 KiB. But even without it one saves said unused functions as
> >> well as entries in corresponding tables (this also saves relocations).
> >>
> >> Signed-off-by: Andreas Rheinhardt 
> >> ---
> >> The changes to ARM assembly are honestly untested. I hope someone can
> >> test them. Btw: It seems that the ARM assembly code wouldn't be able to
> >> deal with an FFT with more than 16 bits (no function for this has been
> >> defined), which only worked because no one ever used that many bits with
> >> the fixed-point FFT.
> >>
> >>  libavcodec/arm/fft_fixed_neon.S | 18 --
> >>  libavcodec/cos_tablegen.c   |  4 ++--
> >>  libavcodec/fft.h|  4 +++-
> >>  libavcodec/fft_fixed.c  |  1 +
> >>  libavcodec/fft_template.c   | 31 +++
> >>  tests/fate/fft.mak  |  8 ++--
> >>  6 files changed, 35 insertions(+), 31 deletions(-)
> > 
> > make -j32 libavcodec/tests/fft-fixed && libavcodec/tests/fft-fixed
> > Segmentation fault (core dumped)
> > 
> > (if you cant repro say so and ill rebuild with debug symbols ...)
> > 
> > thx
> > [...]
> > 
> 1. Lynne has an alternative patchset that makes the only user of
> fft_fixed use fft_fixed_32 instead, so this is not important any more.

> 2. Are you testing the ARM assembly code (for which I ask for a test) or

x86-64

> not? If not, then this surprises me. Did you apply the changes to
> fft.mak (some of the tests have been removed as they tested
> functionality that was unused (apart from the tests) and has therefore
> been removed).

i applied the changes from this patchset up to and including the patch
and also did a make distclean
FFT 512 test
Checking...
==18069== Jump to the invalid address stated on the next line
==18069==at 0x0: ???
==18069==by 0x10F5F9: main (fft.c:529)
==18069==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18069== 
==18069== 
==18069== Process terminating with default action of signal 11 (SIGSEGV)
==18069==  Bad permissions for mapped region at address 0x0
==18069==at 0x0: ???
==18069==by 0x10F5F9: main (fft.c:529)

commit 6c532480712d395f5973063adcefce62fc75f2e1 (HEAD)
Author: Andreas Rheinhardt 
Date:   Thu Jan 7 00:13:05 2021 +0100

avcodec/fft_template: Remove unused fixed-point cosine tables

There are three types of FFTs: floating-point, 32-bit fixed-point and
16-bit fixed-point. The latter has exactly one user: The fixed-point
AC-3-encoder; the cosine tables used by it use up to seven bits. The
tables corresponding to eight to seventeen bits are unused, as are the
FFT functions for these bits.

Therefore this commit removes these tables and functions. This is
especially beneficial when using hardcoded tables as they take up more
than 255 KiB. But even without it one saves said unused functions as
well as entries in corresponding tables (this also saves relocations).

Signed-off-by: Andreas Rheinhardt 
Signed-off-by: Michael Niedermayer 

 libavcodec/arm/fft_fixed_neon.S | 18 --
 libavcodec/cos_tablegen.c   |  4 ++--
 libavcodec/fft.h|  4 +++-
 libavcodec/fft_fixed.c  |  1 +
 libavcodec/fft_template.c   | 31 +++
 tests/fate/fft.mak  |  8 ++--
 6 files changed, 35 insertions(+), 31 deletions(-)

commit c592684681700a7d8b41e75a11104f8c1bdd13d9
Author: Andreas Rheinhardt 
Date:   Thu Jan 7 00:13:04 2021 +0100

avcodec/tableprint: Don't include mem_internal.h

tableprint.h does not declare anything as aligned; it just prints
DECLARE_ALIGNED. So it can be removed; in fact, it needs to be removed,
because mem_internal.h includes config.h which leads to warnings when
building with hardcoded tables enabled because of redefinitions of
CONFIG_HARDCODED_TABLES.

(Furthermore, config.h is only valid for the target, not the host,
so HAVE_LOCAL_ALIGNED might even be wrong here.)

Signed-off-by: Andreas Rheinhardt 
Signed-off-by: Michael Niedermayer 

 libavcodec/tableprint.h | 1 -
 1 file changed, 1 deletion(-)

commit 91e1625db15fe8853ceedca9eed14307aaa514c7 (origin/master, origin/HEAD, 
refs/bisect/good-91e1625db15fe8853ceedca9eed14307aaa514c7)

[...]
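
As a side note, the "more than 255 KiB" figure in the commit message can be checked with a few lines of C. The sketch assumes that the cosine table for an FFT of 2^bits points holds 2^bits / 2 samples and that a 16-bit fixed-point sample is an int16_t; sketch_cos_table_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total size in bytes of the cosine tables for first_bits..last_bits,
 * assuming each table stores (1 << bits) / 2 samples. */
static size_t sketch_cos_table_bytes(int first_bits, int last_bits,
                                     size_t sample_size)
{
    size_t total = 0;

    for (int bits = first_bits; bits <= last_bits; bits++)
        total += ((size_t)1 << bits) / 2 * sample_size;
    return total;
}
```

For bits 8 to 17 and 2-byte samples this gives 261888 bytes, i.e. 255.75 KiB, which agrees with the commit message under these assumptions.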

-- 
Michael GnuPG fin

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-10 Thread Michael Niedermayer
On Thu, Jan 07, 2021 at 10:39:56AM +0100, Alan Kelly wrote:
> Thanks for your patience with this, I have replaced mova with movdqu - movu
> generated a compile error on ssse3. What system did this crash on?

AMD Ryzen 9 3950X on linux

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein


Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2021-01-10 Thread Michael Niedermayer
On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote:
> ---
>  Replaces mova with movdqu due to alignment issues
>  libswscale/x86/Makefile |   1 +
>  libswscale/x86/swscale.c| 106 +---
>  libswscale/x86/yuv2yuvX.asm | 117 
>  tests/checkasm/sw_scale.c   |  98 ++
>  4 files changed, 246 insertions(+), 76 deletions(-)
>  create mode 100644 libswscale/x86/yuv2yuvX.asm

I have one / some ? cases where this changes output
 ./ffmpeg -i utvideo-yuv422p10le_UQY2_crc32-A431CD5F.avi -bitexact avi.avi
 
 I don't know if there's a decoder bug, a bug in the patch, or something else.
 
-rw-r- 1 michael michael 246218 Jan 10 16:23 avi.avi
-rw-r- 1 michael michael 245824 Jan 10 16:23 avi-ref.avi

file should be at:
https://samples.ffmpeg.org/ffmpeg-bugs/trac/ticket4044/

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope


Re: [FFmpeg-devel] [PATCH 1/4] avformat/rtsp: set AV_OPT_FLAG_DEPRECATED on deprecated options

2021-01-10 Thread Andriy Gelman
On Thu, 17. Dec 10:42, Andriy Gelman wrote:
> On Tue, 08. Dec 22:35, Andriy Gelman wrote:
> > Hi Zhao, 
> > 
> > Thanks for reviewing.
> > 
> > On Tue, 08. Dec 13:25, "zhilizhao(赵志立)" wrote:
> > > 
> > > 
> > > > On Dec 8, 2020, at 12:08 PM, Andriy Gelman  
> > > > wrote:
> > > > 
> > > > On Sun, 15. Nov 13:20, Andriy Gelman wrote:
> > > >> From: Andriy Gelman 
> > > >> 
> > > >> Signed-off-by: Andriy Gelman 
> > > >> ---
> > > >> libavformat/rtsp.c | 4 ++--
> > > >> 1 file changed, 2 insertions(+), 2 deletions(-)
> > > >> 
> > > >> diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c
> > > >> index d9832bbf1f..2ef75f50e3 100644
> > > >> --- a/libavformat/rtsp.c
> > > >> +++ b/libavformat/rtsp.c
> > > >> @@ -94,7 +94,7 @@ const AVOption ff_rtsp_options[] = {
> > > >> { "max_port", "set maximum local UDP port", OFFSET(rtp_port_max), 
> > > >> AV_OPT_TYPE_INT, {.i64 = RTSP_RTP_PORT_MAX}, 0, 65535, DEC|ENC },
> > > >> { "listen_timeout", "set maximum timeout (in seconds) to wait for 
> > > >> incoming connections (-1 is infinite, imply flag listen)", 
> > > >> OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 = -1}, INT_MIN, 
> > > >> INT_MAX, DEC },
> > > >> #if FF_API_OLD_RTSP_OPTIONS
> > > >> -{ "timeout", "set maximum timeout (in seconds) to wait for 
> > > >> incoming connections (-1 is infinite, imply flag listen) (deprecated, 
> > > >> use listen_timeout)", OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 
> > > >> = -1}, INT_MIN, INT_MAX, DEC },
> > > >> +{ "timeout", "set maximum timeout (in seconds) to wait for 
> > > >> incoming connections (-1 is infinite, imply flag listen) (deprecated, 
> > > >> use listen_timeout)", OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 
> > > >> = -1}, INT_MIN, INT_MAX, DEC|AV_OPT_FLAG_DEPRECATED },
> > > >> { "stimeout", "set timeout (in microseconds) of socket TCP I/O 
> > > >> operations", OFFSET(stimeout), AV_OPT_TYPE_INT, {.i64 = 0}, INT_MIN, 
> > > >> INT_MAX, DEC },
> > > 
> > > Looks good to me, although it’s a little weird that after the major bump
> > > “timeout” will have a different meaning instead of being dropped.
> > > “stimeout” is deprecated, but since there is no other option to replace
> > > it at the current time, it cannot be marked as AV_OPT_FLAG_DEPRECATED.
> > > 
> > 
> > Right, after the major bump timeout will become the suggested alternative
> > for stimeout, and stimeout will have the deprecated label.
> > I think the idea is to get away from the timeout option implying listen
> > mode.
> 
> Will apply this patch.
> 
> Ping for patches 2-4.
> 

ping

-- 
Andriy
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux

2021-01-10 Thread Andriy Gelman
On Sun, 13. Dec 11:41, Andriy Gelman wrote:
> On Sat, 28. Nov 14:46, Andriy Gelman wrote:
> > From: Andriy Gelman 
> > 
> > As per signal() help (man 2 signal) the semantics of using signal may
> > vary across platforms. It is suggested to use sigaction() instead.
> > 
> > On my system, the capture signal is reset to the default handler after
> > the first call thus failing to properly handle multiple SIGINTs.
> > 
> > Signed-off-by: Andriy Gelman 
> > ---
> >  fftools/ffmpeg.c | 31 +++
> >  1 file changed, 27 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
> > index 80f436eab3..01f4ef15d8 100644
> > --- a/fftools/ffmpeg.c
> > +++ b/fftools/ffmpeg.c
> > @@ -393,8 +393,30 @@ static BOOL WINAPI CtrlHandler(DWORD fdwCtrlType)
> >  }
> >  #endif
> >  
> > +#ifdef __linux__
> > +#define SIGNAL(sig, func)   \
> > +do {\
> > +action.sa_handler = func;   \
> > +sigaction(sig, &action, NULL);  \
> > +} while (0)
> > +#else
> > +#define SIGNAL(sig, func) \
> > +signal(sig, func)
> > +#endif
> > +
> >  void term_init(void)
> >  {
> > +#if defined __linux__
> > +struct sigaction action;
> > +action.sa_handler = sigterm_handler;
> > +
> > +/* block other interrupts while processing this one */
> > +sigfillset(&action.sa_mask);
> > +
> > +/* restart interruptible functions (i.e. don't fail with EINTR)  */
> > +action.sa_flags = SA_RESTART;
> > +#endif
> > +
> >  #if HAVE_TERMIOS_H
> >  if (!run_as_daemon && stdin_interaction) {
> >  struct termios tty;
> > @@ -413,14 +435,15 @@ void term_init(void)
> >  
> >  tcsetattr (0, TCSANOW, &tty);
> >  }
> > -signal(SIGQUIT, sigterm_handler); /* Quit (POSIX).  */
> > +SIGNAL(SIGQUIT, sigterm_handler); /* Quit (POSIX).  */
> >  }
> >  #endif
> >  
> > -signal(SIGINT , sigterm_handler); /* Interrupt (ANSI).*/
> > -signal(SIGTERM, sigterm_handler); /* Termination (ANSI).  */
> > +SIGNAL(SIGINT, sigterm_handler);
> > +SIGNAL(SIGTERM, sigterm_handler);
> > +
> >  #ifdef SIGXCPU
> > -signal(SIGXCPU, sigterm_handler);
> > +SIGNAL(SIGXCPU, sigterm_handler);
> >  #endif
> >  #ifdef SIGPIPE
> >  signal(SIGPIPE, SIG_IGN); /* Broken pipe (POSIX). */
> 
> ping
> 

ping

-- 
Andriy

Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux

2021-01-10 Thread Zane van Iperen

On 29/11/20 5:46 am, Andriy Gelman wrote:


>   void term_init(void)
>   {
> +#if defined __linux__
> +struct sigaction action;


Nit: Should this have a "= {0}"?

My sigaction(2) says:
  On some architectures a union is involved: do not assign to both
  sa_handler and sa_sigaction.
so it's possible that sa_sigaction is left uninitialised.

If I'm wrong (quite possible, it's 2am), then part 1 lgtm.


> +action.sa_handler = sigterm_handler;
> +
> +/* block other interrupts while processing this one */
> +sigfillset(&action.sa_mask);
> +
> +/* restart interruptible functions (i.e. don't fail with EINTR)  */
> +action.sa_flags = SA_RESTART;
> +#endif


[FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger 

This requests loops to be vectorized using SIMD
instructions.
The performance increase is far from hand-optimized
assembly but still significant over the plain C version.
Typical values are a 2-4x speedup where a hand-written
version would achieve 4x-10x.
So it is far from a replacement, however some architectures
will get hand-written assembler quite late or not at all,
and this is a good improvement for a trivial amount of work.
The cause, besides the compiler being a compiler, is
usually that it does not manage to use saturating instructions
and thus has to use 32-bit operations where actually
saturating 16-bit operations would be sufficient.
Other causes are for example the av_clip functions that
are not ideal for vectorization (and even as scalar code
not optimal for any modern CPU that has either CSEL or
MAX/MIN instructions).
And of course this only works for relatively simple
loops, the IDCT functions for example seemed not possible
to optimize that way.
Also note that while clang may accept the code and sometimes
produces warnings, it does not seem to do anything actually
useful at all.
Here are example measurements using gcc 10 under Linux (in a VM unfortunately)
on AArch64 on Apple M1:
Command:
time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop

Original code:
real    0m19.572s
user    0m23.386s
sys     0m0.213s

Changing all put_hevc:
real    0m15.648s
user    0m19.503s (83.4% of original)
sys     0m0.186s

In addition changing add_residual:
real    0m15.424s
user    0m19.278s (82.4% of original)
sys     0m0.133s

In addition changing planar copy dither:
real    0m15.040s
user    0m18.874s (80.7% of original)
sys     0m0.168s

Signed-off-by: Reimar Döffinger 
---
 configure | 23 +
 libavcodec/hevcdsp_template.c | 47 +++
 libavutil/internal.h  |  6 +
 libswscale/swscale_unscaled.c |  3 +++
 4 files changed, 79 insertions(+)

diff --git a/configure b/configure
index 900505756b..73b7c3daeb 100755
--- a/configure
+++ b/configure
@@ -406,6 +406,7 @@ Toolchain options:
   --enable-pic build position-independent code
   --enable-thumb   compile for Thumb instruction set
   --enable-lto use link-time optimization
+  --enable-openmp-simd use the "omp simd" pragma to optimize code
   --env="ENV=override" override the environment variables
 
 Advanced options (experts only):
@@ -2335,6 +2336,7 @@ HAVE_LIST="
 opencl_dxva2
 opencl_vaapi_beignet
 opencl_vaapi_intel_media
+openmp_simd
 perl
 pod2man
 texi2html
@@ -2446,6 +2448,7 @@ CMDLINE_SELECT="
 extra_warnings
 logging
 lto
+openmp_simd
 optimizations
 rpath
 stripping
@@ -6926,6 +6929,26 @@ if enabled lto; then
 disable inline_asm_direct_symbol_refs
 fi
 
+if enabled openmp_simd; then
+ompopt="-fopenmp"
+if ! test_cflags $ompopt ; then
+test_cflags -Xpreprocessor -fopenmp && ompopt="-Xpreprocessor -fopenmp"
+fi
+test_cc $ompopt <> shift);
 src  += srcstride;
@@ -568,6 +573,7 @@ static void FUNC(put_hevc_pel_uni_w_pixels)(uint8_t *_dst, 
ptrdiff_t _dststride,
 
 ox = ox * (1 << (BIT_DEPTH - 8));
 for (y = 0; y < height; y++) {
+FF_OMP_SIMD
 for (x = 0; x < width; x++)
 dst[x] = av_clip_pixel((((src[x] << (14 - BIT_DEPTH)) * wx + offset) >> shift) + ox);
 src += srcstride;
@@ -592,6 +598,7 @@ static void FUNC(put_hevc_pel_bi_w_pixels)(uint8_t *_dst, 
ptrdiff_t _dststride,
 ox0 = ox0 * (1 << (BIT_DEPTH - 8));
 ox1 = ox1 * (1 << (BIT_DEPTH - 8));
 for (y = 0; y < height; y++) {
+FF_OMP_SIMD
 for (x = 0; x < width; x++) {
 dst[x] = av_clip_pixel(( (src[x] << (14 - BIT_DEPTH)) * wx1 + 
src2[x] * wx0 + (ox0 + ox1 + 1) * (1 << log2Wd)) >> (log2Wd + 1));
 }
@@ -623,6 +630,7 @@ static void FUNC(put_hevc_qpel_h)(int16_t *dst,
 ptrdiff_t srcstride = _srcstride / sizeof(pixel);
 const int8_t *filter= ff_hevc_qpel_filters[mx - 1];
 for (y = 0; y < height; y++) {
+FF_OMP_SIMD
 for (x = 0; x < width; x++)
 dst[x] = QPEL_FILTER(src, 1) >> (BIT_DEPTH - 8);
 src += srcstride;
@@ -639,6 +647,7 @@ static void FUNC(put_hevc_qpel_v)(int16_t *dst,
 ptrdiff_t srcstride = _srcstride / sizeof(pixel);
 const int8_t *filter= ff_hevc_qpel_filters[my - 1];
 for (y = 0; y < height; y++)  {
+FF_OMP_SIMD
 for (x = 0; x < width; x++)
 dst[x] = QPEL_FILTER(src, srcstride) >> (BIT_DEPTH - 8);
 src += srcstride;
@@ -662,6 +671,7 @@ static void FUNC(put_hevc_qpel_hv)(int16_t *dst,
 src   -= QPEL_EXTRA_BEFORE * srcstride;
 filter = ff_hevc_qpel_filters[mx - 1];
 for (y = 0; y < height + QPEL_EXTRA; y++) {
+FF_OMP_SIMD
 for (x = 0; x < width; x++)
 tmp[x] = QPEL_FILTER(src
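
As an illustration of what the commit message describes, here is a minimal sketch of a loop annotated with "omp simd". The macro name SKETCH_OMP_SIMD and the guard SKETCH_HAVE_OPENMP_SIMD are hypothetical stand-ins (the patch's own FF_OMP_SIMD definition in libavutil/internal.h is not visible in this excerpt), and the loop is a simplified add-residual-style kernel, not the FFmpeg code. With gcc, building with -fopenmp-simd and -DSKETCH_HAVE_OPENMP_SIMD should enable the pragma; otherwise the macro expands to nothing and the loop stays plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard and macro names, chosen for this sketch only. */
#ifdef SKETCH_HAVE_OPENMP_SIMD
#define SKETCH_OMP_SIMD _Pragma("omp simd")
#else
#define SKETCH_OMP_SIMD
#endif

/* Clamp an int to the 8-bit pixel range [0, 255]. */
static inline uint8_t sketch_clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a 16-bit residual to 8-bit pixels, row by row, with clipping. */
static void sketch_add_residual(uint8_t *dst, const int16_t *res,
                                ptrdiff_t stride, int width, int height)
{
    assert(width >= 0 && height >= 0);
    for (int y = 0; y < height; y++) {
        /* Ask the compiler to vectorize the inner loop only. */
        SKETCH_OMP_SIMD
        for (int x = 0; x < width; x++)
            dst[x] = sketch_clip_u8(dst[x] + res[x]);
        dst += stride;
        res += width;
    }
}
```

The pragma only annotates the inner loop, so the result is the same with or without it; only the generated code may differ.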

Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-10 Thread Lynne
Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:

> From: Reimar Döffinger 
>
> This requests loops to be vectorized using SIMD
> instructions.
> The performance increase is far from hand-optimized
> assembly but still significant over the plain C version.
> Typical values are a 2-4x speedup where a hand-written
> version would achieve 4x-10x.
> So it is far from a replacement, however some architectures
> will get hand-written assembler quite late or not at all,
> and this is a good improvement for a trivial amount of work.
> The cause, besides the compiler being a compiler, is
> usually that it does not manage to use saturating instructions
> and thus has to use 32-bit operations where actually
> saturating 16-bit operations would be sufficient.
> Other causes are for example the av_clip functions that
> are not ideal for vectorization (and even as scalar code
> not optimal for any modern CPU that has either CSEL or
> MAX/MIN instructions).
> And of course this only works for relatively simple
> loops, the IDCT functions for example seemed not possible
> to optimize that way.
> Also note that while clang may accept the code and sometimes
> produces warnings, it does not seem to do anything actually
> useful at all.
> Here are example measurements using gcc 10 under Linux (in a VM unfortunately)
> on AArch64 on Apple M1:
> Command:
> time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
>
> Original code:
> real0m19.572s
> user0m23.386s
> sys 0m0.213s
>
> Changing all put_hevc:
> real0m15.648s
> user0m19.503s (83.4% of original)
> sys 0m0.186s
>
> In addition changing add_residual:
> real0m15.424s
> user0m19.278s (82.4% of original)
> sys 0m0.133s
>
> In addition changing planar copy dither:
> real0m15.040s
> user0m18.874s (80.7% of original)
> sys 0m0.168s
>

I think I have to disagree.
The performance gains are marginal, it's definitely something the compiler should
be able to decide on its own, and it makes performance highly compiler-dependent.
And I'm not even resorting to the painfully obvious FUD arguments that could be
made.

Most of the loops this is added to are trivially SIMDable. Just because no one has
had the motivation to do SIMD for a pretty unpopular codec doesn't mean we should
compromise.

Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux

2021-01-10 Thread Andriy Gelman
On Sun, 10. Jan 16:32, Zane van Iperen wrote:
> On 29/11/20 5:46 am, Andriy Gelman wrote:
> 
> >   void term_init(void)
> >   {
> > +#if defined __linux__
> > +struct sigaction action;

Hi Zane,

Thanks for reviewing the patch.

> 
> Nit: Should this have a "= {0}"?
> 
> My sigaction(2) says:
>   On some architectures a union is involved: do not assign to both sa_handler 
> and sa_sigaction.
> so it's possible that sa_sigaction is left uninitialised.
> 
> If I'm wrong (quite possible, it's 2am), then part 1 lgtm.
> 

SA_SIGINFO in sa_flags is used to decide whether sa_handler or sa_sigaction is
chosen from the union.

But, there is one function pointer sa_restorer in struct sigaction that's
currently not initialized. The docs say this pointer is used internally by
glibc/kernel, and should not be used by applications.  It doesn't say that it
needs to be set to NULL, but I suppose it's a good practise.  I'll add your
suggestion to the patch.

Will apply the patch in a few days unless there are other comments.
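
For reference, a minimal sketch of the zero-initialized setup discussed above. This is not the committed patch: the names sketch_install_handler, sketch_on_signal and sketch_signal_count are made up for illustration, and the feature-test macro is an assumption about how a standalone file would be built.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: needed for sigaction() under strict -std=c99/c11 */
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Counter touched by the handler; sig_atomic_t keeps the access async-signal-safe. */
static volatile sig_atomic_t sketch_signal_count = 0;

static void sketch_on_signal(int sig)
{
    (void)sig;
    sketch_signal_count++;
}

/* Install a handler with sigaction(); returns 0 on success, -1 on failure. */
static int sketch_install_handler(int sig, void (*handler)(int))
{
    /* "= { 0 }" zeroes every member, including ones the caller never sets. */
    struct sigaction action = { 0 };

    assert(sig > 0);
    action.sa_handler = handler;

    /* block other signals while processing this one */
    sigfillset(&action.sa_mask);

    /* restart interruptible functions (i.e. don't fail with EINTR) */
    action.sa_flags = SA_RESTART;

    return sigaction(sig, &action, NULL);
}
```

Because SA_RESETHAND is not set, the handler stays installed after the first delivery, which is the behaviour the patch is after for repeated SIGINTs.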

-- 
Andriy

Re: [FFmpeg-devel] [PATCH v5] avformat/udp: return the error code instead of generic EIO

2021-01-10 Thread Marton Balint



On Sun, 10 Jan 2021, lance.lmw...@gmail.com wrote:


> From: Limin Wang 
> 
> Signed-off-by: Limin Wang 
> ---
> libavformat/udp.c | 55 +--
> 1 file changed, 33 insertions(+), 22 deletions(-)

[...]

> @@ -888,8 +901,6 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> }
> 
> if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate && s->circular_buffer_size)) {
> -int ret;
> -
> /* start the task going */
> s->fifo = av_fifo_alloc(s->circular_buffer_size);
> ret = pthread_mutex_init(&s->mutex, NULL);


This ret (and some others later in the code) is not an AVERROR() value; you
should convert it with AVERROR() before returning.
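
A minimal sketch of the conversion being asked for, under stated assumptions: pthread functions return 0 or a positive errno-style code, while FFmpeg callers expect negative error codes. SKETCH_AVERROR is a local stand-in that mirrors how AVERROR() behaves on platforms with positive errno values; the helper names are hypothetical and this is not the udp.c code.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for AVERROR(), assuming positive errno values: negate the code. */
#define SKETCH_AVERROR(e) (-(e))

/* Map a pthread return value (0 or positive errno) to 0 or a negative error. */
static int sketch_from_pthread(int pthread_ret)
{
    assert(pthread_ret >= 0);
    return pthread_ret ? SKETCH_AVERROR(pthread_ret) : 0;
}

/* Initialize a mutex and report failure in the negative-error convention. */
static int sketch_init_mutex(pthread_mutex_t *mutex)
{
    int ret = pthread_mutex_init(mutex, NULL);
    if (ret != 0)
        return sketch_from_pthread(ret);
    return 0;
}
```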


Regards,
Marton

Re: [FFmpeg-devel] FFmpeg buying an Apple M1 Mac Mini

2021-01-10 Thread Kieran Kunhya
>
> > I will buy these if nobody objects by the end of the week.
>
> Totally in favor, also I'd like us to have two machines running.
>
> Thanks for hosting!
>
> -Thilo
>

Hi,

Just to confirm, I have bought the two Mac Minis.
They should be delivered and racked up in early February (lockdown may
delay this).

Regards,
Kieran

[FFmpeg-devel] Buying and hosting a HiFive RISC-V system

2021-01-10 Thread Kieran Kunhya
Hello,

Lynne has suggested on IRC that we purchase one or more of these:
https://www.sifive.com/boards/hifive-unmatched

I think this is an interesting idea as RISC-V is an important platform for
the future (like M1).
I'll likely have to buy from Mouser (as I'm not sure SPI will accept
CrowdSupply) and there is a long lead-time for it:
https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==

Also, I'll have to claim for a case and M.2 SSD.

I am happy to host this like with the Apple M1.

Regards,
Kieran Kunhya

Re: [FFmpeg-devel] Buying and hosting a HiFive RISC-V system

2021-01-10 Thread chen
In my evaluation, RISC-V code density is about 60% of ARM's; with the
C extension it rises to 80%.
That may be a big problem for running a large program like ffmpeg on real
products, but it leaves us more room to improve ffmpeg on it.


At 2021-01-11 04:21:07, "Kieran Kunhya"  wrote:
>Hello,
>
>Lynne has suggested on IRC that we purchase one or more of these:
>https://www.sifive.com/boards/hifive-unmatched
>
>I think this is an interesting idea as RISC-V is an important platform for
>the future (like M1).
>I'll likely have to buy from Mouser (as I'm not sure SPI will accept
>CrowdSupply) and there is a long lead-time for it:
>https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==
>
>Also, I'll have to claim for a case and M.2 SSD.
>
>I am happy to host this like with the Apple M1.
>
>Regards,
>Kieran Kunhya

Re: [FFmpeg-devel] Buying and hosting a HiFive RISC-V system

2021-01-10 Thread Joel Linn
Keep in mind though that the RISC-V Vector Extensions (which btw look
really smart and promising) are not implemented in the SiFive Unmatched
chip yet (IIRC). But one has to start somewhere, and some future embedded
devices will likely also lack those.


On 2021-01-10 21:39, chen wrote:

> In my evaluation, RISC-V code density is about 60% of ARM's; with the
> C extension it rises to 80%.
> That may be a big problem for running a large program like ffmpeg on real
> products, but it leaves us more room to improve ffmpeg on it.
>
> At 2021-01-11 04:21:07, "Kieran Kunhya"  wrote:
>
>> Hello,
>>
>> Lynne has suggested on IRC that we purchase one or more of these:
>> https://www.sifive.com/boards/hifive-unmatched
>>
>> I think this is an interesting idea as RISC-V is an important platform for
>> the future (like M1).
>> I'll likely have to buy from Mouser (as I'm not sure SPI will accept
>> CrowdSupply) and there is a long lead-time for it:
>> https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==
>>
>> Also, I'll have to claim for a case and M.2 SSD.
>>
>> I am happy to host this like with the Apple M1.
>>
>> Regards,
>> Kieran Kunhya

Re: [FFmpeg-devel] [PATCH 2/5] avcodec/fft_template: Remove unused fixed-point cosine tables

2021-01-10 Thread Andreas Rheinhardt
Michael Niedermayer:
> On Sun, Jan 10, 2021 at 01:56:21AM +0100, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Thu, Jan 07, 2021 at 12:13:05AM +0100, Andreas Rheinhardt wrote:
>>>> There are three types of FFTs: floating-point, 32-bit fixed-point and
>>>> 16-bit fixed-point. The latter has exactly one user: The fixed-point
>>>> AC-3-encoder; the cosine tables used by it use up to seven bits. The
>>>> tables corresponding to eight to seventeen bits are unused, as are the
>>>> FFT functions for these bits.
>>>>
>>>> Therefore this commit removes these tables and functions. This is
>>>> especially beneficial when using hardcoded tables as they take up more
>>>> than 255 KiB. But even without it one saves said unused functions as
>>>> well as entries in corresponding tables (this also saves relocations).
>>>>
>>>> Signed-off-by: Andreas Rheinhardt 
>>>> ---
>>>> These changes to ARM assembly are honestly untested. I hope someone can
>>>> test them. Btw: It seems that the ARM assembly code wouldn't be able to
>>>> deal with an FFT with more than 16 bits (no function for this has been
>>>> defined), which only worked because no one ever used that many bits with
>>>> the fixed-point FFT.
>>>>
>>>>  libavcodec/arm/fft_fixed_neon.S | 18 --
>>>>  libavcodec/cos_tablegen.c   |  4 ++--
>>>>  libavcodec/fft.h|  4 +++-
>>>>  libavcodec/fft_fixed.c  |  1 +
>>>>  libavcodec/fft_template.c   | 31 +++
>>>>  tests/fate/fft.mak  |  8 ++--
>>>>  6 files changed, 35 insertions(+), 31 deletions(-)
>>>
>>> make -j32 libavcodec/tests/fft-fixed && libavcodec/tests/fft-fixed
>>> Segmentation fault (core dumped)
>>>
>>> (if you cant repro say so and ill rebuild with debug symbols ...)
>>>
>>> thx
>>> [...]
>>>
>> 1. Lynne has an alternative patchset that makes the only user of
>> fft_fixed use fft_fixed_32 instead, so this is not important any more.
> 
>> 2. Are you testing the ARM assembly code (for which I ask for a test) or
> 
> x86-64
> 
>> not? If not, then this surprises me. Did you apply the changes to
>> fft.mak (some of the tests have been removed as they tested
>> functionality that was unused (apart from the tests) and has therefore
>> been removed).
> 
> i applied the changes from this patchset up to and including the patch
> and also did a make distclean
> FFT 512 test

Now I see what's the problem. You are not running the fft-tests from the
FATE-suite (where I disabled unused and newly unsupported tests);
instead you are directly using the underlying tests in libavcodec/tests
and these tests default to nbits = 9, which is unsupported for fft-fixed
with this patch applied. ff_fft_init_fixed is correctly erroring out,
yet said error code is ignored in the test's fft_init (therefore testing
an unsupported nbits already leads to segfaults on master (try
libavcodec/tests/fft -n 20)). So the error checking in the tests needs
to be improved (there are other unchecked allocations, too).
Furthermore, if this patch were to be applied, one should set the
default number of bits for fft-fixed to something supported; but that
point is moot given that this patch has been superseded by Lynne's (who
wants to nuke fft-fixed).
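
The point about unchecked error codes can be sketched as follows. All names here (SketchFFT, sketch_fft_init, sketch_run_test) are hypothetical and do not correspond to the real test code in libavcodec/tests; the sketch only shows a test driver that propagates an init failure instead of carrying on with an uninitialized context.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical FFT context: only what is needed to show the failure mode. */
typedef struct SketchFFT {
    int    nbits;
    float *table;
} SketchFFT;

/* Reject unsupported sizes instead of leaving the context half-initialized. */
static int sketch_fft_init(SketchFFT *s, int nbits, int max_nbits)
{
    assert(s != NULL);
    if (nbits < 2 || nbits > max_nbits)
        return -EINVAL;
    s->table = calloc((size_t)1 << nbits, sizeof(*s->table));
    if (!s->table)
        return -ENOMEM;
    s->nbits = nbits;
    return 0;
}

/* Test driver: bail out on init failure rather than dereferencing s.table. */
static int sketch_run_test(int nbits, int max_nbits)
{
    SketchFFT s = { 0 };
    int ret = sketch_fft_init(&s, nbits, max_nbits);
    if (ret < 0)
        return ret;          /* the step reported as missing in the tests */
    s.table[0] = 1.0f;       /* safe only because init succeeded */
    free(s.table);
    return 0;
}
```

With a maximum of 7 bits, asking for nbits = 9 returns an error here instead of crashing.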

- Andreas

[FFmpeg-devel] [PATCH] avcodec/xbmenc: Better xbm memory use

2021-01-10 Thread Jose Da Silva
Small memory reduction: the buffer now needs approximately 6/7 of the previous size.
Assuming \n is 2 bytes, we first need {32+33+40+5}=110
but we also need to include the terminating zero => 110+1 = 111 (bug-fix).

Then assuming \n is 2 bytes, data requires => height * (linesize * 6 + 2)
For example, " 0x00, 0x11, 0x22,\n"
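
The arithmetic above can be checked with a small standalone sketch. The function names are hypothetical, and the " 0x%02X," format string is an assumption made only to confirm the 6-characters-per-byte count; the encoder's own formatting may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Packet size as proposed in the patch: 6 chars per data byte, 2 per line
 * ending, plus 110 header bytes and 1 terminating zero. */
static int sketch_xbm_size(int width, int height)
{
    assert(width > 0 && height > 0);
    int linesize = (width + 7) / 8;            /* bytes per row, 8 pixels each */
    return height * (linesize * 6 + 2) + 111;
}

/* Number of characters one data byte takes when written as " 0xNN,". */
static int sketch_xbm_byte_len(unsigned value)
{
    char buf[16];
    return snprintf(buf, sizeof(buf), " 0x%02X,", value & 0xFFu);
}
```

For a 24x1 image, linesize is 3, so the data part is 3 * 6 + 2 = 20 bytes and the total is 131.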
From 81436261e6de8ddaf1ce3c6f010ab2c018f92eb8 Mon Sep 17 00:00:00 2001
From: Joe Da Silva 
Date: Sun, 10 Jan 2021 01:35:05 -0800
Subject: [PATCH] avcodec/xbmenc: Better xbm memory use

Small memory reduction which uses approx 6/7th total memory.
Assuming \n is 2bytes, we first need {32+33+40+5}=110
but we also need to include the terminating zero => 110+1 = 111

Assuming \n is 2bytes, data requires => height * (linesize * 6 + 2)
For example, " 0x00, 0x11, 0x22,\n"

Signed-off-by: Joe Da Silva 
---
 libavcodec/xbmenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/xbmenc.c b/libavcodec/xbmenc.c
index b25615f2a4..9222947893 100644
--- a/libavcodec/xbmenc.c
+++ b/libavcodec/xbmenc.c
@@ -31,7 +31,7 @@ static int xbm_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
 uint8_t *ptr, *buf;
 
 linesize = (avctx->width + 7) / 8;
-size = avctx->height * (linesize * 7 + 2) + 110;
+size = avctx->height * (linesize * 6 + 2) + 111;
 if ((ret = ff_alloc_packet2(avctx, pkt, size, 0)) < 0)
 return ret;
 
-- 
2.30.0


[FFmpeg-devel] [PATCH] avcodec/cbs: constify decompose_unit_types

2021-01-10 Thread James Almer
CBS doesn't change its contents in any way whatsoever internally, and most
users already set it to a const array.
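
A minimal sketch of why the const qualifier removes the need for the casts. The types and names below (SketchUnitType, SketchContext, sketch_types) are simplified stand-ins, not the real CBS definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int SketchUnitType;

/* With a pointer-to-const member, a const array can be assigned directly. */
typedef struct SketchContext {
    const SketchUnitType *decompose_unit_types;
    int                   nb_decompose_unit_types;
} SketchContext;

static const SketchUnitType sketch_types[] = { 1, 3, 6 };

static void sketch_select_types(SketchContext *ctx)
{
    assert(ctx != NULL);
    /* No cast needed: const array -> pointer to const. */
    ctx->decompose_unit_types    = sketch_types;
    ctx->nb_decompose_unit_types =
        (int)(sizeof(sketch_types) / sizeof(sketch_types[0]));
}
```

If the member were declared without const, the same assignment would discard a qualifier and the compiler would warn unless a cast hid it.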

Signed-off-by: James Almer 
---
 libavcodec/av1_frame_split_bsf.c | 2 +-
 libavcodec/av1_parser.c  | 2 +-
 libavcodec/cbs.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/libavcodec/av1_frame_split_bsf.c b/libavcodec/av1_frame_split_bsf.c
index 13bebe19f5..fa8b887b6c 100644
--- a/libavcodec/av1_frame_split_bsf.c
+++ b/libavcodec/av1_frame_split_bsf.c
@@ -214,7 +214,7 @@ static int av1_frame_split_init(AVBSFContext *ctx)
 if (ret < 0)
 return ret;
 
-s->cbc->decompose_unit_types= (CodedBitstreamUnitType*)decompose_unit_types;
+s->cbc->decompose_unit_types= decompose_unit_types;
 s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types);
 
 if (!ctx->par_in->extradata_size)
diff --git a/libavcodec/av1_parser.c b/libavcodec/av1_parser.c
index 181ff3a1be..6a76ffb7bc 100644
--- a/libavcodec/av1_parser.c
+++ b/libavcodec/av1_parser.c
@@ -191,7 +191,7 @@ static av_cold int av1_parser_init(AVCodecParserContext 
*ctx)
 if (ret < 0)
 return ret;
 
-s->cbc->decompose_unit_types= (CodedBitstreamUnitType *)decompose_unit_types;
+s->cbc->decompose_unit_types= decompose_unit_types;
 s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types);
 
 return 0;
diff --git a/libavcodec/cbs.h b/libavcodec/cbs.h
index 3fd0a0ef33..f022282b75 100644
--- a/libavcodec/cbs.h
+++ b/libavcodec/cbs.h
@@ -196,7 +196,7 @@ typedef struct CodedBitstreamContext {
  * Types not in this list will be available in bitstream form only.
  * If NULL, all supported types will be decomposed.
  */
-CodedBitstreamUnitType *decompose_unit_types;
+const CodedBitstreamUnitType *decompose_unit_types;
 /**
  * Length of the decompose_unit_types array.
  */
-- 
2.30.0


Re: [FFmpeg-devel] [PATCH] avcodec/cbs: constify decompose_unit_types

2021-01-10 Thread Andreas Rheinhardt
James Almer:
> CBS doesn't change its contents in any way whatsoever internally, and most
> users already set it to a const array.
> 
> Signed-off-by: James Almer 
> ---
>  libavcodec/av1_frame_split_bsf.c | 2 +-
>  libavcodec/av1_parser.c  | 2 +-
>  libavcodec/cbs.h | 2 +-
>  3 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/libavcodec/av1_frame_split_bsf.c 
> b/libavcodec/av1_frame_split_bsf.c
> index 13bebe19f5..fa8b887b6c 100644
> --- a/libavcodec/av1_frame_split_bsf.c
> +++ b/libavcodec/av1_frame_split_bsf.c
> @@ -214,7 +214,7 @@ static int av1_frame_split_init(AVBSFContext *ctx)
>  if (ret < 0)
>  return ret;
>  
> -s->cbc->decompose_unit_types= 
> (CodedBitstreamUnitType*)decompose_unit_types;
> +s->cbc->decompose_unit_types= decompose_unit_types;
>  s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types);
>  
>  if (!ctx->par_in->extradata_size)
> diff --git a/libavcodec/av1_parser.c b/libavcodec/av1_parser.c
> index 181ff3a1be..6a76ffb7bc 100644
> --- a/libavcodec/av1_parser.c
> +++ b/libavcodec/av1_parser.c
> @@ -191,7 +191,7 @@ static av_cold int av1_parser_init(AVCodecParserContext 
> *ctx)
>  if (ret < 0)
>  return ret;
>  
> -s->cbc->decompose_unit_types= (CodedBitstreamUnitType 
> *)decompose_unit_types;
> +s->cbc->decompose_unit_types= decompose_unit_types;
>  s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types);
>  
>  return 0;
> diff --git a/libavcodec/cbs.h b/libavcodec/cbs.h
> index 3fd0a0ef33..f022282b75 100644
> --- a/libavcodec/cbs.h
> +++ b/libavcodec/cbs.h
> @@ -196,7 +196,7 @@ typedef struct CodedBitstreamContext {
>   * Types not in this list will be available in bitstream form only.
>   * If NULL, all supported types will be decomposed.
>   */
> -CodedBitstreamUnitType *decompose_unit_types;
> +const CodedBitstreamUnitType *decompose_unit_types;
>  /**
>   * Length of the decompose_unit_types array.
>   */
> 
LGTM.

- Andreas

[FFmpeg-devel] [PATCH] libswscale/aarch64/hscale.S: Support more bit-depth variants.

2021-01-10 Thread Reimar . Doeffinger
From: Reimar Döffinger 

Trivially expand hscale assembler to support > 8 bit formats
both for input and output.
16-bit input is not supported as I am not certain how to
get sufficient test coverage.
---
 libswscale/aarch64/hscale.S  | 53 ++--
 libswscale/aarch64/swscale.c | 49 +++--
 2 files changed, 85 insertions(+), 17 deletions(-)

diff --git a/libswscale/aarch64/hscale.S b/libswscale/aarch64/hscale.S
index af55ffe2b7..3b42d39dac 100644
--- a/libswscale/aarch64/hscale.S
+++ b/libswscale/aarch64/hscale.S
@@ -20,7 +20,11 @@
 
 #include "libavutil/aarch64/asm.S"
 
-function ff_hscale_8_to_15_neon, export=1
+.macro hscale srcbits, dstbits, ldt, lds, c
+function ff_hscale_\srcbits\()_to_\dstbits\()_neon, export=1
+.if \dstbits >= 16
+moviv20.4S, #(0x1 << (\dstbits - 16)), msl #16
+.endif
 sbfiz   x7, x6, #1, #32 // filterSize*2 (*2 
because int16)
 1:  ldr w8, [x5], #4// filterPos[idx]
 ldr w0, [x5], #4// filterPos[idx + 1]
@@ -34,30 +38,30 @@ function ff_hscale_8_to_15_neon, export=1
 moviv1.2D, #0   // val sum part 2 (for 
dst[1])
 moviv2.2D, #0   // val sum part 3 (for 
dst[2])
 moviv3.2D, #0   // val sum part 4 (for 
dst[3])
-add x17, x3, w8, UXTW   // srcp + filterPos[0]
-add x8,  x3, w0, UXTW   // srcp + filterPos[1]
-add x0, x3, w11, UXTW   // srcp + filterPos[2]
-add x11, x3, w9, UXTW   // srcp + filterPos[3]
+add x17, x3, w8, UXTW #!!(\srcbits > 8) // srcp + 
filterPos[0]
+add x8,  x3, w0, UXTW #!!(\srcbits > 8) // srcp + 
filterPos[1]
+add x0, x3, w11, UXTW #!!(\srcbits > 8) // srcp + 
filterPos[2]
+add x11, x3, w9, UXTW #!!(\srcbits > 8) // srcp + 
filterPos[3]
 mov w15, w6 // filterSize counter
-2:  ld1 {v4.8B}, [x17], #8  // srcp[filterPos[0] + 
{0..7}]
+2:  ld1 {v4.\ldt}, [x17], \lds  // srcp[filterPos[0] + 
{0..7}]
 ld1 {v5.8H}, [x16], #16 // load 8x16-bit 
filter values, part 1
-ld1 {v6.8B}, [x8], #8   // srcp[filterPos[1] + 
{0..7}]
+ld1 {v6.\ldt}, [x8], \lds   // srcp[filterPos[1] + 
{0..7}]
 ld1 {v7.8H}, [x12], #16 // load 8x16-bit at 
filter+filterSize
-uxtlv4.8H, v4.8B// unpack part 1 to 
16-bit
+\c\cuxtlv4.8H, v4.8B// unpack part 1 to 
16-bit
 smlal   v0.4S, v4.4H, v5.4H // v0 accumulates 
srcp[filterPos[0] + {0..3}] * filter[{0..3}]
 smlal2  v0.4S, v4.8H, v5.8H // v0 accumulates 
srcp[filterPos[0] + {4..7}] * filter[{4..7}]
-ld1 {v16.8B}, [x0], #8  // srcp[filterPos[2] + 
{0..7}]
+ld1 {v16.\ldt}, [x0], \lds  // srcp[filterPos[2] + 
{0..7}]
 ld1 {v17.8H}, [x13], #16// load 8x16-bit at 
filter+2*filterSize
-uxtlv6.8H, v6.8B// unpack part 2 to 
16-bit
+\c\cuxtlv6.8H, v6.8B// unpack part 2 to 
16-bit
 smlal   v1.4S, v6.4H, v7.4H // v1 accumulates 
srcp[filterPos[1] + {0..3}] * filter[{0..3}]
-uxtlv16.8H, v16.8B  // unpack part 3 to 
16-bit
+\c\cuxtlv16.8H, v16.8B  // unpack part 3 to 
16-bit
 smlal   v2.4S, v16.4H, v17.4H   // v2 accumulates 
srcp[filterPos[2] + {0..3}] * filter[{0..3}]
 smlal2  v2.4S, v16.8H, v17.8H   // v2 accumulates 
srcp[filterPos[2] + {4..7}] * filter[{4..7}]
-ld1 {v18.8B}, [x11], #8 // srcp[filterPos[3] + 
{0..7}]
+ld1 {v18.\ldt}, [x11], \lds // srcp[filterPos[3] + 
{0..7}]
 smlal2  v1.4S, v6.8H, v7.8H // v1 accumulates 
srcp[filterPos[1] + {4..7}] * filter[{4..7}]
 ld1 {v19.8H}, [x4], #16 // load 8x16-bit at 
filter+3*filterSize
 subsw15, w15, #8// j -= 8: processed 
8/filterSize
-uxtlv18.8H, v18.8B  // unpack part 4 to 
16-bit
+\c\cuxtlv18.8H, v18.8B  // unpack part 4 to 
16-bit
 smlal   v3.4S, v18.4H, v19.4H   // v3 accumulates 
srcp[filterPos[3] + {0..3}] * filter[{0..3}]
 smlal2  v3.4S, v18.8H, v19.8H   // v3 acc

Re: [FFmpeg-devel] [PATCH v5] avformat/udp: return the error code instead of generic EIO

2021-01-10 Thread lance . lmwang
On Sun, Jan 10, 2021 at 08:29:30PM +0100, Marton Balint wrote:
> 
> 
> On Sun, 10 Jan 2021, lance.lmw...@gmail.com wrote:
> 
> > From: Limin Wang 
> > 
> > Signed-off-by: Limin Wang 
> > ---
> > libavformat/udp.c | 55 
> > +--
> > 1 file changed, 33 insertions(+), 22 deletions(-)
> 
> [...]
> 
> 
> > @@ -888,8 +901,6 @@ static int udp_open(URLContext *h, const char *uri, int 
> > flags)
> > }
> > 
> > if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate 
> > && s->circular_buffer_size)) {
> > -int ret;
> > -
> > /* start the task going */
> > s->fifo = av_fifo_alloc(s->circular_buffer_size);
> > ret = pthread_mutex_init(&s->mutex, NULL);
> 
> This ret (and some others later in the code) are not an AVERROR(), you
> should convert it to AVERROR() before returning.

OK, I will add the code below before the goto error for these conditions, correct?
ret = AVERROR(ret);

> 
> Regards,
> Marton

-- 
Thanks,
Limin Wang
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.

2021-01-10 Thread Carl Eugen Hoyos
Am So., 10. Jan. 2021 um 19:55 Uhr schrieb Lynne :
>
> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
>
> > From: Reimar Döffinger 
> >
> > This requests loops to be vectorized using SIMD
> > instructions.
> > The performance increase is far from hand-optimized
> > assembly but still significant over the plain C version.
> > Typical values are a 2-4x speedup where a hand-written
> > version would achieve 4x-10x.
> > So it is far from a replacement, however some architectures
> > will get hand-written assembler quite late or not at all,
> > and this is a good improvement for a trivial amount of work.
> > The cause, besides the compiler being a compiler, is
> > usually that it does not manage to use saturating instructions
> > and thus has to use 32-bit operations where actually
> > saturating 16-bit operations would be sufficient.
> > Other causes are for example the av_clip functions that
> > are not ideal for vectorization (and even as scalar code
> > not optimal for any modern CPU that has either CSEL or
> > MAX/MIN instructions).
> > And of course this only works for relatively simple
> > loops, the IDCT functions for example seemed not possible
> > to optimize that way.
> > Also note that while clang may accept the code and sometimes
> > produces warnings, it does not seem to do anything actually
> > useful at all.
> > Here are example measurements using gcc 10 under Linux (in a VM, unfortunately)
> > on AArch64 on Apple M1:
> > Command:
> > time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
> >
> > Original code:
> > real 0m19.572s
> > user 0m23.386s
> > sys  0m0.213s
> >
> > Changing all put_hevc:
> > real 0m15.648s
> > user 0m19.503s (83.4% of original)
> > sys  0m0.186s
> >
> > In addition changing add_residual:
> > real 0m15.424s
> > user 0m19.278s (82.4% of original)
> > sys  0m0.133s
> >
> > In addition changing planar copy dither:
> > real 0m15.040s
> > user 0m18.874s (80.7% of original)
> > sys  0m0.168s
> >
>
> I think I have to disagree.

> The performance gains are marginal

This sounds wrong.

Carl Eugen
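
For readers unfamiliar with the pragma under discussion, here is a minimal sketch of the kind of loop it targets. The names clip_u8_sketch and add_residual_sketch are hypothetical and are not taken from the patch. GCC and Clang both accept -fopenmp-simd to honor the pragma without pulling in the OpenMP runtime; without such a flag the pragma is ignored and the loop stays plain C.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an int to the 0..255 range of an 8-bit sample. */
static inline int clip_u8_sketch(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Add a 16-bit residual to 8-bit samples, saturating the result. */
void add_residual_sketch(uint8_t *dst, const int16_t *res, size_t n)
{
    /* Ask the compiler to vectorize this loop with SIMD instructions. */
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint8_t)clip_u8_sketch(dst[i] + res[i]);
}
```

The sum is computed in int before clamping, which mirrors the point made in the quoted message: the compiler may widen to 32 bits where a saturating narrow instruction would do.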

[FFmpeg-devel] [PATCH 1/3] dnn/openvino: remove unnecessary code

2021-01-10 Thread Ting Fu
Signed-off-by: Ting Fu 
---
 libavfilter/dnn/dnn_backend_openvino.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c 
b/libavfilter/dnn/dnn_backend_openvino.c
index d27e451eea..050be97209 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -284,14 +284,6 @@ static DNNReturnType get_input_ov(void *model, DNNData 
*input, const char *input
 return DNN_ERROR;
 }
 
-// The order of dims in the openvino is fixed and it is always 
NCHW for 4-D data.
-// while we pass NHWC data from FFmpeg to openvino
-status = ie_network_set_input_layout(ov_model->network, 
input_name, NHWC);
-if (status != OK) {
-av_log(ctx, AV_LOG_ERROR, "Input \"%s\" does not match layout 
NHWC\n", input_name);
-return DNN_ERROR;
-}
-
 input->channels = dims.dims[1];
 input->height   = dims.dims[2];
 input->width= dims.dims[3];
-- 
2.17.1
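
The removed comment contrasts the NCHW and NHWC layouts. As a sketch of what the two orderings mean for a flat buffer, the helpers below compute the offset of element (n, c, h, w) in each layout. The function names are hypothetical and do not exist in FFmpeg or OpenVINO.

```c
#include <stddef.h>

/* Offset of element (n, c, h, w) in a planar NCHW buffer. */
static size_t offset_nchw_sketch(size_t n, size_t c, size_t h, size_t w,
                                 size_t C, size_t H, size_t W)
{
    return ((n * C + c) * H + h) * W + w;
}

/* Offset of the same element in an interleaved NHWC buffer. */
static size_t offset_nhwc_sketch(size_t n, size_t c, size_t h, size_t w,
                                 size_t C, size_t H, size_t W)
{
    return ((n * H + h) * W + w) * C + c;
}
```

In NCHW the width index varies fastest and each channel forms its own plane; in NHWC the channel index varies fastest, as in packed RGB frames. The context lines of the hunk read channels, height and width from dims.dims[1], dims.dims[2] and dims.dims[3], which matches the NCHW ordering of the dimensions.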


[FFmpeg-devel] [PATCH 2/3] dnn/openvino: refine code for better model initialization

2021-01-10 Thread Ting Fu
Move the OpenVINO model and inference request creation and initialization
steps from ff_dnn_load_model_ov to a new function, init_model_ov, in
preparation for input resize support.

Signed-off-by: Ting Fu 
---
 libavfilter/dnn/dnn_backend_openvino.c | 153 +++--
 1 file changed, 93 insertions(+), 60 deletions(-)
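
The effect of the move is that the executable network is created on first use: get_output_ov calls init_model_ov only while ov_model->exe_network is still NULL. A minimal sketch of that deferred-initialization pattern follows; SketchModel, init_model_sketch and run_sketch are hypothetical names and do not use the OpenVINO API.

```c
#include <stddef.h>

typedef struct SketchModel {
    void *exe_network; /* NULL until the expensive setup has run */
    int   init_calls;  /* counts how often setup ran, for illustration only */
} SketchModel;

static int sketch_dummy_network;

/* Stand-in for the expensive setup step. */
static int init_model_sketch(SketchModel *m)
{
    m->exe_network = &sketch_dummy_network;
    m->init_calls++;
    return 0;
}

/* Runs the model, performing the setup only on the first call. */
static int run_sketch(SketchModel *m)
{
    if (!m->exe_network) {
        int ret = init_model_sketch(m);
        if (ret != 0)
            return ret;
    }
    /* ... inference would happen here ... */
    return 0;
}
```

Deferring the setup leaves a window between loading the network and compiling it for a device, which is where a later patch can adjust the input shape.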

diff --git a/libavfilter/dnn/dnn_backend_openvino.c 
b/libavfilter/dnn/dnn_backend_openvino.c
index 050be97209..d6e0593a0b 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -217,6 +217,78 @@ static void infer_completion_callback(void *args)
 task->done = 1;
 }
 
+static DNNReturnType init_model_ov(OVModel *ov_model)
+{
+OVContext *ctx = &ov_model->ctx;
+IEStatusCode status;
+ie_available_devices_t a_dev;
+ie_config_t config = {NULL, NULL, NULL};
+char *all_dev_names = NULL;
+
+status = ie_core_load_network(ov_model->core, ov_model->network, 
ctx->options.device_type, &config, &ov_model->exe_network);
+if (status != OK) {
+av_log(ctx, AV_LOG_ERROR, "Failed to load OpenVINO model network\n");
+status = ie_core_get_available_devices(ov_model->core, &a_dev);
+if (status != OK) {
+av_log(ctx, AV_LOG_ERROR, "Failed to get available devices\n");
+goto err;
+}
+for (int i = 0; i < a_dev.num_devices; i++) {
+APPEND_STRING(all_dev_names, a_dev.devices[i])
+}
+av_log(ctx, AV_LOG_ERROR,"device %s may not be supported, all 
available devices are: \"%s\"\n",
+   ctx->options.device_type, all_dev_names);
+goto err;
+}
+
+// create infer_request for sync execution
+status = ie_exec_network_create_infer_request(ov_model->exe_network, 
&ov_model->infer_request);
+if (status != OK)
+goto err;
+
+// create infer_requests for async execution
+if (ctx->options.nireq <= 0) {
+// the default value is a rough estimation
+ctx->options.nireq = av_cpu_count() / 2 + 1;
+}
+
+ov_model->request_queue = ff_safe_queue_create();
+if (!ov_model->request_queue) {
+goto err;
+}
+
+for (int i = 0; i < ctx->options.nireq; i++) {
+ie_infer_request_t *request;
+RequestItem *item = av_mallocz(sizeof(*item));
+if (!item) {
+goto err;
+}
+status = ie_exec_network_create_infer_request(ov_model->exe_network, 
&request);
+if (status != OK) {
+av_freep(&item);
+goto err;
+}
+item->infer_request = request;
+item->callback.completeCallBackFunc = infer_completion_callback;
+item->callback.args = item;
+if (ff_safe_queue_push_back(ov_model->request_queue, item) < 0) {
+av_freep(&item);
+goto err;
+}
+}
+
+ov_model->task_queue = ff_queue_create();
+if (!ov_model->task_queue) {
+goto err;
+}
+
+return DNN_SUCCESS;
+
+err:
+ff_dnn_free_model_ov(&ov_model->model);
+return DNN_ERROR;
+}
+
 static DNNReturnType execute_model_ov(TaskItem *task, RequestItem *request)
 {
 IEStatusCode status;
@@ -325,6 +397,13 @@ static DNNReturnType get_output_ov(void *model, const char 
*input_name, int inpu
 in_frame->width = input_width;
 in_frame->height = input_height;
 
+if (!ov_model->exe_network) {
+if (init_model_ov(ov_model) != DNN_SUCCESS) {
+av_log(ctx, AV_LOG_ERROR, "Failed to init OpenVINO executable network 
or inference request\n");
+return DNN_ERROR;
+}
+}
+
 task.done = 0;
 task.do_ioproc = 0;
 task.async = 0;
@@ -347,13 +426,10 @@ static DNNReturnType get_output_ov(void *model, const 
char *input_name, int inpu
 
 DNNModel *ff_dnn_load_model_ov(const char *model_filename, const char 
*options, AVFilterContext *filter_ctx)
 {
-char *all_dev_names = NULL;
 DNNModel *model = NULL;
 OVModel *ov_model = NULL;
 OVContext *ctx = NULL;
 IEStatusCode status;
-ie_config_t config = {NULL, NULL, NULL};
-ie_available_devices_t a_dev;
 
 model = av_mallocz(sizeof(DNNModel));
 if (!model){
@@ -385,63 +461,6 @@ DNNModel *ff_dnn_load_model_ov(const char *model_filename, 
const char *options,
 if (status != OK)
 goto err;
 
-status = ie_core_load_network(ov_model->core, ov_model->network, 
ctx->options.device_type, &config, &ov_model->exe_network);
-if (status != OK) {
-av_log(ctx, AV_LOG_ERROR, "Failed to init OpenVINO model\n");
-status = ie_core_get_available_devices(ov_model->core, &a_dev);
-if (status != OK) {
-av_log(ctx, AV_LOG_ERROR, "Failed to get available devices\n");
-goto err;
-}
-for (int i = 0; i < a_dev.num_devices; i++) {
-APPEND_STRING(all_dev_names, a_dev.devices[i])
-}
-av_log(ctx, AV_LOG_ERROR,"device %s may not be supported, all 
available devices are: \"%s\"\n",
-   ctx->

[FFmpeg-devel] [PATCH 3/3] dnn/openvino: support model input resize

2021-01-10 Thread Ting Fu
OpenVINO APIs require the input size to be specified before running a
model, while some OpenVINO models accept different input sizes. To support
such models, add an input_resizable option.
The boolean option input_resizable specifies whether the input can be resized:
input_resizable = 1 enables input resize, i.e. different input sizes are accepted.
input_resizable = 0 (default) disables input resize.
Please make sure the inference model accepts different input sizes
before using this option, otherwise the inference engine may report errors.
e.g.: ./ffmpeg -i video_name.mp4 -vf dnn_processing=dnn_backend=openvino:\
  model=model_name.xml:input=input_name:output=output_name:\
  options=device=CPU\&input_resizable=1 -y output_video_name.mp4

Signed-off-by: Ting Fu 
---
 libavfilter/dnn/dnn_backend_openvino.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)
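
A short sketch of the convention used in get_input_ov below: when input_resizable is set, height and width are reported as -1 instead of the model's fixed dimensions. SketchInput and both helpers are hypothetical names; in particular, how a caller interprets -1 (here: fall back to the frame's own size) is an assumption made for illustration, not something this patch shows.

```c
typedef struct SketchInput {
    int height;
    int width;
} SketchInput;

/* Report the model's fixed dimensions, or -1 for each when resizable. */
static void describe_input_sketch(SketchInput *in, int input_resizable,
                                  int model_height, int model_width)
{
    in->height = input_resizable ? -1 : model_height;
    in->width  = input_resizable ? -1 : model_width;
}

/* Assumed caller-side rule: -1 means "use the size of the incoming frame". */
static int effective_dim_sketch(int reported_dim, int frame_dim)
{
    return reported_dim == -1 ? frame_dim : reported_dim;
}
```

With a fixed 224x224 model and a 1280x720 frame, the non-resizable case keeps 224x224, while the resizable case would run at 1280x720 under the assumed rule.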

diff --git a/libavfilter/dnn/dnn_backend_openvino.c 
b/libavfilter/dnn/dnn_backend_openvino.c
index d6e0593a0b..65d74702ff 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -37,6 +37,7 @@
 typedef struct OVOptions{
 char *device_type;
 int nireq;
+int input_resizable;
 } OVOptions;
 
 typedef struct OVContext {
@@ -83,6 +84,7 @@ typedef struct RequestItem {
 static const AVOption dnn_openvino_options[] = {
 { "device", "device to run model", OFFSET(options.device_type), 
AV_OPT_TYPE_STRING, { .str = "CPU" }, 0, 0, FLAGS },
 { "nireq",  "number of request",   OFFSET(options.nireq),   
AV_OPT_TYPE_INT,{ .i64 = 0 }, 0, INT_MAX, FLAGS },
+{ "input_resizable", "can input be resizable or not", 
OFFSET(options.input_resizable), AV_OPT_TYPE_BOOL,   { .i64 = 0 }, 0, 1, 
FLAGS },
 { NULL }
 };
 
@@ -334,6 +336,7 @@ static DNNReturnType get_input_ov(void *model, DNNData 
*input, const char *input
 size_t model_input_count = 0;
 dimensions_t dims;
 precision_e precision;
+int input_resizable = ctx->options.input_resizable;
 
 status = ie_network_get_inputs_number(ov_model->network, 
&model_input_count);
 if (status != OK) {
@@ -357,8 +360,8 @@ static DNNReturnType get_input_ov(void *model, DNNData 
*input, const char *input
 }
 
 input->channels = dims.dims[1];
-input->height   = dims.dims[2];
-input->width= dims.dims[3];
+input->height   = input_resizable ? -1 : dims.dims[2];
+input->width= input_resizable ? -1 : dims.dims[3];
 input->dt   = precision_to_datatype(precision);
 return DNN_SUCCESS;
 } else {
@@ -383,6 +386,8 @@ static DNNReturnType get_output_ov(void *model, const char 
*input_name, int inpu
 RequestItem request;
 AVFrame *in_frame = av_frame_alloc();
 AVFrame *out_frame = NULL;
+IEStatusCode status;
+input_shapes_t input_shapes;
 
 if (!in_frame) {
 av_log(ctx, AV_LOG_ERROR, "Failed to allocate memory for input 
frame\n");
@@ -397,6 +402,18 @@ static DNNReturnType get_output_ov(void *model, const char 
*input_name, int inpu
 in_frame->width = input_width;
 in_frame->height = input_height;
 
+if (ctx->options.input_resizable) {
+status = ie_network_get_input_shapes(ov_model->network, &input_shapes);
+input_shapes.shapes->shape.dims[2] = input_height;
+input_shapes.shapes->shape.dims[3] = input_width;
+status |= ie_network_reshape(ov_model->network, input_shapes);
+ie_network_input_shapes_free(&input_shapes);
+if (status != OK) {
+av_log(ctx, AV_LOG_ERROR, "Failed to reshape input size for %s\n", 
input_name);
+return DNN_ERROR;
+}
+}
+
 if (!ov_model->exe_network) {
 if (init_model_ov(ov_model) != DNN_SUCCESS) {
 av_log(ctx, AV_LOG_ERROR, "Failed to init OpenVINO executable network 
or inference request\n");
-- 
2.17.1
