Re: [FFmpeg-devel] [PATCH v2 01/11] avcodec/vvc: add shared header for vvc
How about we define it as 20, check the size and return error if > 20. 20 should enough for most of clips. hevc used 20. On Sun, Jan 10, 2021 at 9:39 AM Nuo Mi wrote: > > > On Sun, Jan 10, 2021 at 3:09 AM Mark Thompson wrote: > >> On 09/01/2021 07:34, Nuo Mi wrote: >> > --- >> > libavcodec/vvc.h | 124 +++ >> > 1 file changed, 124 insertions(+) >> > create mode 100644 libavcodec/vvc.h >> > >> > diff --git a/libavcodec/vvc.h b/libavcodec/vvc.h >> > new file mode 100644 >> > index 00..0bd2acac1d >> > --- /dev/null >> > +++ b/libavcodec/vvc.h >> > @@ -0,0 +1,124 @@ >> > ... >> > + >> > +enum { >> > +VVC_MAX_PLANES = 3, >> >> MAX_SAMPLE_ARRAYS, with reference to 6.2? The term "plane" is never used >> in the specification at all. >> >> > +//7.4.3.3 The value of vps_max_sublayers_minus1 shall be in the >> range of 0 to 6, inclusive >> > +VVC_MAX_SUBLAYERS = 7, >> > + >> > +// 7.3.2.3: vps_video_parameter_set_id is u(4). >> > +VVC_MAX_VPS_COUNT = 16, >> > +// 7.3.2.4: sps_seq_parameter_set_id is u(4) >> > +VVC_MAX_SPS_COUNT = 16, >> > +// 7.3.2.5: pps_pic_parameter_set_id is u(6) >> > +VVC_MAX_PPS_COUNT = 64, >> > + >> > +// 7.4.4.1: ptl_num_sub_profiles is u(8) >> > +VVC_MAX_SUB_PROFILES = 256, >> > + >> > +// A.4.2: according to (1577), MaxDpbSize is bounded above by 2 * >> maxDpbPicBuf(8) >> > +VVC_MAX_DPB_SIZE = 16, >> > + >> > +//7.4.3.4 sps_num_ref_pic_lists in range [0, 64] >> > +VVC_MAX_REF_PIC_LISTS = 64, >> > + >> > +//7.4.3.3 sps_num_points_in_qp_table_minus1[i] in range [0, 36 − >> sps_qp_table_start_minus26[i]], >> > +//sps_qp_table_start_minus26[i] in range >> [sps_qp_table_start_minus26[i] −26 − QpBdOffset, 36] >> > +//for 10 bitsQpBdOffset is 12, so >> sps_num_points_in_qp_table_minus1[i] in range [0, 74] >> > +VVC_MAX_POINTS_IN_QP_TABLE = 75, >> > + >> > +// 7.4.6.1: hrd_cpb_cnt_minus1 is in [0, 31]. >> > +VVC_MAX_CPB_CNT = 32, >> > + >> > +// A.4.1: the highest level allows a MaxLumaPs of 35 651 584. 
>> > +VVC_MAX_LUMA_PS = 35651584, >> > +// A.4.1: pic_width_in_luma_samples and pic_height_in_luma_samples >> are >> > +// constrained to be not greater than sqrt(MaxLumaPs * 8). Hence >> height/ >> > +// width are bounded above by sqrt(8 * 35651584) = 16888.2 samples. >> > +VVC_MAX_WIDTH = 16888, >> > +VVC_MAX_HEIGHT = 16888, >> > + >> > +// A.4.1: table A.1 allows at most 440 tiles for any au. >> > +VVC_MAX_TILE_ROWS= 440, >> >> Is this bound really the best we can do? >> >> That is, is it actually possible to construct a valid stream with 440 >> tile rows? It must have a single tile column and a height of at least >> 14080 (for 440 rows of 32x32 CTUs), which feels extreme enough that it >> might hit some of the other level constraints. >> > The VVC_MAX_HEIGHT is 16888, it's higher than 14080. > If we limit the VVC_MAX_HEIGHT to 4k, we can reduce it to 135. > >> >> > +// A.4.1: table A.1 allows at most 20 tile columns for any level. >> > +VVC_MAX_TILE_COLUMNS = 20, >> > + >> > +// A.4.1 table A.1 allows at most 600 slice for any level. >> > +VVC_MAX_SLICES = 600, >> > + >> > +// 7.4.8: in the worst case (tiles_enabled_flag and >> > +// entropy_coding_sync_enabled_flag are both set), entry points >> can be >> > +// placed at the beginning of every Ctb row in every tile, giving >> an >> > +// upper bound of (num_tile_columns_minus1 + 1) * PicHeightInCtbsY >> - 1. >> > +// Only a stream with very high resolution and perverse parameters >> could >> > +// get near that, though, so set a lower limit here with the >> maximum >> > +// possible value for 8K video (at most 135 32x32 Ctb rows). >> > +VVC_MAX_ENTRY_POINTS = VVC_MAX_TILE_COLUMNS * 135, >> > +}; >> > + >> > +#endif /* AVCODEC_VVC_H */ >> >> - Mark >> ___ >> ffmpeg-devel mailing list >> ffmpeg-devel@ffmpeg.org >> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel >> >> To unsubscribe, visit link above, or email >> ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". 
> > ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
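The numeric bounds argued over in this thread can be checked with a few lines of C. This is only a sketch: the helper names are made up for illustration and are not part of the patch, the 32x32 CTU size comes from Mark's comment, and the 4320-line picture height used for "8K" is an assumption.

```c
#include <stdint.h>

/* Value quoted in the patch (A.4.1: MaxLumaPs of the highest level). */
#define MAX_LUMA_PS  35651584u
/* Smallest CTU size assumed in the review comment (32x32). */
#define MIN_CTB_SIZE 32u

/* Integer square root (floor) by binary search; hypothetical helper. */
static uint32_t isqrt_u64(uint64_t n)
{
    uint64_t lo = 0, hi = UINT32_MAX;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        if (mid * mid <= n)
            lo = mid;
        else
            hi = mid - 1;
    }
    return (uint32_t)lo;
}

/* Largest width/height permitted by sqrt(MaxLumaPs * 8). */
static uint32_t max_dimension(void)
{
    return isqrt_u64((uint64_t)MAX_LUMA_PS * 8);
}

/* Minimum picture height that can hold `rows` tile rows of one CTU each. */
static uint32_t min_height_for_tile_rows(uint32_t rows)
{
    return rows * MIN_CTB_SIZE;
}

/* Number of CTU rows in a picture of the given height, rounded up. */
static uint32_t ctb_rows(uint32_t height)
{
    return (height + MIN_CTB_SIZE - 1) / MIN_CTB_SIZE;
}

/* "Check the size and return error if > cap": 0 if accepted, -1 if not. */
static int check_tile_rows(uint32_t rows, uint32_t cap)
{
    return rows > cap ? -1 : 0;
}
```

With these helpers, `max_dimension()` gives the 16888 used for VVC_MAX_WIDTH and VVC_MAX_HEIGHT, `min_height_for_tile_rows(440)` gives the 14080 Mark mentions, and `ctb_rows(4320)` gives the 135 used for the entry-point bound.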
[FFmpeg-devel] [PATCH v2] avformat/utils: prevent ts out of [min_ts, max_ts] interval due to rounding
Rounding min_ts towards +infinity and max_ts towards -infinity can push
ts outside the [min_ts, max_ts] interval, which then leads to a seek
failure. Fix it by using the same simple rounding for min_ts and max_ts
as is used for ts.
---
 libavformat/utils.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libavformat/utils.c b/libavformat/utils.c
index 503e583ad0..88221c5ac4 100644
--- a/libavformat/utils.c
+++ b/libavformat/utils.c
@@ -2500,10 +2500,10 @@ int avformat_seek_file(AVFormatContext *s, int stream_index, int64_t min_ts,
         ts = av_rescale_q(ts, AV_TIME_BASE_Q, time_base);
         min_ts = av_rescale_rnd(min_ts, time_base.den,
                                 time_base.num * (int64_t)AV_TIME_BASE,
-                                AV_ROUND_UP | AV_ROUND_PASS_MINMAX);
+                                AV_ROUND_PASS_MINMAX);
         max_ts = av_rescale_rnd(max_ts, time_base.den,
                                 time_base.num * (int64_t)AV_TIME_BASE,
-                                AV_ROUND_DOWN | AV_ROUND_PASS_MINMAX);
+                                AV_ROUND_PASS_MINMAX);
         stream_index = 0;
     }
-- 
2.28.0
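The failure mode described in the commit message can be reproduced with plain integer arithmetic. The sketch below is an illustration only: `rescale()` is a simplified, hypothetical stand-in (non-negative inputs, no overflow handling) and not FFmpeg's av_rescale_rnd(), and the 1/1000 time base is an assumed example.

```c
#include <stdint.h>

/* Rounding modes for the simplified helper below (illustrative names). */
enum rnd { RND_DOWN, RND_UP, RND_NEAR };

/*
 * Compute a * b / c with the requested rounding.
 * Hypothetical helper: assumes a >= 0, b > 0, c > 0 and no overflow.
 */
static int64_t rescale(int64_t a, int64_t b, int64_t c, enum rnd r)
{
    int64_t q   = a * b / c;
    int64_t rem = a * b % c;

    if (r == RND_UP && rem)
        q++;                      /* any remainder rounds up */
    if (r == RND_NEAR && 2 * rem >= c)
        q++;                      /* halfway or more rounds up */
    return q;
}

/* Returns 1 if ts lies inside [min_ts, max_ts], else 0. */
static int in_window(int64_t ts, int64_t min_ts, int64_t max_ts)
{
    return min_ts <= ts && ts <= max_ts;
}

/*
 * Convert a degenerate window min_ts == ts == max_ts == us (microseconds)
 * to a 1/1000 time base, using the given modes for the two bounds, and
 * report whether the converted ts is still inside the converted window.
 */
static int window_survives(int64_t us, enum rnd min_mode, enum rnd max_mode)
{
    int64_t ts     = rescale(us, 1000, 1000000, RND_NEAR);
    int64_t min_ts = rescale(us, 1000, 1000000, min_mode);
    int64_t max_ts = rescale(us, 1000, 1000000, max_mode);
    return in_window(ts, min_ts, max_ts);
}
```

For 1400 us, rounding min_ts up gives 2 while ts rounds to 1, so `window_survives(1400, RND_UP, RND_DOWN)` is 0. Using one rounding mode for all three values keeps a degenerate window consistent, which is the idea the commit message describes.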
Re: [FFmpeg-devel] [PATCH v2 06/11] avcodec: add cbs for h266/vvc
On Sun, Jan 10, 2021 at 5:34 AM Mark Thompson wrote: > On 09/01/2021 07:34, Nuo Mi wrote: > > --- > > configure |2 + > > libavcodec/Makefile |1 + > > libavcodec/cbs.c |6 + > > libavcodec/cbs_h2645.c| 373 > > libavcodec/cbs_h266.h | 840 > > libavcodec/cbs_h266_syntax_template.c | 2761 + > > libavcodec/cbs_internal.h |3 +- > > 7 files changed, 3985 insertions(+), 1 deletion(-) > > create mode 100644 libavcodec/cbs_h266.h > > create mode 100644 libavcodec/cbs_h266_syntax_template.c > > > > ... > > @@ -920,6 +934,135 @@ static int > cbs_h265_read_nal_unit(CodedBitstreamContext *ctx, > > return 0; > > } > > > > +static int cbs_h266_replace_ph(CodedBitstreamContext *ctx, > > + CodedBitstreamUnit *unit) > > +{ > > +CodedBitstreamH266Context *priv = ctx->priv_data; > > +int err; > > +err = ff_cbs_make_unit_refcounted(ctx, unit); > > +if (err < 0) > > +return err; > > +av_buffer_unref(&priv->ph_ref); > > +av_assert0(unit->content_ref); > > +priv->ph_ref = av_buffer_ref(unit->content_ref); > > +if (!priv->ph_ref) > > +return AVERROR(ENOMEM); > > +priv->active_ph = priv->ph = (H266RawPH *)priv->ph_ref->data; > > Why are there too variables here? They seem to always be the same. > priv->active_ph is read-only, priv->ph is writeable pointer. I can change to priv->ph only if you prefer. > > > +return 0; > > +} > > + > > ... > > + > > static int cbs_h2645_assemble_fragment(CodedBitstreamContext *ctx, > > CodedBitstreamFragment *frag) > > { > > @@ -1248,6 +1494,11 @@ static int > cbs_h2645_assemble_fragment(CodedBitstreamContext *ctx, > >(unit->type == HEVC_NAL_VPS || > > unit->type == HEVC_NAL_SPS || > > unit->type == HEVC_NAL_PPS)) || > > +(ctx->codec->codec_id == AV_CODEC_ID_VVC && > > + (unit->type == VVC_VPS_NUT || > > + unit->type == VVC_SPS_NUT || > > + unit->type == VVC_PPS_NUT || > > + unit->type == VVC_PREFIX_APS_NUT)) || > > Also various other things, which might be here since passthrough does not > require decomposition to be implemented. 
> > This test is getting unwieldy - maybe it should be moved to a new function > cbs_h2645_unit_requires_zero_byte(). > done > > i == 0 /* (Assume this is the start of an access unit.) > */) { > > // zero_byte > > data[dp++] = 0; > > @@ -1362,6 +1613,41 @@ static void cbs_h265_close(CodedBitstreamContext > *ctx) > > av_buffer_unref(&h265->pps_ref[i]); > > } > > > > ... > > @@ -1506,6 +1792,77 @@ static const CodedBitstreamUnitTypeDescriptor > cbs_h265_unit_types[] = { > > CBS_UNIT_TYPE_END_OF_LIST > > }; > > > > +static void cbs_h266_free_sei(void *opaque, uint8_t *content) > > +{ > > +} > > So as implemented currently it is POD? > Yes, currently only md5 sei implemented. We can do more after your react patch merged. > > > + > > +static const CodedBitstreamUnitTypeDescriptor cbs_h266_unit_types[] = { > > ... > > + > > +typedef struct H266RawNALUnitHeader { > > +uint8_t nuh_layer_id; > > +uint8_t nal_unit_type; > > +uint8_t nuh_temporal_id_plus1; > > +} H266RawNALUnitHeader; > > + > > +typedef struct H266GeneralConstraintsInfo { > > +uint8_t gci_present_flag; > > + > > ... > > + > > +/* loop filter */ > > +uint8_t gci_no_sao_constraint_flag; > > +uint8_t gci_no_alf_constraint_flag; > > +uint8_t gci_no_ccalf_constraint_flag; > > +uint8_t gci_no_lmcs_constraint_flag; > > +uint8_t gci_no_ladf_constraint_flag; > > +uint8_t gci_no_virtual_boundaries_constraint_flag; > > +uint8_t gci_num_reserved_bits; > > Also needs gci_reserved_zero_bit[], so that we can handle streams with > future constraints rather than just rejecting them. > > "Although the value of gci_num_reserved_bits is required to be equal to 0 > in this version > of this Specification, decoders conforming to this version of this > Specification shall allow the value of > gci_num_reserved_bits greater than 0 to appear in the syntax and shall > ignore the values of all the gci_reserved_zero_bit[ i ] > syntax elements when gci_num_reserved_bits is greater than 0." > This just follows the same pattern as h265. 
How about we create a separate patch set for this, fix h265 as well. > > +} H266GeneralConstraintsInfo; > > + > > ... > > + > > +typedef struct H266RawVPS { > > +H266RawNALUnitHeader nal_unit_header; > > + > > +uint8_t vps_video_parameter_set_id; > > + > > +uint8_t vps_max_layers_minus1; > > +uint8_t vps_max_sublayers_minus1; > > +/*TODO add more*/ > > +H266RawExtensionData extension_data; > > +} H266RawVPS; > > You don't actually use the VPS struc
[FFmpeg-devel] [PATCH] libavcodec/aarch64/hevcdsp_idct_neon.S: Also port add_residual functions.
From: Reimar Döffinger

Speedup is fairly small, around 1.5%, but these are fairly simple.
---
 libavcodec/aarch64/hevcdsp_idct_neon.S    | 190 ++
 libavcodec/aarch64/hevcdsp_init_aarch64.c |  24 +++
 2 files changed, 214 insertions(+)

diff --git a/libavcodec/aarch64/hevcdsp_idct_neon.S b/libavcodec/aarch64/hevcdsp_idct_neon.S
index 9f67e45..edd03a0 100644
--- a/libavcodec/aarch64/hevcdsp_idct_neon.S
+++ b/libavcodec/aarch64/hevcdsp_idct_neon.S
@@ -36,6 +36,196 @@ const trans, align=4
         .short 31, 22, 13, 4
 endconst
 
+.macro clip10 in1, in2, c1, c2
+        smax    \in1, \in1, \c1
+        smax    \in2, \in2, \c1
+        smin    \in1, \in1, \c2
+        smin    \in2, \in2, \c2
+.endm
+
+function ff_hevc_add_residual_4x4_8_neon, export=1
+        ld1     {v0.8H-v1.8H}, [x1]
+        ld1     {v2.S}[0], [x0], x2
+        ld1     {v2.S}[1], [x0], x2
+        ld1     {v2.S}[2], [x0], x2
+        ld1     {v2.S}[3], [x0], x2
+        sub     x0, x0, x2, lsl #2
+        uxtl    v8.8H, v2.8B
+        uxtl2   v9.8H, v2.16B
+        sqadd   v0.8H, v0.8H, v8.8H
+        sqadd   v1.8H, v1.8H, v9.8H
+        sqxtun  v0.8B, v0.8H
+        sqxtun2 v0.16B, v1.8H
+        st1     {v0.S}[0], [x0], x2
+        st1     {v0.S}[1], [x0], x2
+        st1     {v0.S}[2], [x0], x2
+        st1     {v0.S}[3], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_4x4_10_neon, export=1
+        mov     x12, x0
+        ld1     {v0.8H-v1.8H}, [x1]
+        ld1     {v2.D}[0], [x12], x2
+        ld1     {v2.D}[1], [x12], x2
+        ld1     {v3.D}[0], [x12], x2
+        sqadd   v0.8H, v0.8H, v2.8H
+        ld1     {V3.D}[1], [x12], x2
+        movi    v4.8H, #0
+        sqadd   v1.8H, v1.8H, v3.8H
+        mvni    v5.8H, #0xFC, LSL #8 // movi #0x3FF
+        clip10  v0.8H, v1.8H, v4.8H, v5.8H
+        st1     {v0.D}[0], [x0], x2
+        st1     {v0.D}[1], [x0], x2
+        st1     {v1.D}[0], [x0], x2
+        st1     {v1.D}[1], [x0], x2
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_8_neon, export=1
+        add     x12, x0, x2
+        add     x2, x2, x2
+        mov     x3, #8
+1:      subs    x3, x3, #2
+        ld1     {v2.D}[0], [x0]
+        ld1     {v2.D}[1], [x12]
+        uxtl    v3.8H, v2.8B
+        ld1     {v0.8H-v1.8H}, [x1], #32
+        uxtl2   v2.8H, v2.16B
+        sqadd   v0.8H, v0.8H, v3.8H
+        sqadd   v1.8H, v1.8H, v2.8H
+        sqxtun  v0.8B, v0.8H
+        sqxtun2 v0.16B, v1.8H
+        st1     {v0.D}[0], [x0], x2
+        st1     {v0.D}[1], [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_8x8_10_neon, export=1
+        add     x12, x0, x2
+        add     x2, x2, x2
+        mov     x3, #8
+        movi    v4.8H, #0
+        mvni    v5.8H, #0xFC, LSL #8 // movi #0x3FF
+1:      subs    x3, x3, #2
+        ld1     {v0.8H-v1.8H}, [x1], #32
+        ld1     {v2.8H},[x0]
+        sqadd   v0.8H, v0.8H, v2.8H
+        ld1     {v3.8H},[x12]
+        sqadd   v1.8H, v1.8H, v3.8H
+        clip10  v0.8H, v1.8H, v4.8H, v5.8H
+        st1     {v0.8H}, [x0], x2
+        st1     {v1.8H}, [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_8_neon, export=1
+        mov     x3, #16
+        add     x12, x0, x2
+        add     x2, x2, x2
+1:      subs    x3, x3, #2
+        ld1     {v16.16B}, [x0]
+        ld1     {v0.8H-v3.8H}, [x1], #64
+        ld1     {v19.16B},[x12]
+        uxtl    v17.8H, v16.8B
+        uxtl2   v18.8H, v16.16B
+        uxtl    v20.8H, v19.8B
+        uxtl2   v21.8H, v19.16B
+        sqadd   v0.8H, v0.8H, v17.8H
+        sqadd   v1.8H, v1.8H, v18.8H
+        sqadd   v2.8H, v2.8H, v20.8H
+        sqadd   v3.8H, v3.8H, v21.8H
+        sqxtun  v0.8B, v0.8H
+        sqxtun2 v0.16B, v1.8H
+        sqxtun  v1.8B, v2.8H
+        sqxtun2 v1.16B, v3.8H
+        st1     {v0.16B}, [x0], x2
+        st1     {v1.16B}, [x12], x2
+        bne     1b
+        ret
+endfunc
+
+function ff_hevc_add_residual_16x16_10_neon, export=1
+        mov     x3, #16
+        movi    v20.8H, #0
+        mvni    v21.8H, #0xFC, LSL #8 // movi #0x3FF
+        add     x12, x0, x2
+        add     x2, x2, x2
+1:      subs
Re: [FFmpeg-devel] [PATCH] libavfilter/dnn: add batch mode for async execution
> -----Original Message-----
> From: Guo, Yejun
> Sent: January 8, 2021 16:37
> To: ffmpeg-devel@ffmpeg.org
> Cc: Guo, Yejun
> Subject: [PATCH] libavfilter/dnn: add batch mode for async execution
>
> the default number of batch_size is 1
>
> Signed-off-by: Xie, Lin
> Signed-off-by: Wu Zhiwen
> Signed-off-by: Guo, Yejun
> ---
>  libavfilter/dnn/dnn_backend_openvino.c | 157 +
>  libavfilter/dnn/dnn_backend_openvino.h |   1 +
>  libavfilter/dnn/dnn_interface.c        |   1 +
>  libavfilter/dnn_interface.h            |   2 +
>  libavfilter/vf_dnn_processing.c        |  36 +-
>  5 files changed, 173 insertions(+), 24 deletions(-)
>
> diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
> index d27e451eea..cb1bc3d22d 100644
> --- a/libavfilter/dnn/dnn_backend_openvino.c
> +++ b/libavfilter/dnn/dnn_backend_openvino.c

Please ignore this patch; it has some issues. I will send out a V2 later, thanks.
[FFmpeg-devel] [PATCH V2] libavfilter/dnn: add batch mode for async execution
the default number of batch_size is 1 Signed-off-by: Xie, Lin Signed-off-by: Wu Zhiwen Signed-off-by: Guo, Yejun --- libavfilter/dnn/dnn_backend_openvino.c | 187 - libavfilter/dnn/dnn_backend_openvino.h | 1 + libavfilter/dnn/dnn_interface.c| 1 + libavfilter/dnn_interface.h| 2 + libavfilter/vf_dnn_processing.c| 36 - 5 files changed, 194 insertions(+), 33 deletions(-) diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c index d27e451eea..5271d1caa5 100644 --- a/libavfilter/dnn/dnn_backend_openvino.c +++ b/libavfilter/dnn/dnn_backend_openvino.c @@ -37,6 +37,7 @@ typedef struct OVOptions{ char *device_type; int nireq; +int batch_size; } OVOptions; typedef struct OVContext { @@ -70,7 +71,8 @@ typedef struct TaskItem { typedef struct RequestItem { ie_infer_request_t *infer_request; -TaskItem *task; +TaskItem **tasks; +int task_count; ie_complete_call_back_t callback; } RequestItem; @@ -83,6 +85,7 @@ typedef struct RequestItem { static const AVOption dnn_openvino_options[] = { { "device", "device to run model", OFFSET(options.device_type), AV_OPT_TYPE_STRING, { .str = "CPU" }, 0, 0, FLAGS }, { "nireq", "number of request", OFFSET(options.nireq), AV_OPT_TYPE_INT,{ .i64 = 0 }, 0, INT_MAX, FLAGS }, +{ "batch_size", "batch size per request", OFFSET(options.batch_size), AV_OPT_TYPE_INT,{ .i64 = 1 }, 1, 1000, FLAGS}, { NULL } }; @@ -100,7 +103,19 @@ static DNNDataType precision_to_datatype(precision_e precision) } } -static DNNReturnType fill_model_input_ov(OVModel *ov_model, TaskItem *task, RequestItem *request) +static int get_datatype_size(DNNDataType dt) +{ +switch (dt) +{ +case DNN_FLOAT: +return sizeof(float); +default: +av_assert0(!"not supported yet."); +return 1; +} +} + +static DNNReturnType fill_model_input_ov(OVModel *ov_model, RequestItem *request) { dimensions_t dims; precision_e precision; @@ -109,6 +124,7 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, TaskItem *task, Requ IEStatusCode status; DNNData input; 
ie_blob_t *input_blob = NULL; +TaskItem *task = request->tasks[0]; status = ie_infer_request_get_blob(request->infer_request, task->input_name, &input_blob); if (status != OK) { @@ -134,12 +150,19 @@ static DNNReturnType fill_model_input_ov(OVModel *ov_model, TaskItem *task, Requ input.channels = dims.dims[1]; input.data = blob_buffer.buffer; input.dt = precision_to_datatype(precision); -if (task->do_ioproc) { -if (ov_model->model->pre_proc != NULL) { -ov_model->model->pre_proc(task->in_frame, &input, ov_model->model->filter_ctx); -} else { -proc_from_frame_to_dnn(task->in_frame, &input, ctx); + +av_assert0(request->task_count <= dims.dims[0]); +for (int i = 0; i < request->task_count; ++i) { +task = request->tasks[i]; +if (task->do_ioproc) { +if (ov_model->model->pre_proc != NULL) { +ov_model->model->pre_proc(task->in_frame, &input, ov_model->model->filter_ctx); +} else { +proc_from_frame_to_dnn(task->in_frame, &input, ctx); +} } +input.data = (uint8_t *)input.data + + input.width * input.height * input.channels * get_datatype_size(input.dt); } ie_blob_free(&input_blob); @@ -152,7 +175,7 @@ static void infer_completion_callback(void *args) precision_e precision; IEStatusCode status; RequestItem *request = args; -TaskItem *task = request->task; +TaskItem *task = request->tasks[0]; ie_blob_t *output_blob = NULL; ie_blob_buffer_t blob_buffer; DNNData output; @@ -194,41 +217,56 @@ static void infer_completion_callback(void *args) output.width= dims.dims[3]; output.dt = precision_to_datatype(precision); output.data = blob_buffer.buffer; -if (task->do_ioproc) { -if (task->ov_model->model->post_proc != NULL) { -task->ov_model->model->post_proc(task->out_frame, &output, task->ov_model->model->filter_ctx); + +av_assert0(request->task_count <= dims.dims[0]); +av_assert0(request->task_count >= 1); +for (int i = 0; i < request->task_count; ++i) { +task = request->tasks[i]; +if (task->do_ioproc) { +if (task->ov_model->model->post_proc != NULL) { 
+task->ov_model->model->post_proc(task->out_frame, &output, task->ov_model->model->filter_ctx); +} else { +proc_from_dnn_to_frame(task->out_frame, &output, ctx); +} } else { -proc_from_dnn_to_frame(task->out_frame, &output, ctx); +task->out_frame->width = output.width; +task->out_frame->height = output.height;
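The batching in the patch above works by advancing `input.data` by one frame's worth of bytes after each task. A minimal, self-contained sketch of that pointer stepping follows; the `Tensor` struct and the function names are hypothetical stand-ins for illustration, not the DNNData type or any OpenVINO call, and only float elements are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one batched input blob. */
typedef struct Tensor {
    int      width, height, channels;
    size_t   elem_size;   /* bytes per element, e.g. sizeof(float) */
    uint8_t *data;        /* start of the batch buffer */
} Tensor;

/* Bytes occupied by one item (one frame) of the batch. */
static size_t item_bytes(const Tensor *t)
{
    return (size_t)t->width * t->height * t->channels * t->elem_size;
}

/*
 * Fill slot i of the batch with the value i + 1, stepping the write
 * pointer by one item after each task, as the patch does with input.data.
 * Assumes the buffer holds floats and is large enough for task_count items.
 */
static void fill_batch(const Tensor *t, int task_count)
{
    uint8_t *p = t->data;
    size_t   n = item_bytes(t) / sizeof(float);

    for (int i = 0; i < task_count; i++) {
        float *slot = (float *)p;
        for (size_t k = 0; k < n; k++)
            slot[k] = (float)(i + 1);
        p += item_bytes(t);   /* move on to the next item in the batch */
    }
}
```

The per-item stride is width * height * channels * element size, which matches the expression added to `fill_model_input_ov()` in the patch.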
Re: [FFmpeg-devel] [PATCH 2/5] avcodec/fft_template: Remove unused fixed-point cosine tables
On Sun, Jan 10, 2021 at 01:56:21AM +0100, Andreas Rheinhardt wrote: > Michael Niedermayer: > > On Thu, Jan 07, 2021 at 12:13:05AM +0100, Andreas Rheinhardt wrote: > >> There are three types of FFTs: floating-point, 32-bit fixed-point and > >> 16-bit fixed-point. The latter has exactly one user: The fixed-point > >> AC-3-encoder; the cosine tables used by it use up to seven bits. The > >> tables corresponding to eight to seventeen bits are unused, as are the > >> FFT functions for these bits. > >> > >> Therefore this commit removes these tables and functions. This is > >> especially beneficial when using hardcoded tables as they take up > >> more > >> than 255 KiB. But even without it one saves said unused functions as > >> well as entries in corresponding tables (this also saves relocations). > >> > >> Signed-off-by: Andreas Rheinhardt > >> --- > >> The changes to ARM assembly are honestly untested. I hope someone can > >> test them. Btw: It seems that the ARM assembly code wouldn't be able to > >> deal with an FFT with more than 16 bits (no function for this has been > >> defined), which only worked because no one ever used that many bits with > >> the fixed-point FFT. > >> > >> libavcodec/arm/fft_fixed_neon.S | 18 -- > >> libavcodec/cos_tablegen.c | 4 ++-- > >> libavcodec/fft.h| 4 +++- > >> libavcodec/fft_fixed.c | 1 + > >> libavcodec/fft_template.c | 31 +++ > >> tests/fate/fft.mak | 8 ++-- > >> 6 files changed, 35 insertions(+), 31 deletions(-) > > > > make -j32 libavcodec/tests/fft-fixed && libavcodec/tests/fft-fixed > > Segmentation fault (core dumped) > > > > (if you can't repro say so and I'll rebuild with debug symbols ...) > > > > thx > > [...] > > > 1. Lynne has an alternative patchset that makes the only user of > fft_fixed use fft_fixed_32 instead, so this is not important any more. > 2. Are you testing the ARM assembly code (for which I ask for a test) or x86-64 > not? If not, then this surprises me. 
Did you apply the changes to > fft.mak (some of the tests have been removed as they tested > functionality that was unused (apart from the tests) and has therefore > been removed). i applied the changes from this patchset up to and including the patch and also did a make distclean FFT 512 test Checking... ==18069== Jump to the invalid address stated on the next line ==18069==at 0x0: ??? ==18069==by 0x10F5F9: main (fft.c:529) ==18069== Address 0x0 is not stack'd, malloc'd or (recently) free'd ==18069== ==18069== ==18069== Process terminating with default action of signal 11 (SIGSEGV) ==18069== Bad permissions for mapped region at address 0x0 ==18069==at 0x0: ??? ==18069==by 0x10F5F9: main (fft.c:529) commit 6c532480712d395f5973063adcefce62fc75f2e1 (HEAD) Author: Andreas Rheinhardt Date: Thu Jan 7 00:13:05 2021 +0100 avcodec/fft_template: Remove unused fixed-point cosine tables There are three types of FFTs: floating-point, 32-bit fixed-point and 16-bit fixed-point. The latter has exactly one user: The fixed-point AC-3-encoder; the cosine tables used by it use up to seven bits. The tables corresponding to eight to seventeen bits are unused, as are the FFT functions for these bits. Therefore this commit removes these tables and functions. This is especially beneficial when using hardcoded tables as they take up more than 255 KiB. But even without it one saves said unused functions as well as entries in corresponding tables (this also saves relocations). 
Signed-off-by: Andreas Rheinhardt Signed-off-by: Michael Niedermayer libavcodec/arm/fft_fixed_neon.S | 18 -- libavcodec/cos_tablegen.c | 4 ++-- libavcodec/fft.h| 4 +++- libavcodec/fft_fixed.c | 1 + libavcodec/fft_template.c | 31 +++ tests/fate/fft.mak | 8 ++-- 6 files changed, 35 insertions(+), 31 deletions(-) commit c592684681700a7d8b41e75a11104f8c1bdd13d9 Author: Andreas Rheinhardt Date: Thu Jan 7 00:13:04 2021 +0100 avcodec/tableprint: Don't include mem_internal.h tableprint.h does not declare anything as aligned; it just prints DECLARE_ALIGNED. So it can be removed; in fact, it needs to be removed, because mem_internal.h includes config.h which leads to warnings when building with hardcoded tables enabled because of redefinitions of CONFIG_HARDCODED_TABLES. (Furthermore, config.h is only valid for the target, not the host, so HAVE_LOCAL_ALIGNED might even be wrong here.) Signed-off-by: Andreas Rheinhardt Signed-off-by: Michael Niedermayer libavcodec/tableprint.h | 1 - 1 file changed, 1 deletion(-) commit 91e1625db15fe8853ceedca9eed14307aaa514c7 (origin/master, origin/HEAD, refs/bisect/good-91e1625db15fe8853ceedca9eed14307aaa514c7) [...] -- Michael GnuPG fin
Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
On Thu, Jan 07, 2021 at 10:39:56AM +0100, Alan Kelly wrote:
> Thanks for your patience with this, I have replaced mova with movdqu - movu
> generated a compile error on ssse3. What system did this crash on?

AMD Ryzen 9 3950X on linux

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.
On Thu, Jan 07, 2021 at 10:41:19AM +0100, Alan Kelly wrote:
> ---
>  Replaces mova with movdqu due to alignment issues
>  libswscale/x86/Makefile     |   1 +
>  libswscale/x86/swscale.c    | 106 +---
>  libswscale/x86/yuv2yuvX.asm | 117
>  tests/checkasm/sw_scale.c   |  98 ++
>  4 files changed, 246 insertions(+), 76 deletions(-)
>  create mode 100644 libswscale/x86/yuv2yuvX.asm

I have at least one case where this changes the output:

./ffmpeg -i utvideo-yuv422p10le_UQY2_crc32-A431CD5F.avi -bitexact avi.avi

I don't know whether there is a decoder bug, a bug in the patch, or something else.

-rw-r- 1 michael michael 246218 Jan 10 16:23 avi.avi
-rw-r- 1 michael michael 245824 Jan 10 16:23 avi-ref.avi

The file should be at:
https://samples.ffmpeg.org/ffmpeg-bugs/trac/ticket4044/

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

In a rich man's house there is no place to spit but his face.
-- Diogenes of Sinope
Re: [FFmpeg-devel] [PATCH 1/4] avformat/rtsp: set AV_OPT_FLAG_DEPRECATED on deprecated options
On Thu, 17. Dec 10:42, Andriy Gelman wrote: > On Tue, 08. Dec 22:35, Andriy Gelman wrote: > > Hi Zhao, > > > > Thanks for reviewing. > > > > On Tue, 08. Dec 13:25, "zhilizhao(赵志立)" wrote: > > > > > > > > > > On Dec 8, 2020, at 12:08 PM, Andriy Gelman > > > > wrote: > > > > > > > > On Sun, 15. Nov 13:20, Andriy Gelman wrote: > > > >> From: Andriy Gelman > > > >> > > > >> Signed-off-by: Andriy Gelman > > > >> --- > > > >> libavformat/rtsp.c | 4 ++-- > > > >> 1 file changed, 2 insertions(+), 2 deletions(-) > > > >> > > > >> diff --git a/libavformat/rtsp.c b/libavformat/rtsp.c > > > >> index d9832bbf1f..2ef75f50e3 100644 > > > >> --- a/libavformat/rtsp.c > > > >> +++ b/libavformat/rtsp.c > > > >> @@ -94,7 +94,7 @@ const AVOption ff_rtsp_options[] = { > > > >> { "max_port", "set maximum local UDP port", OFFSET(rtp_port_max), > > > >> AV_OPT_TYPE_INT, {.i64 = RTSP_RTP_PORT_MAX}, 0, 65535, DEC|ENC }, > > > >> { "listen_timeout", "set maximum timeout (in seconds) to wait for > > > >> incoming connections (-1 is infinite, imply flag listen)", > > > >> OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 = -1}, INT_MIN, > > > >> INT_MAX, DEC }, > > > >> #if FF_API_OLD_RTSP_OPTIONS > > > >> -{ "timeout", "set maximum timeout (in seconds) to wait for > > > >> incoming connections (-1 is infinite, imply flag listen) (deprecated, > > > >> use listen_timeout)", OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 > > > >> = -1}, INT_MIN, INT_MAX, DEC }, > > > >> +{ "timeout", "set maximum timeout (in seconds) to wait for > > > >> incoming connections (-1 is infinite, imply flag listen) (deprecated, > > > >> use listen_timeout)", OFFSET(initial_timeout), AV_OPT_TYPE_INT, {.i64 > > > >> = -1}, INT_MIN, INT_MAX, DEC|AV_OPT_FLAG_DEPRECATED }, > > > >> { "stimeout", "set timeout (in microseconds) of socket TCP I/O > > > >> operations", OFFSET(stimeout), AV_OPT_TYPE_INT, {.i64 = 0}, INT_MIN, > > > >> INT_MAX, DEC }, > > > > > > Looks good to me, although it’s a little weird that after major bump 
> > > “timeout” > > > will have a different meaning instead of being dropped. “stimeout” is > > > deprecated, since there is not other option to replace it at the current > > > time, > > > it cannot be marked as AV_OPT_FLAG_DEPRECATED. > > > > > > > Right, after the major bump timeout will become the suggested alternative > > for > > stimeout, and stimeout will have the deprecated label. > > I think the idea is to get away from timeout option implying the listen > > mode. > > Will apply this patch. > > Ping for patches 2-4. > ping -- Andriy ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux
On Sun, 13. Dec 11:41, Andriy Gelman wrote: > On Sat, 28. Nov 14:46, Andriy Gelman wrote: > > From: Andriy Gelman > > > > As per signal() help (man 2 signal) the semantics of using signal may > > vary across platforms. It is suggested to use sigaction() instead. > > > > On my system, the capture signal is reset to the default handler after > > the first call thus failing to properly handle multiple SIGINTs. > > > > Signed-off-by: Andriy Gelman > > --- > > fftools/ffmpeg.c | 31 +++ > > 1 file changed, 27 insertions(+), 4 deletions(-) > > > > diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c > > index 80f436eab3..01f4ef15d8 100644 > > --- a/fftools/ffmpeg.c > > +++ b/fftools/ffmpeg.c > > @@ -393,8 +393,30 @@ static BOOL WINAPI CtrlHandler(DWORD fdwCtrlType) > > } > > #endif > > > > +#ifdef __linux__ > > +#define SIGNAL(sig, func) \ > > +do {\ > > +action.sa_handler = func; \ > > +sigaction(sig, &action, NULL); \ > > +} while (0) > > +#else > > +#define SIGNAL(sig, func) \ > > +signal(sig, func) > > +#endif > > + > > void term_init(void) > > { > > +#if defined __linux__ > > +struct sigaction action; > > +action.sa_handler = sigterm_handler; > > + > > +/* block other interrupts while processing this one */ > > +sigfillset(&action.sa_mask); > > + > > +/* restart interruptible functions (i.e. don't fail with EINTR) */ > > +action.sa_flags = SA_RESTART; > > +#endif > > + > > #if HAVE_TERMIOS_H > > if (!run_as_daemon && stdin_interaction) { > > struct termios tty; > > @@ -413,14 +435,15 @@ void term_init(void) > > > > tcsetattr (0, TCSANOW, &tty); > > } > > -signal(SIGQUIT, sigterm_handler); /* Quit (POSIX). */ > > +SIGNAL(SIGQUIT, sigterm_handler); /* Quit (POSIX). */ > > } > > #endif > > > > -signal(SIGINT , sigterm_handler); /* Interrupt (ANSI).*/ > > -signal(SIGTERM, sigterm_handler); /* Termination (ANSI). 
*/ > > +SIGNAL(SIGINT, sigterm_handler); > > +SIGNAL(SIGTERM, sigterm_handler); > > + > > #ifdef SIGXCPU > > -signal(SIGXCPU, sigterm_handler); > > +SIGNAL(SIGXCPU, sigterm_handler); > > #endif > > #ifdef SIGPIPE > > signal(SIGPIPE, SIG_IGN); /* Broken pipe (POSIX). */ > > ping > ping -- Andriy ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux
On 29/11/20 5:46 am, Andriy Gelman wrote:
>  void term_init(void)
>  {
> +#if defined __linux__
> +    struct sigaction action;

Nit: Should this have a "= {0}"?

My sigaction(2) says:
  On some architectures a union is involved: do not assign to both sa_handler and sa_sigaction.
so it's possible that sa_sigaction is left uninitialised.

If I'm wrong (quite possible, it's 2am), then part 1 lgtm.

> +    action.sa_handler = sigterm_handler;
> +
> +    /* block other interrupts while processing this one */
> +    sigfillset(&action.sa_mask);
> +
> +    /* restart interruptible functions (i.e. don't fail with EINTR) */
> +    action.sa_flags = SA_RESTART;
> +#endif
[FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
From: Reimar Döffinger

This requests loops to be vectorized using SIMD instructions.
The performance increase is far from hand-optimized assembly but still significant over the plain C version.
Typical values are a 2-4x speedup where a hand-written version would achieve 4x-10x.
So it is far from a replacement; however, some architectures will get hand-written assembler quite late or not at all, and this is a good improvement for a trivial amount of work.
The cause, besides the compiler being a compiler, is usually that it does not manage to use saturating instructions and thus has to use 32-bit operations where saturating 16-bit operations would actually be sufficient.
Other causes are, for example, the av_clip functions, which are not ideal for vectorization (and even as scalar code not optimal for any modern CPU that has either CSEL or MAX/MIN instructions).
And of course this only works for relatively simple loops; the IDCT functions, for example, did not seem possible to optimize that way.
Also note that while clang may accept the code and sometimes produces warnings, it does not seem to do anything actually useful at all.
Here are example measurements using gcc 10 under Linux (in a VM unfortunately) on AArch64 on Apple M1:

Command:
time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop

Original code:
real 0m19.572s
user 0m23.386s
sys  0m0.213s

Changing all put_hevc:
real 0m15.648s
user 0m19.503s (83.4% of original)
sys  0m0.186s

In addition changing add_residual:
real 0m15.424s
user 0m19.278s (82.4% of original)
sys  0m0.133s

In addition changing planar copy dither:
real 0m15.040s
user 0m18.874s (80.7% of original)
sys  0m0.168s

Signed-off-by: Reimar Döffinger
---
 configure                     | 23 +
 libavcodec/hevcdsp_template.c | 47 +++
 libavutil/internal.h          |  6 +
 libswscale/swscale_unscaled.c |  3 +++
 4 files changed, 79 insertions(+)

diff --git a/configure b/configure
index 900505756b..73b7c3daeb 100755
--- a/configure
+++ b/configure
@@ -406,6 +406,7 @@ Toolchain options:
   --enable-pic             build position-independent code
   --enable-thumb           compile for Thumb instruction set
   --enable-lto             use link-time optimization
+  --enable-openmp-simd     use the "omp simd" pragma to optimize code
   --env="ENV=override"     override the environment variables
 
 Advanced options (experts only):
@@ -2335,6 +2336,7 @@ HAVE_LIST="
     opencl_dxva2
     opencl_vaapi_beignet
     opencl_vaapi_intel_media
+    openmp_simd
     perl
     pod2man
     texi2html
@@ -2446,6 +2448,7 @@ CMDLINE_SELECT="
     extra_warnings
     logging
     lto
+    openmp_simd
     optimizations
     rpath
     stripping
@@ -6926,6 +6929,26 @@ if enabled lto; then
     disable inline_asm_direct_symbol_refs
 fi
 
+if enabled openmp_simd; then
+    ompopt="-fopenmp"
+    if !
test_cflags $ompopt ; then +test_cflags -Xpreprocessor -fopenmp && ompopt="-Xpreprocessor -fopenmp" +fi +test_cc $ompopt <> shift); src += srcstride; @@ -568,6 +573,7 @@ static void FUNC(put_hevc_pel_uni_w_pixels)(uint8_t *_dst, ptrdiff_t _dststride, ox = ox * (1 << (BIT_DEPTH - 8)); for (y = 0; y < height; y++) { +FF_OMP_SIMD for (x = 0; x < width; x++) dst[x] = av_clip_pixelsrc[x] << (14 - BIT_DEPTH)) * wx + offset) >> shift) + ox); src += srcstride; @@ -592,6 +598,7 @@ static void FUNC(put_hevc_pel_bi_w_pixels)(uint8_t *_dst, ptrdiff_t _dststride, ox0 = ox0 * (1 << (BIT_DEPTH - 8)); ox1 = ox1 * (1 << (BIT_DEPTH - 8)); for (y = 0; y < height; y++) { +FF_OMP_SIMD for (x = 0; x < width; x++) { dst[x] = av_clip_pixel(( (src[x] << (14 - BIT_DEPTH)) * wx1 + src2[x] * wx0 + (ox0 + ox1 + 1) * (1 << log2Wd)) >> (log2Wd + 1)); } @@ -623,6 +630,7 @@ static void FUNC(put_hevc_qpel_h)(int16_t *dst, ptrdiff_t srcstride = _srcstride / sizeof(pixel); const int8_t *filter= ff_hevc_qpel_filters[mx - 1]; for (y = 0; y < height; y++) { +FF_OMP_SIMD for (x = 0; x < width; x++) dst[x] = QPEL_FILTER(src, 1) >> (BIT_DEPTH - 8); src += srcstride; @@ -639,6 +647,7 @@ static void FUNC(put_hevc_qpel_v)(int16_t *dst, ptrdiff_t srcstride = _srcstride / sizeof(pixel); const int8_t *filter= ff_hevc_qpel_filters[my - 1]; for (y = 0; y < height; y++) { +FF_OMP_SIMD for (x = 0; x < width; x++) dst[x] = QPEL_FILTER(src, srcstride) >> (BIT_DEPTH - 8); src += srcstride; @@ -662,6 +671,7 @@ static void FUNC(put_hevc_qpel_hv)(int16_t *dst, src -= QPEL_EXTRA_BEFORE * srcstride; filter = ff_hevc_qpel_filters[mx - 1]; for (y = 0; y < height + QPEL_EXTRA; y++) { +FF_OMP_SIMD for (x = 0; x < width; x++) tmp[x] = QPEL_FILTER(src
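Editorial note: for readers unfamiliar with the pragma this patch adds, the following is a minimal sketch of the general pattern, not the patch's actual macro definition (that part of the diff is not preserved in this archive). The names EXAMPLE_OMP_SIMD, clip_u8 and add_residual_row are hypothetical. The macro expands to the pragma only when the compiler defines _OPENMP (for example when building with -fopenmp); otherwise it expands to nothing and the loop is plain C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro name for this sketch. Compilers define _OPENMP when
 * OpenMP is enabled (e.g. with -fopenmp); otherwise the macro is empty. */
#ifdef _OPENMP
#define EXAMPLE_OMP_SIMD _Pragma("omp simd")
#else
#define EXAMPLE_OMP_SIMD
#endif

/* Clip an int to the 8-bit pixel range [0, 255]. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a 16-bit residual to one row of 8-bit pixels, with clipping.
 * The pragma asks the compiler to vectorize the loop that follows it. */
static void add_residual_row(uint8_t *dst, const int16_t *res, int width)
{
    EXAMPLE_OMP_SIMD
    for (int x = 0; x < width; x++)
        dst[x] = clip_u8(dst[x] + res[x]);
}
```

The result is the same with or without the pragma; only the generated code may differ, which is why the approach can be enabled per toolchain from configure.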
Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:

> From: Reimar Döffinger
>
> This requests loops to be vectorized using SIMD
> instructions.
> The performance increase is far from hand-optimized
> assembly but still significant over the plain C version.
> Typical values are a 2-4x speedup where a hand-written
> version would achieve 4x-10x.
> So it is far from a replacement, however some architectures
> will get hand-written assembler quite late or not at all,
> and this is a good improvement for a trivial amount of work.
> The cause, besides the compiler being a compiler, is
> usually that it does not manage to use saturating instructions
> and thus has to use 32-bit operations where actually
> saturating 16-bit operations would be sufficient.
> Other causes are for example the av_clip functions that
> are not ideal for vectorization (and even as scalar code
> not optimal for any modern CPU that has either CSEL or
> MAX/MIN instructions).
> And of course this only works for relatively simple
> loops, the IDCT functions for example seemed not possible
> to optimize that way.
> Also note that while clang may accept the code and sometimes
> produces warnings, it does not seem to do anything actually
> useful at all.
> Here are example measurements using gcc 10 under Linux (in a VM unfortunately)
> on AArch64 on Apple M1:
> Command:
> time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
>
> Original code:
> real 0m19.572s
> user 0m23.386s
> sys  0m0.213s
>
> Changing all put_hevc:
> real 0m15.648s
> user 0m19.503s (83.4% of original)
> sys  0m0.186s
>
> In addition changing add_residual:
> real 0m15.424s
> user 0m19.278s (82.4% of original)
> sys  0m0.133s
>
> In addition changing planar copy dither:
> real 0m15.040s
> user 0m18.874s (80.7% of original)
> sys  0m0.168s
>

I think I have to disagree.
The performance gains are marginal, it's definitely something the compiler should be able to decide on its own, and it makes performance highly compiler dependent. And I'm not even resorting to the painfully obvious FUD arguments that could be made.
Most of the loops this is added to are trivially SIMDable. Just because no one has had the motivation to do SIMD for a pretty unpopular codec doesn't mean we should compromise.
Re: [FFmpeg-devel] [PATCH 1/2] ffmpeg: use sigaction() instead of signal() on linux
On Sun, 10. Jan 16:32, Zane van Iperen wrote:
> On 29/11/20 5:46 am, Andriy Gelman wrote:
>
> >  void term_init(void)
> >  {
> > +#if defined __linux__
> > +    struct sigaction action;

Hi Zane,

Thanks for reviewing the patch.

> Nit: Should this have a "= {0}"?
>
> My sigaction(2) says:
>   On some architectures a union is involved: do not assign to both sa_handler
>   and sa_sigaction.
> so it's possible that sa_sigaction is left uninitialised.
>
> If I'm wrong (quite possible, it's 2am), then part 1 lgtm.
>

The SA_SIGINFO flag in sa_flags is used to decide whether sa_handler or sa_sigaction is chosen from the union.
But there is one function pointer, sa_restorer, in struct sigaction that's currently not initialized. The docs say this pointer is used internally by glibc/kernel and should not be used by applications. They don't say that it needs to be set to NULL, but I suppose it's good practice.
I'll add your suggestion to the patch.

Will apply the patch in a few days unless there are other comments.

--
Andriy
Re: [FFmpeg-devel] [PATCH v5] avformat/udp: return the error code instead of generic EIO
On Sun, 10 Jan 2021, lance.lmw...@gmail.com wrote:

> From: Limin Wang
>
> Signed-off-by: Limin Wang
> ---
> libavformat/udp.c | 55 +--
> 1 file changed, 33 insertions(+), 22 deletions(-)

[...]

> @@ -888,8 +901,6 @@ static int udp_open(URLContext *h, const char *uri, int flags)
>      }
>
>      if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate && s->circular_buffer_size)) {
> -        int ret;
> -
>          /* start the task going */
>          s->fifo = av_fifo_alloc(s->circular_buffer_size);
>          ret = pthread_mutex_init(&s->mutex, NULL);

This ret (and some others later in the code) is not an AVERROR() code; you should convert it to AVERROR() before returning.

Regards,
Marton
Re: [FFmpeg-devel] FFmpeg buying an Apple M1 Mac Mini
>
> > I will buy these if nobody objects by the end of the week.
>
> Totally in favor, also I'd like us to have two machines running.
>
> Thanks for hosting!
>
> -Thilo
>

Hi,

Just to confirm, I have bought the two Mac Minis. They should be delivered and racked up in early February (lockdown may delay this).

Regards,
Kieran
[FFmpeg-devel] Buying and hosting a HiFive RISC-V system
Hello,

Lynne has suggested on IRC that we purchase one or more of these:
https://www.sifive.com/boards/hifive-unmatched

I think this is an interesting idea as RISC-V is an important platform for the future (like M1).
I'll likely have to buy from Mouser (as I'm not sure SPI will accept CrowdSupply) and there is a long lead-time for it:
https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==

Also, I'll have to claim for a case and M.2 SSD.

I am happy to host this like with the Apple M1.

Regards,
Kieran Kunhya
Re: [FFmpeg-devel] Buying and hosting a HiFive RISC-V system
In my evaluation, RISC-V code density is 60% compared to ARM; with the C extension it rises to 80%.
This may be a big problem for running something as large as ffmpeg on real products, but it gives us more room to improve ffmpeg on it.

At 2021-01-11 04:21:07, "Kieran Kunhya" wrote:
>Hello,
>
>Lynne has suggested on IRC that we purchase one or more of these:
>https://www.sifive.com/boards/hifive-unmatched
>
>I think this is an interesting idea as RISC-V is an important platform for
>the future (like M1).
>I'll likely have to buy from Mouser (as I'm not sure SPI will accept
>CrowdSupply) and there is a long lead-time for it:
>https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==
>
>Also, I'll have to claim for a case and M.2 SSD.
>
>I am happy to host this like with the Apple M1.
>
>Regards,
>Kieran Kunhya
Re: [FFmpeg-devel] Buying and hosting a HiFive RISC-V system
Keep in mind though that the RISC-V Vector Extensions (which btw look really smart and promising) are not implemented in the SiFive Unmatched chip yet (IIRC).
But one has to start somewhere, and some future embedded devices will likely also lack those.

On 2021-01-10 21:39, chen wrote:
> In my evaluation, RISC-V code density is 60% compared to ARM; with the C extension it rises to 80%.
> This may be a big problem for running something as large as ffmpeg on real products, but it gives us more room to improve ffmpeg on it.
>
> At 2021-01-11 04:21:07, "Kieran Kunhya" wrote:
> > Hello,
> >
> > Lynne has suggested on IRC that we purchase one or more of these:
> > https://www.sifive.com/boards/hifive-unmatched
> >
> > I think this is an interesting idea as RISC-V is an important platform for the future (like M1).
> > I'll likely have to buy from Mouser (as I'm not sure SPI will accept CrowdSupply) and there is a long lead-time for it:
> > https://www.mouser.co.uk/ProductDetail/SiFive/HF105-000?qs=zW32dvEIR3vHEV%2FPYYkdMA==
> >
> > Also, I'll have to claim for a case and M.2 SSD.
> >
> > I am happy to host this like with the Apple M1.
> >
> > Regards,
> > Kieran Kunhya
Re: [FFmpeg-devel] [PATCH 2/5] avcodec/fft_template: Remove unused fixed-point cosine tables
Michael Niedermayer:
> On Sun, Jan 10, 2021 at 01:56:21AM +0100, Andreas Rheinhardt wrote:
>> Michael Niedermayer:
>>> On Thu, Jan 07, 2021 at 12:13:05AM +0100, Andreas Rheinhardt wrote:
>>>> There are three types of FFTs: floating-point, 32-bit fixed-point and
>>>> 16-bit fixed-point. The latter has exactly one user: The fixed-point
>>>> AC-3-encoder; the cosine tables used by it use up to seven bits. The
>>>> tables corresponding to eight to seventeen bits are unused, as are the
>>>> FFT functions for these bits. Therefore this commit removes these
>>>> tables and functions. This is especially beneficial when using
>>>> hardcoded tables as they take up more than 255 KiB. But even without it
>>>> one saves said unused functions as well as entries in corresponding
>>>> tables (this also saves relocations).
>>>>
>>>> Signed-off-by: Andreas Rheinhardt
>>>> ---
>>>> The changes to ARM assembly are honestly untested. I hope someone can
>>>> test them.
>>>> Btw: It seems that the ARM assembly code wouldn't be able to deal with
>>>> an FFT with more than 16 bits (no function for this has been defined),
>>>> which only worked because no one ever used that many bits with the
>>>> fixed-point FFT.
>>>>
>>>>  libavcodec/arm/fft_fixed_neon.S | 18 --
>>>>  libavcodec/cos_tablegen.c       |  4 ++--
>>>>  libavcodec/fft.h                |  4 +++-
>>>>  libavcodec/fft_fixed.c          |  1 +
>>>>  libavcodec/fft_template.c       | 31 +++
>>>>  tests/fate/fft.mak              |  8 ++--
>>>>  6 files changed, 35 insertions(+), 31 deletions(-)
>>>
>>> make -j32 libavcodec/tests/fft-fixed && libavcodec/tests/fft-fixed
>>> Segmentation fault (core dumped)
>>>
>>> (if you cant repro say so and ill rebuild with debug symbols ...)
>>>
>>> thx
>>>
>>> [...]
>>>
>> 1. Lynne has an alternative patchset that makes the only user of
>> fft_fixed use fft_fixed_32 instead, so this is not important any more.
>
>> 2. Are you testing the ARM assembly code (for which I ask for a test) or
>
> x86-64
>
>> not? If not, then this surprises me. Did you apply the changes to
>> fft.mak (some of the tests have been removed as they tested
>> functionality that was unused (apart from the tests) and has therefore
>> been removed).
>
> i applied the changes from this patchset up to and including the patch
> and also did a make distclean
> FFT 512 test

Now I see what the problem is. You are not running the fft-tests from the FATE-suite (where I disabled unused and newly unsupported tests); instead you are directly using the underlying tests in libavcodec/tests, and these tests default to nbits = 9, which is unsupported for fft-fixed with this patch applied. ff_fft_init_fixed is correctly erroring out, yet said error code is ignored in the test's fft_init (therefore testing an unsupported nbits already leads to segfaults on master (try libavcodec/tests/fft -n 20)). So the error checking in the tests needs to be improved (there are other unchecked allocations, too).
Furthermore, if this patch were to be applied, one should set the default number of bits for fft-fixed to something supported; but that point is moot given that this patch has been superseded by Lynne's (who wants to nuke fft-fixed).

- Andreas
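Editorial note: the failure mode described above (an init function reports an error, the caller ignores it and then runs on an unusable context) can be sketched as follows. This is a hypothetical stand-alone illustration, not the code of libavcodec/tests/fft.c; ExampleFFT, example_fft_init, example_run_test and the limit EXAMPLE_MAX_NBITS are all assumptions made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define EXAMPLE_MAX_NBITS 7   /* hypothetical upper bound for this sketch */

typedef struct ExampleFFT {
    int    nbits;
    float *table;
} ExampleFFT;

/* Returns 0 on success or a negative errno-style code on failure. */
static int example_fft_init(ExampleFFT *s, int nbits)
{
    s->table = NULL;
    if (nbits < 2 || nbits > EXAMPLE_MAX_NBITS)
        return -EINVAL;                 /* unsupported transform size */
    s->table = calloc((size_t)1 << nbits, sizeof(*s->table));
    if (!s->table)
        return -ENOMEM;                 /* allocation is checked as well */
    s->nbits = nbits;
    return 0;
}

/* Test driver: propagate the init error instead of using the context. */
static int example_run_test(int nbits)
{
    ExampleFFT s;
    int ret = example_fft_init(&s, nbits);
    if (ret < 0)
        return ret;
    /* ... the transform under test would run here ... */
    free(s.table);
    return 0;
}
```

With the check in place, an unsupported size produces an error return rather than a crash later in the test.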
[FFmpeg-devel] [PATCH] avcodec/xbmenc: Better xbm memory use
Small memory reduction which uses approx 6/7th total memory.
Assuming \n is 2 bytes, we first need {32+33+40+5}=110, but we also need to include the terminating zero => 110+1 = 111 (bug-fix).
Then, assuming \n is 2 bytes, data requires => height * (linesize * 6 + 2)
For example, " 0x00, 0x11, 0x22,\n"

From 81436261e6de8ddaf1ce3c6f010ab2c018f92eb8 Mon Sep 17 00:00:00 2001
From: Joe Da Silva
Date: Sun, 10 Jan 2021 01:35:05 -0800
Subject: [PATCH] avcodec/xbmenc: Better xbm memory use

Small memory reduction which uses approx 6/7th total memory.
Assuming \n is 2 bytes, we first need {32+33+40+5}=110, but we also need to include the terminating zero => 110+1 = 111
Assuming \n is 2 bytes, data requires => height * (linesize * 6 + 2)
For example, " 0x00, 0x11, 0x22,\n"

Signed-off-by: Joe Da Silva
---
 libavcodec/xbmenc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libavcodec/xbmenc.c b/libavcodec/xbmenc.c
index b25615f2a4..9222947893 100644
--- a/libavcodec/xbmenc.c
+++ b/libavcodec/xbmenc.c
@@ -31,7 +31,7 @@ static int xbm_encode_frame(AVCodecContext *avctx, AVPacket *pkt,
     uint8_t *ptr, *buf;
 
     linesize = (avctx->width + 7) / 8;
-    size = avctx->height * (linesize * 7 + 2) + 110;
+    size = avctx->height * (linesize * 6 + 2) + 111;
     if ((ret = ff_alloc_packet2(avctx, pkt, size, 0)) < 0)
         return ret;
 
--
2.30.0
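Editorial note: the size arithmetic in the patch description can be written out as a small helper. This is only a sketch; xbm_packet_size is a hypothetical name, and the constants (6 characters per data byte, 2 bytes per newline, 110 header bytes plus 1 terminating zero) are taken from the message above rather than re-derived from the encoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the patched formula:
 *   size = height * (linesize * 6 + 2) + 111 */
static size_t xbm_packet_size(int width, int height)
{
    size_t linesize = ((size_t)width + 7) / 8;  /* bytes per row, 8 pixels per byte */
    size_t row      = linesize * 6 + 2;         /* " 0x00," per byte, newline counted as 2 */
    size_t header   = 110 + 1;                  /* header text plus terminating zero */

    return (size_t)height * row + header;
}
```

For a 24x2 image, linesize is 3, so the new formula gives 2 * (3 * 6 + 2) + 111 = 151 bytes, where the old one gave 2 * (3 * 7 + 2) + 110 = 156.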
[FFmpeg-devel] [PATCH] avcodec/cbs: constify decompose_unit_types
CBS doesn't change its contents in any way whatsoever internally, and most users already set it to a const array. Signed-off-by: James Almer --- libavcodec/av1_frame_split_bsf.c | 2 +- libavcodec/av1_parser.c | 2 +- libavcodec/cbs.h | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/libavcodec/av1_frame_split_bsf.c b/libavcodec/av1_frame_split_bsf.c index 13bebe19f5..fa8b887b6c 100644 --- a/libavcodec/av1_frame_split_bsf.c +++ b/libavcodec/av1_frame_split_bsf.c @@ -214,7 +214,7 @@ static int av1_frame_split_init(AVBSFContext *ctx) if (ret < 0) return ret; -s->cbc->decompose_unit_types= (CodedBitstreamUnitType*)decompose_unit_types; +s->cbc->decompose_unit_types= decompose_unit_types; s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types); if (!ctx->par_in->extradata_size) diff --git a/libavcodec/av1_parser.c b/libavcodec/av1_parser.c index 181ff3a1be..6a76ffb7bc 100644 --- a/libavcodec/av1_parser.c +++ b/libavcodec/av1_parser.c @@ -191,7 +191,7 @@ static av_cold int av1_parser_init(AVCodecParserContext *ctx) if (ret < 0) return ret; -s->cbc->decompose_unit_types= (CodedBitstreamUnitType *)decompose_unit_types; +s->cbc->decompose_unit_types= decompose_unit_types; s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types); return 0; diff --git a/libavcodec/cbs.h b/libavcodec/cbs.h index 3fd0a0ef33..f022282b75 100644 --- a/libavcodec/cbs.h +++ b/libavcodec/cbs.h @@ -196,7 +196,7 @@ typedef struct CodedBitstreamContext { * Types not in this list will be available in bitstream form only. * If NULL, all supported types will be decomposed. */ -CodedBitstreamUnitType *decompose_unit_types; +const CodedBitstreamUnitType *decompose_unit_types; /** * Length of the decompose_unit_types array. */ -- 2.30.0 ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH] avcodec/cbs: constify decompose_unit_types
James Almer: > CBS doesn't change its contents in any way whatsoever internally, and most > users already set it to a const array. > > Signed-off-by: James Almer > --- > libavcodec/av1_frame_split_bsf.c | 2 +- > libavcodec/av1_parser.c | 2 +- > libavcodec/cbs.h | 2 +- > 3 files changed, 3 insertions(+), 3 deletions(-) > > diff --git a/libavcodec/av1_frame_split_bsf.c > b/libavcodec/av1_frame_split_bsf.c > index 13bebe19f5..fa8b887b6c 100644 > --- a/libavcodec/av1_frame_split_bsf.c > +++ b/libavcodec/av1_frame_split_bsf.c > @@ -214,7 +214,7 @@ static int av1_frame_split_init(AVBSFContext *ctx) > if (ret < 0) > return ret; > > -s->cbc->decompose_unit_types= > (CodedBitstreamUnitType*)decompose_unit_types; > +s->cbc->decompose_unit_types= decompose_unit_types; > s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types); > > if (!ctx->par_in->extradata_size) > diff --git a/libavcodec/av1_parser.c b/libavcodec/av1_parser.c > index 181ff3a1be..6a76ffb7bc 100644 > --- a/libavcodec/av1_parser.c > +++ b/libavcodec/av1_parser.c > @@ -191,7 +191,7 @@ static av_cold int av1_parser_init(AVCodecParserContext > *ctx) > if (ret < 0) > return ret; > > -s->cbc->decompose_unit_types= (CodedBitstreamUnitType > *)decompose_unit_types; > +s->cbc->decompose_unit_types= decompose_unit_types; > s->cbc->nb_decompose_unit_types = FF_ARRAY_ELEMS(decompose_unit_types); > > return 0; > diff --git a/libavcodec/cbs.h b/libavcodec/cbs.h > index 3fd0a0ef33..f022282b75 100644 > --- a/libavcodec/cbs.h > +++ b/libavcodec/cbs.h > @@ -196,7 +196,7 @@ typedef struct CodedBitstreamContext { > * Types not in this list will be available in bitstream form only. > * If NULL, all supported types will be decomposed. > */ > -CodedBitstreamUnitType *decompose_unit_types; > +const CodedBitstreamUnitType *decompose_unit_types; > /** > * Length of the decompose_unit_types array. > */ > LGTM. 
- Andreas
[FFmpeg-devel] [PATCH] libswscale/aarch64/hscale.S: Support more bit-depth variants.
From: Reimar Döffinger Trivially expand hscale assembler to support > 8 bit formats both for input and output. 16-bit input is not supported as I am not certain how to get sufficient test coverage. --- libswscale/aarch64/hscale.S | 53 ++-- libswscale/aarch64/swscale.c | 49 +++-- 2 files changed, 85 insertions(+), 17 deletions(-) diff --git a/libswscale/aarch64/hscale.S b/libswscale/aarch64/hscale.S index af55ffe2b7..3b42d39dac 100644 --- a/libswscale/aarch64/hscale.S +++ b/libswscale/aarch64/hscale.S @@ -20,7 +20,11 @@ #include "libavutil/aarch64/asm.S" -function ff_hscale_8_to_15_neon, export=1 +.macro hscale srcbits, dstbits, ldt, lds, c +function ff_hscale_\srcbits\()_to_\dstbits\()_neon, export=1 +.if \dstbits >= 16 +moviv20.4S, #(0x1 << (\dstbits - 16)), msl #16 +.endif sbfiz x7, x6, #1, #32 // filterSize*2 (*2 because int16) 1: ldr w8, [x5], #4// filterPos[idx] ldr w0, [x5], #4// filterPos[idx + 1] @@ -34,30 +38,30 @@ function ff_hscale_8_to_15_neon, export=1 moviv1.2D, #0 // val sum part 2 (for dst[1]) moviv2.2D, #0 // val sum part 3 (for dst[2]) moviv3.2D, #0 // val sum part 4 (for dst[3]) -add x17, x3, w8, UXTW // srcp + filterPos[0] -add x8, x3, w0, UXTW // srcp + filterPos[1] -add x0, x3, w11, UXTW // srcp + filterPos[2] -add x11, x3, w9, UXTW // srcp + filterPos[3] +add x17, x3, w8, UXTW #!!(\srcbits > 8) // srcp + filterPos[0] +add x8, x3, w0, UXTW #!!(\srcbits > 8) // srcp + filterPos[1] +add x0, x3, w11, UXTW #!!(\srcbits > 8) // srcp + filterPos[2] +add x11, x3, w9, UXTW #!!(\srcbits > 8) // srcp + filterPos[3] mov w15, w6 // filterSize counter -2: ld1 {v4.8B}, [x17], #8 // srcp[filterPos[0] + {0..7}] +2: ld1 {v4.\ldt}, [x17], \lds // srcp[filterPos[0] + {0..7}] ld1 {v5.8H}, [x16], #16 // load 8x16-bit filter values, part 1 -ld1 {v6.8B}, [x8], #8 // srcp[filterPos[1] + {0..7}] +ld1 {v6.\ldt}, [x8], \lds // srcp[filterPos[1] + {0..7}] ld1 {v7.8H}, [x12], #16 // load 8x16-bit at filter+filterSize -uxtlv4.8H, v4.8B// unpack part 1 to 16-bit 
+\c\cuxtlv4.8H, v4.8B// unpack part 1 to 16-bit smlal v0.4S, v4.4H, v5.4H // v0 accumulates srcp[filterPos[0] + {0..3}] * filter[{0..3}] smlal2 v0.4S, v4.8H, v5.8H // v0 accumulates srcp[filterPos[0] + {4..7}] * filter[{4..7}] -ld1 {v16.8B}, [x0], #8 // srcp[filterPos[2] + {0..7}] +ld1 {v16.\ldt}, [x0], \lds // srcp[filterPos[2] + {0..7}] ld1 {v17.8H}, [x13], #16// load 8x16-bit at filter+2*filterSize -uxtlv6.8H, v6.8B// unpack part 2 to 16-bit +\c\cuxtlv6.8H, v6.8B// unpack part 2 to 16-bit smlal v1.4S, v6.4H, v7.4H // v1 accumulates srcp[filterPos[1] + {0..3}] * filter[{0..3}] -uxtlv16.8H, v16.8B // unpack part 3 to 16-bit +\c\cuxtlv16.8H, v16.8B // unpack part 3 to 16-bit smlal v2.4S, v16.4H, v17.4H // v2 accumulates srcp[filterPos[2] + {0..3}] * filter[{0..3}] smlal2 v2.4S, v16.8H, v17.8H // v2 accumulates srcp[filterPos[2] + {4..7}] * filter[{4..7}] -ld1 {v18.8B}, [x11], #8 // srcp[filterPos[3] + {0..7}] +ld1 {v18.\ldt}, [x11], \lds // srcp[filterPos[3] + {0..7}] smlal2 v1.4S, v6.8H, v7.8H // v1 accumulates srcp[filterPos[1] + {4..7}] * filter[{4..7}] ld1 {v19.8H}, [x4], #16 // load 8x16-bit at filter+3*filterSize subsw15, w15, #8// j -= 8: processed 8/filterSize -uxtlv18.8H, v18.8B // unpack part 4 to 16-bit +\c\cuxtlv18.8H, v18.8B // unpack part 4 to 16-bit smlal v3.4S, v18.4H, v19.4H // v3 accumulates srcp[filterPos[3] + {0..3}] * filter[{0..3}] smlal2 v3.4S, v18.8H, v19.8H // v3 acc
Re: [FFmpeg-devel] [PATCH v5] avformat/udp: return the error code instead of generic EIO
On Sun, Jan 10, 2021 at 08:29:30PM +0100, Marton Balint wrote:
>
>
> On Sun, 10 Jan 2021, lance.lmw...@gmail.com wrote:
>
> > From: Limin Wang
> >
> > Signed-off-by: Limin Wang
> > ---
> > libavformat/udp.c | 55 +--
> > 1 file changed, 33 insertions(+), 22 deletions(-)
>
> [...]
>
> > @@ -888,8 +901,6 @@ static int udp_open(URLContext *h, const char *uri, int flags)
> >      }
> >
> >      if ((!is_output && s->circular_buffer_size) || (is_output && s->bitrate && s->circular_buffer_size)) {
> > -        int ret;
> > -
> >          /* start the task going */
> >          s->fifo = av_fifo_alloc(s->circular_buffer_size);
> >          ret = pthread_mutex_init(&s->mutex, NULL);
>
> This ret (and some others later in the code) are not an AVERROR(), you
> should convert it to AVERROR() before returning.

OK, I will add the code below before the goto on error for these conditions, correct?

    ret = AVERROR(ret);

>
> Regards,
> Marton

--
Thanks,
Limin Wang
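Editorial note: the conversion discussed here can be sketched as follows. pthread functions return 0 on success or a positive errno-style code, whereas FFmpeg error codes are negative. The sketch uses a local stand-in macro, EXAMPLE_AVERROR, which assumes positive errno values as on POSIX systems; it is not FFmpeg's own definition, and example_mutex_setup is a hypothetical function name.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Local stand-in for AVERROR(); assumes errno values are positive. */
#define EXAMPLE_AVERROR(e) (-(e))

/* Initialize a mutex; returns 0 on success or a negative error code. */
static int example_mutex_setup(pthread_mutex_t *mutex)
{
    /* pthread_mutex_init() returns the error number directly. */
    int ret = pthread_mutex_init(mutex, NULL);

    if (ret != 0)
        return EXAMPLE_AVERROR(ret);   /* convert before returning to the caller */
    return 0;
}
```

Returning the raw positive value would make callers that test for ret < 0 treat the failure as success, which is why the conversion has to happen on every error path.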
Re: [FFmpeg-devel] [PATCH] Add support for "omp simd" pragma.
On Sun, 10 Jan 2021 at 19:55, Lynne wrote:
>
> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
>
> > From: Reimar Döffinger
> >
> > This requests loops to be vectorized using SIMD
> > instructions.
> > The performance increase is far from hand-optimized
> > assembly but still significant over the plain C version.
> > Typical values are a 2-4x speedup where a hand-written
> > version would achieve 4x-10x.
> > So it is far from a replacement, however some architectures
> > will get hand-written assembler quite late or not at all,
> > and this is a good improvement for a trivial amount of work.
> > The cause, besides the compiler being a compiler, is
> > usually that it does not manage to use saturating instructions
> > and thus has to use 32-bit operations where actually
> > saturating 16-bit operations would be sufficient.
> > Other causes are for example the av_clip functions that
> > are not ideal for vectorization (and even as scalar code
> > not optimal for any modern CPU that has either CSEL or
> > MAX/MIN instructions).
> > And of course this only works for relatively simple
> > loops, the IDCT functions for example seemed not possible
> > to optimize that way.
> > Also note that while clang may accept the code and sometimes
> > produces warnings, it does not seem to do anything actually
> > useful at all.
> > Here are example measurements using gcc 10 under Linux (in a VM unfortunately)
> > on AArch64 on Apple M1:
> > Command:
> > time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
> >
> > Original code:
> > real 0m19.572s
> > user 0m23.386s
> > sys  0m0.213s
> >
> > Changing all put_hevc:
> > real 0m15.648s
> > user 0m19.503s (83.4% of original)
> > sys  0m0.186s
> >
> > In addition changing add_residual:
> > real 0m15.424s
> > user 0m19.278s (82.4% of original)
> > sys  0m0.133s
> >
> > In addition changing planar copy dither:
> > real 0m15.040s
> > user 0m18.874s (80.7% of original)
> > sys  0m0.168s
> >
>
> I think I have to disagree.
> The performance gains are marginal

This sounds wrong.

Carl Eugen
[FFmpeg-devel] [PATCH 1/3] dnn/openvino: remove unnecessary code
Signed-off-by: Ting Fu
---
 libavfilter/dnn/dnn_backend_openvino.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index d27e451eea..050be97209 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -284,14 +284,6 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
         return DNN_ERROR;
     }
 
-    // The order of dims in the openvino is fixed and it is always NCHW for 4-D data.
-    // while we pass NHWC data from FFmpeg to openvino
-    status = ie_network_set_input_layout(ov_model->network, input_name, NHWC);
-    if (status != OK) {
-        av_log(ctx, AV_LOG_ERROR, "Input \"%s\" does not match layout NHWC\n", input_name);
-        return DNN_ERROR;
-    }
-
     input->channels = dims.dims[1];
     input->height   = dims.dims[2];
     input->width    = dims.dims[3];
--
2.17.1
[FFmpeg-devel] [PATCH 2/3] dnn/openvino: refine code for better model initialization
Move openvino model/inference request creation and initialization steps
from ff_dnn_load_model_ov to new function init_model_ov, for later input
resize support.

Signed-off-by: Ting Fu
---
 libavfilter/dnn/dnn_backend_openvino.c | 153 +++--
 1 file changed, 93 insertions(+), 60 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index 050be97209..d6e0593a0b 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -217,6 +217,78 @@ static void infer_completion_callback(void *args)
     task->done = 1;
 }

+static DNNReturnType init_model_ov(OVModel *ov_model)
+{
+    OVContext *ctx = &ov_model->ctx;
+    IEStatusCode status;
+    ie_available_devices_t a_dev;
+    ie_config_t config = {NULL, NULL, NULL};
+    char *all_dev_names = NULL;
+
+    status = ie_core_load_network(ov_model->core, ov_model->network, ctx->options.device_type, &config, &ov_model->exe_network);
+    if (status != OK) {
+        av_log(ctx, AV_LOG_ERROR, "Failed to load OpenVINO model network\n");
+        status = ie_core_get_available_devices(ov_model->core, &a_dev);
+        if (status != OK) {
+            av_log(ctx, AV_LOG_ERROR, "Failed to get available devices\n");
+            goto err;
+        }
+        for (int i = 0; i < a_dev.num_devices; i++) {
+            APPEND_STRING(all_dev_names, a_dev.devices[i])
+        }
+        av_log(ctx, AV_LOG_ERROR,"device %s may not be supported, all available devices are: \"%s\"\n",
+               ctx->options.device_type, all_dev_names);
+        goto err;
+    }
+
+    // create infer_request for sync execution
+    status = ie_exec_network_create_infer_request(ov_model->exe_network, &ov_model->infer_request);
+    if (status != OK)
+        goto err;
+
+    // create infer_requests for async execution
+    if (ctx->options.nireq <= 0) {
+        // the default value is a rough estimation
+        ctx->options.nireq = av_cpu_count() / 2 + 1;
+    }
+
+    ov_model->request_queue = ff_safe_queue_create();
+    if (!ov_model->request_queue) {
+        goto err;
+    }
+
+    for (int i = 0; i < ctx->options.nireq; i++) {
+        ie_infer_request_t *request;
+        RequestItem *item = av_mallocz(sizeof(*item));
+        if (!item) {
+            goto err;
+        }
+        status = ie_exec_network_create_infer_request(ov_model->exe_network, &request);
+        if (status != OK) {
+            av_freep(&item);
+            goto err;
+        }
+        item->infer_request = request;
+        item->callback.completeCallBackFunc = infer_completion_callback;
+        item->callback.args = item;
+        if (ff_safe_queue_push_back(ov_model->request_queue, item) < 0) {
+            av_freep(&item);
+            goto err;
+        }
+    }
+
+    ov_model->task_queue = ff_queue_create();
+    if (!ov_model->task_queue) {
+        goto err;
+    }
+
+    return DNN_SUCCESS;
+
+err:
+    ff_dnn_free_model_ov(&ov_model->model);
+    return DNN_ERROR;
+}
+
 static DNNReturnType execute_model_ov(TaskItem *task, RequestItem *request)
 {
     IEStatusCode status;
@@ -325,6 +397,13 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     in_frame->width = input_width;
     in_frame->height = input_height;

+    if (!ov_model->exe_network) {
+        if (init_model_ov(ov_model) != DNN_SUCCESS) {
+            av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n");
+            return DNN_ERROR;
+        };
+    }
+
     task.done = 0;
     task.do_ioproc = 0;
     task.async = 0;
@@ -347,13 +426,10 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu

 DNNModel *ff_dnn_load_model_ov(const char *model_filename, const char *options, AVFilterContext *filter_ctx)
 {
-    char *all_dev_names = NULL;
     DNNModel *model = NULL;
     OVModel *ov_model = NULL;
     OVContext *ctx = NULL;
     IEStatusCode status;
-    ie_config_t config = {NULL, NULL, NULL};
-    ie_available_devices_t a_dev;

     model = av_mallocz(sizeof(DNNModel));
     if (!model){
@@ -385,63 +461,6 @@ DNNModel *ff_dnn_load_model_ov(const char *model_filename, const char *options,
     if (status != OK)
         goto err;

-    status = ie_core_load_network(ov_model->core, ov_model->network, ctx->options.device_type, &config, &ov_model->exe_network);
-    if (status != OK) {
-        av_log(ctx, AV_LOG_ERROR, "Failed to init OpenVINO model\n");
-        status = ie_core_get_available_devices(ov_model->core, &a_dev);
-        if (status != OK) {
-            av_log(ctx, AV_LOG_ERROR, "Failed to get available devices\n");
-            goto err;
-        }
-        for (int i = 0; i < a_dev.num_devices; i++) {
-            APPEND_STRING(all_dev_names, a_dev.devices[i])
-        }
-        av_log(ctx, AV_LOG_ERROR,"device %s may not be supported, all available devices are: \"%s\"\n",
-               ctx->
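The fallback for nireq in init_model_ov can be looked at in isolation.
The following is only a sketch: default_nireq is a hypothetical helper,
and the CPU count is passed in as a parameter rather than read from
av_cpu_count(), so that the snippet builds without FFmpeg.

```c
#include <assert.h>

/* Hypothetical stand-alone version of the nireq fallback in the patch:
 * a non-positive request count is replaced by cpu_count / 2 + 1,
 * which the patch itself calls a rough estimation. */
static int default_nireq(int requested, int cpu_count)
{
    assert(cpu_count > 0);
    if (requested <= 0)
        return cpu_count / 2 + 1;
    return requested;
}
```

On a machine reporting 8 CPUs this gives 5 inference requests when the
option is left at 0, and at least 1 request even on a single-CPU system.
An explicit positive nireq is used unchanged.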
[FFmpeg-devel] [PATCH 3/3] dnn/openvino: support model input resize
The OpenVINO APIs require the input size to be specified before the model
is run, while some OpenVINO models accept different input sizes. Add the
input_resizable option to make this feature easier to use.
The bool option input_resizable specifies whether the input can be
resized:
input_resizable = 1 means input resize is supported, i.e. the model
accepts different input sizes.
input_resizable = 0 (default) means input resize is not supported.
Please make sure the inference model accepts different input sizes
before using this option, otherwise the inference engine may report
error(s).
e.g.: ./ffmpeg -i video_name.mp4 -vf dnn_processing=dnn_backend=openvino:\
      model=model_name.xml:input=input_name:output=output_name:\
      options=device=CPU\&input_resizable=1 -y output_video_name.mp4

Signed-off-by: Ting Fu
---
 libavfilter/dnn/dnn_backend_openvino.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/libavfilter/dnn/dnn_backend_openvino.c b/libavfilter/dnn/dnn_backend_openvino.c
index d6e0593a0b..65d74702ff 100644
--- a/libavfilter/dnn/dnn_backend_openvino.c
+++ b/libavfilter/dnn/dnn_backend_openvino.c
@@ -37,6 +37,7 @@
 typedef struct OVOptions{
     char *device_type;
     int nireq;
+    int input_resizable;
 } OVOptions;

 typedef struct OVContext {
@@ -83,6 +84,7 @@ typedef struct RequestItem {
 static const AVOption dnn_openvino_options[] = {
     { "device", "device to run model", OFFSET(options.device_type), AV_OPT_TYPE_STRING, { .str = "CPU" }, 0, 0, FLAGS },
     { "nireq", "number of request", OFFSET(options.nireq), AV_OPT_TYPE_INT, { .i64 = 0 }, 0, INT_MAX, FLAGS },
+    { "input_resizable", "can input be resizable or not", OFFSET(options.input_resizable), AV_OPT_TYPE_BOOL, { .i64 = 0 }, 0, 1, FLAGS },
     { NULL }
 };

@@ -334,6 +336,7 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
     size_t model_input_count = 0;
     dimensions_t dims;
     precision_e precision;
+    int input_resizable = ctx->options.input_resizable;

     status = ie_network_get_inputs_number(ov_model->network, &model_input_count);
     if (status != OK) {
@@ -357,8 +360,8 @@ static DNNReturnType get_input_ov(void *model, DNNData *input, const char *input
             }

             input->channels = dims.dims[1];
-            input->height   = dims.dims[2];
-            input->width    = dims.dims[3];
+            input->height   = input_resizable ? -1 : dims.dims[2];
+            input->width    = input_resizable ? -1 : dims.dims[3];
             input->dt       = precision_to_datatype(precision);
             return DNN_SUCCESS;
         } else {
@@ -383,6 +386,8 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     RequestItem request;
     AVFrame *in_frame = av_frame_alloc();
     AVFrame *out_frame = NULL;
+    IEStatusCode status;
+    input_shapes_t input_shapes;

     if (!in_frame) {
         av_log(ctx, AV_LOG_ERROR, "Failed to allocate memory for input frame\n");
@@ -397,6 +402,18 @@ static DNNReturnType get_output_ov(void *model, const char *input_name, int inpu
     in_frame->width = input_width;
     in_frame->height = input_height;

+    if (ctx->options.input_resizable) {
+        status = ie_network_get_input_shapes(ov_model->network, &input_shapes);
+        input_shapes.shapes->shape.dims[2] = input_height;
+        input_shapes.shapes->shape.dims[3] = input_width;
+        status |= ie_network_reshape(ov_model->network, input_shapes);
+        ie_network_input_shapes_free(&input_shapes);
+        if (status != OK) {
+            av_log(ctx, AV_LOG_ERROR, "Failed to reshape input size for %s\n", input_name);
+            return DNN_ERROR;
+        }
+    }
+
     if (!ov_model->exe_network) {
         if (init_model_ov(ov_model) != DNN_SUCCESS) {
             av_log(ctx, AV_LOG_ERROR, "Failed init OpenVINO exectuable network or inference request\n");
--
2.17.1
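The two size-related changes in this patch can be sketched without the
OpenVINO headers. In the sketch below, Shape4, reported_dim and
set_frame_size are hypothetical stand-ins chosen for illustration; they
are not OpenVINO or FFmpeg types or functions. The sketch assumes, as
the patch does, that dims[2] is the height and dims[3] is the width of
an NCHW shape.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a 4-D NCHW shape; not the OpenVINO dimensions_t type. */
typedef struct Shape4 {
    size_t dims[4]; /* N, C, H, W */
} Shape4;

/* Mirrors the get_input_ov change: a resizable input reports -1
 * for height and width instead of the size stored in the model. */
static int reported_dim(int input_resizable, size_t model_dim)
{
    return input_resizable ? -1 : (int)model_dim;
}

/* Mirrors the get_output_ov change: only H and W are overwritten
 * with the frame size, N and C are left untouched. */
static void set_frame_size(Shape4 *shape, int height, int width)
{
    assert(shape && height > 0 && width > 0);
    shape->dims[2] = (size_t)height;
    shape->dims[3] = (size_t)width;
}
```

For a model stored with shape 1x3x224x224 and a 1920x1080 frame, the
reshaped input becomes 1x3x1080x1920, while reported_dim returns -1 for
both spatial dimensions whenever input_resizable is set.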