On 30/07/18 12:42, Jun Zhao wrote: > Enable multi-slice support in AVC/H.264 vaapi encoder. > > Signed-off-by: Wang, Yi A <yi.a.w...@intel.com> > Signed-off-by: Jun Zhao <jun.z...@intel.com> > --- > libavcodec/vaapi_encode_h264.c | 31 +++++++++++++++++++++++++------ > 1 files changed, 25 insertions(+), 6 deletions(-) > > diff --git a/libavcodec/vaapi_encode_h264.c b/libavcodec/vaapi_encode_h264.c > index 905c507..70c89e8 100644 > --- a/libavcodec/vaapi_encode_h264.c > +++ b/libavcodec/vaapi_encode_h264.c > @@ -581,6 +581,7 @@ static int > vaapi_encode_h264_init_picture_params(AVCodecContext *avctx, > H264RawSPS *sps = &priv->sps; > VAEncPictureParameterBufferH264 *vpic = pic->codec_picture_params; > int i; > + int slices; > > memset(&priv->current_access_unit, 0, > sizeof(priv->current_access_unit)); > @@ -690,7 +691,17 @@ static int > vaapi_encode_h264_init_picture_params(AVCodecContext *avctx, > vpic->pic_fields.bits.idr_pic_flag = (pic->type == > PICTURE_TYPE_IDR); > vpic->pic_fields.bits.reference_pic_flag = (pic->type != PICTURE_TYPE_B); > > - pic->nb_slices = 1; > + slices = 1; > + if (ctx->max_slices) { > + if (avctx->slices <= FFMIN(ctx->max_slices, priv->mb_height)) { > + slices = FFMAX(avctx->slices, slices); > + } else { > + av_log(avctx, AV_LOG_ERROR, "The max slices number per frame " > + "cannot be more than %d.\n", FFMIN(ctx->max_slices, > priv->mb_height)); > + return AVERROR_INVALIDDATA;
AVERROR(EINVAL) for invalid user parameters. > + } > + } > + pic->nb_slices = slices; > > return 0; > } > @@ -716,8 +727,7 @@ static int > vaapi_encode_h264_init_slice_params(AVCodecContext *avctx, > sh->nal_unit_header.nal_ref_idc = pic->type != PICTURE_TYPE_B; > } > > - // Only one slice per frame. > - sh->first_mb_in_slice = 0; > + sh->first_mb_in_slice = !!slice->index; > sh->slice_type = priv->slice_type; > > sh->pic_parameter_set_id = pps->pic_parameter_set_id; > @@ -738,14 +748,19 @@ static int > vaapi_encode_h264_init_slice_params(AVCodecContext *avctx, > sh->slice_qp_delta = priv->fixed_qp_idr - (pps->pic_init_qp_minus26 > + 26); > > > - vslice->macroblock_address = sh->first_mb_in_slice; > - vslice->num_macroblocks = priv->mb_width * priv->mb_height; > + vslice->macroblock_address = slice->index * priv->mb_width * > (priv->mb_height / pic->nb_slices); > + if (slice->index == pic->nb_slices - 1) { > + vslice->num_macroblocks = priv->mb_width * priv->mb_height > + - slice->index * priv->mb_width * > (priv->mb_height / pic->nb_slices); > + priv->idr_pic_count++; > + } else > + vslice->num_macroblocks = priv->mb_width * (priv->mb_height / > pic->nb_slices); This dumps all of the rounding error in the last slice. E.g. 1080p with 8 slices gives you 68 macroblocks high, so you get seven slices with 68/8 = 8 macroblock height and the last one has 12 macroblock height. It should be balanced so that all slices are roughly the same size (8-slice 1080p -> four slices of 9 + four slices of 8). It might make sense to put the residual rounding error away from the middle of the frame too (so 9, 9, 8, 8, 8, 8, 9, 9), though that's probably second-order. > > vslice->macroblock_info = VA_INVALID_ID; > > vslice->slice_type = sh->slice_type % 5; > vslice->pic_parameter_set_id = sh->pic_parameter_set_id; > - vslice->idr_pic_id = sh->idr_pic_id; > + vslice->idr_pic_id = priv->idr_pic_count; > > vslice->pic_order_cnt_lsb = sh->pic_order_cnt_lsb; > > @@ -855,6 +870,10 @@ static av_cold int > vaapi_encode_h264_configure(AVCodecContext *avctx) > } > } > > + if (!ctx->max_slices && avctx->slices > 0) > + av_log(avctx, AV_LOG_WARNING, "The encode slice option is not " > + "supported with the driver.\n"); Maybe this should fail in the same way as the above case where you ask for too many slices? If the user requests them it is probably for conformance reasons, so continuing and generating a stream without slices seems unlikely to be useful. (The Mesa driver on AMD hits this case.) > + > return 0; > } > > Unfortunately, this doesn't seem to work at all for me - the driver is happy with the input, but I always get corrupt output. I tried on both Haswell and Coffee Lake with the current i965 driver. E.g. (with "Intel i965 driver for Intel(R) Coffee Lake - 2.1.1.pre1 (2.1.0-44-g40b15a5)"): $ ./ffmpeg_g -v 55 -y -threads 1 -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i ~/video/test/bbb_1080_264.mp4 -an -c:v h264_vaapi -slices 2 -frames:v 1 test.264 ... [h264_vaapi @ 0x5607d595e280] Encode frame: 1920x1080 (0). [h264_vaapi @ 0x5607d595e280] Pictures: IDR (0/0) [h264_vaapi @ 0x5607d595e280] Issuing encode for pic 0/0 as type IDR. [h264_vaapi @ 0x5607d595e280] No reference pictures. [h264_vaapi @ 0x5607d595e280] Input surface is 0x4000013. [h264_vaapi @ 0x5607d595e280] Recon surface is 0x400001a. [h264_vaapi @ 0x5607d595e280] Allocated output buffer 0x8000003 [h264_vaapi @ 0x5607d595e280] Output buffer is 0x8000003. [h264_vaapi @ 0x5607d595e280] Param buffer (27) is 0x8000002. [h264_vaapi @ 0x5607d595e280] Param buffer (22) is 0x8000001. [h264_vaapi @ 0x5607d595e280] Param buffer (23) is 0x8000000. [h264_vaapi @ 0x5607d595e280] Packed header buffer (1) is 0x8000004/0x8000005 (328 bits). [h264_vaapi @ 0x5607d595e280] Packed header buffer (4) is 0x8000006/0x8000007 (1040 bits). [h264_vaapi @ 0x5607d595e280] Packed header buffer (3) is 0x8000008/0x8000009 (72 bits). [h264_vaapi @ 0x5607d595e280] Param buffer (24) is 0x800000a. [h264_vaapi @ 0x5607d595e280] Packed header buffer (3) is 0x800000b/0x800000c (72 bits). [h264_vaapi @ 0x5607d595e280] Param buffer (24) is 0x800000d. [h264_vaapi @ 0x5607d595e280] Pictures ending truncated GOP: IDR (0/0) [h264_vaapi @ 0x5607d595e280] Sync to pic 0/0 (input surface 0x4000013). [h264_vaapi @ 0x5607d595e280] Output buffer: 623 bytes (status 00000000). [h264_vaapi @ 0x5607d595e280] Output read for pic 0/0. ... which looks good - two slice param buffers and two slice headers. But: $ ffmpeg -i test.264 out.mjpeg ... [h264 @ 0x559556ae1d40] top block unavailable for requested intra mode -1 [h264 @ 0x559556ae1d40] error while decoding MB 0 1, bytestream 132 [h264 @ 0x559556ae1d40] concealing 8090 DC, 8090 AC, 8090 MV errors in I frame Input #0, h264, from 'test.264': Duration: N/A, bitrate: N/A Stream #0:0: Video: h264 (High), yuv420p(progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 tbr, 1200k tbn, 120 tbc ... and the frame data does not resemble the input at all (but the headers are correct). The reference decoder has some similar complaints and produces nothing useful: $ ldecod.exe ... warning: Intra_8x8_Horizontal_Up prediction mode not allowed at mb 120 warning: Intra_4x4_Vertical_Left prediction mode not allowed at mb 120 ... I do remember this working in one of the earlier versions. What are you testing with here? (I'll build an older driver tomorrow to try, in case that's somehow been broken.) Thanks, - Mark _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel