vaapi_encode: Add VAAPI AV1 encoder

Mark Thompson Sun, 13 Aug 2023 14:44:02 -0700

On 10/08/2023 03:54, Wang, Fei W wrote:

On Mon, 2023-08-07 at 22:21 +0100, Mark Thompson wrote:

On 03/08/2023 07:01, fei.w.wang-at-intel....@ffmpeg.org wrote:

From: Fei Wang <fei.w.w...@intel.com>


Signed-off-by: Fei Wang <fei.w.w...@intel.com>
---
   Changelog                     |    1 +
   configure                     |    3 +
   doc/encoders.texi             |   13 +
   libavcodec/Makefile           |    1 +
   libavcodec/allcodecs.c        |    1 +
   libavcodec/vaapi_encode.c     |  125 +++-
   libavcodec/vaapi_encode.h     |   12 +
   libavcodec/vaapi_encode_av1.c | 1229
+++++++++++++++++++++++++++++++++
   libavcodec/version.h          |    2 +-
   9 files changed, 1368 insertions(+), 19 deletions(-)
   create mode 100644 libavcodec/vaapi_encode_av1.c


I assume this is tested on Intel hardware.  Is it tested on AMD as
well?  (Apparently working in recent Mesa.)


AMD tested the patchset months ago:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22585

This patch changed a little compare with the version in Cartwheel,
@Ruijing, Could you help to review this version in ML? To avoid the
diffs break you. Thanks.

...
@@ -669,6 +669,15 @@ static int
vaapi_encode_set_output_timestamp(AVCodecContext *avctx,
   {
       VAAPIEncodeContext *ctx = avctx->priv_data;

+ // AV1 packs P frame and next B frame into one pkt, and uses

the other
+    // repeat frame header pkt at the display order position of
the P frame
+    // to indicate its frame index. Each frame has a corresponding
pkt in its
+    // display order position. So don't need to consider delay for
AV1 timestamp.
+    if (avctx->codec_id == AV_CODEC_ID_AV1) {
+        pkt->dts = pkt->pts - ctx->dts_pts_diff;
+        return 0;
+    }


This doesn't get you the right result, though?  The PTS and DTS on
every AV1 packet want to be the same.


The result tested good which can be played normally. Just aligned with
other vaapi encoders that the 1st frame start with 0/-1 of PTS/DTS if
have B frame. Set PTS/DTS to same also looks good.


DTS != PTS should only be true when the stream has externally-visible frame 
reordering in, which AV1 doesn't.

The other two AV1 encoders both output DTS == PTS; I think you should match 
that.

In any case, please don't put tests for codec ID in the common
code.  I suggest that the timestamp behaviour wants to be a new
FLAG_*.

+
...
@@ -1128,9 +1182,19 @@ static int
vaapi_encode_pick_next(AVCodecContext *avctx,

vaapi_encode_add_ref(avctx, pic, pic, 0, 1, 0);

       if (pic->type != PICTURE_TYPE_IDR) {
-        vaapi_encode_add_ref(avctx, pic, start,
-                             pic->type == PICTURE_TYPE_P,
-                             b_counter > 0, 0);
+        // TODO: apply both previous and forward multi reference
for all vaapi encoders.
+        // And L0/L1 reference frame number can be set dynamically
through query
+        // VAConfigAttribEncMaxRefFrames attribute.
+        if (avctx->codec_id == AV_CODEC_ID_AV1) {
+            for (i = 0; i < ctx->nb_next_prev; i++)
+                vaapi_encode_add_ref(avctx, pic, ctx-

next_prev[i],

+                                     pic->type == PICTURE_TYPE_P,
+                                     b_counter > 0, 0);


So an undisclosed aim of the previous patch was to make this extra
list of the most recent N frames for this to use with P frames only?


Currently, yes. As the TODO comment says, I will apply it to all other
codecs and B frame next, it will need lots of test on different
hardware. Add this for AV1 here is to align with VPL which use 2
previous reference for P frame. Base on our test, 2 reference frame for
P will improve ~5% BD-rate for quality, so that can make it possible to
let user get same quality with VPL after adjusting and aligning encoder
options.


Hmm.  My Intel test machine says it can take 8 L0 and 2 L1 references in H.264. 
 I might try this, seems like it should be easy to test.


Why this partcular structure for P frames only, though?  Seems like
if you support extra references then other codecs could also make use
of them in either direction, so we would be better off hinting that
the codec would like up to N references and then using the contents
of the existing DPB, plus keep extras if there is space.

...
+
+static av_cold int vaapi_encode_av1_configure(AVCodecContext
*avctx)
+{
+    VAAPIEncodeContext     *ctx = avctx->priv_data;
+    VAAPIEncodeAV1Context *priv = avctx->priv_data;
+    int ret;
+
+    ret = ff_cbs_init(&priv->cbc, AV_CODEC_ID_AV1, avctx);
+    if (ret < 0)
+        return ret;
+
+    if (ctx->rc_mode->quality) {
+        priv->q_idx_p = av_clip(ctx->rc_quality, 0,
AV1_MAX_QUANT);
+        if (fabs(avctx->i_quant_factor) > 0.0)
+            priv->q_idx_idr =
+                av_clip((fabs(avctx->i_quant_factor) * priv-

q_idx_p  +

+                         avctx->i_quant_offset) + 0.5,
+                        0, AV1_MAX_QUANT);
+        else
+            priv->q_idx_idr = priv->q_idx_p;
+
+        if (fabs(avctx->b_quant_factor) > 0.0)
+            priv->q_idx_b =
+                av_clip((fabs(avctx->b_quant_factor) * priv-

q_idx_p  +

+                         avctx->b_quant_offset) + 0.5,
+                        0, AV1_MAX_QUANT);
+        else
+            priv->q_idx_b = priv->q_idx_p;
+    } else {
+        /** Arbitrary value */
+        priv->q_idx_idr = priv->q_idx_p = priv->q_idx_b = 128;
+    }


Are there any alignment requirements here?  (If I gave it an input
which is, say, 1361x1, would that work?)


I didn't get you. Which parameter here need to be align? Max/Min
resolution HW supported will be checked somewhere else. If use 1361x1
there will be the error like:

"Hardware does not support encoding at size 1376x16 (constraints: width
32-8192 height 32-8192)."


I mean of padding for the reference surfaces.  I was expecting this to be 
rounded up to a multiple of 8 (to match MiCols/MiRows) - see this function in 
other codecs.

...
+
+    /** update obu size in bitstream */
+    if (fh_obu->header.obu_has_size_field) {
+        obu_size_len = priv->attr_ext2.bits.obu_size_bytes_minus1
+ 1;
+        for (i = 0; i < obu_size_len; i++) {
+            byte = obu_size >> (7 * i) & 0x7f;
+            if (i < obu_size_len - 1)
+                byte |= 0x80;
+            put_bits(&pbc_tmp, 8, byte);
+        }
+        flush_put_bits(&pbc_tmp);
+        memmove(pbc_tmp.buf_ptr, pbc_tmp.buf_ptr + (8 -
obu_size_len), obu_size);
+        *data_len -= (8 - obu_size_len) * 8;
+    }


Why is there an incomplete duplicate of the cbs_av1 header writing
code here?


To record some position/size in bitstream that needed for VAAPI. Like
qp_index/loopfilter/cdef offset and cdef parameters size in bit. It's
not reasonable to add the specific parameters into CBS.


How about with 
<https://lists.ffmpeg.org/pipermail/ffmpeg-devel/2023-August/313228.html>?

...
+
+static int vaapi_encode_av1_set_tile(AVCodecContext *avctx)
+{
+    VAAPIEncodeAV1Context *priv = avctx->priv_data;
+    int mi_cols, mi_rows, sb_shift, sb_size;
+    int max_tile_area_sb, max_tile_area_sb_varied;
+    int tile_width_sb, tile_height_sb, widest_tile_sb;
+    int min_log2_tiles;
+    int tile_rows_tmp, i;
+
+    if (priv->tile_cols > AV1_MAX_TILE_COLS ||
+        priv->tile_rows > AV1_MAX_TILE_ROWS) {
+        av_log(avctx, AV_LOG_ERROR, "Invalid tile number %dx%d,
should less than %dx%d.\n",
+               priv->tile_cols, priv->tile_rows,
AV1_MAX_TILE_COLS, AV1_MAX_TILE_ROWS);


Can't this be a constraint on the option rather than a special check
here?


Tiles set with IMAGE_SIZE option type which aligns with HEVC. It
doesn't support Min/MAX check.


Hmm, yeah - fair enough.

...
+
+    sh->color_config = (AV1RawColorConfig) {
+        .high_bitdepth                  = desc->comp[0].depth == 8
? 0 : 1,
+        .color_primaries                = avctx->color_primaries,
+        .transfer_characteristics       = avctx->color_trc,
+        .matrix_coefficients            = avctx->colorspace,
+        .color_description_present_flag = (avctx->color_primaries
!= AVCOL_PRI_UNSPECIFIED ||
+                                           avctx-

color_trc       != AVCOL_TRC_UNSPECIFIED ||

+                                           avctx-

colorspace      != AVCOL_SPC_UNSPECIFIED),

+        .color_range                    = avctx->color_range ==
AVCOL_RANGE_JPEG,
+        .subsampling_x                  = desc->log2_chroma_w,
+        .subsampling_y                  = desc->log2_chroma_h,


.chroma_sample_position = some function of chroma_sample_location.


There is no chroma_sample_position supported in VAAPI yet.


VAAPI won't care.  It should be marked correctly in the headers so that the 
output stream has the right value.

(CFL not having any ability to deal with the sample location is a known 
deficiency of AV1.)

...
+    fh->tile_cols                 = priv->tile_cols;
+    fh->tile_rows                 = priv->tile_rows;
+    fh->tile_cols_log2            = priv->tile_cols_log2;
+    fh->tile_rows_log2            = priv->tile_rows_log2;
+    fh->uniform_tile_spacing_flag = priv->uniform_tile;
+    fh->tile_size_bytes_minus1    = priv-

attr_ext2.bits.tile_size_bytes_minus1;

+    fh->reduced_tx_set            = 1;


This not being present in the capabilities feels like an omission in
the API.  That's a large loss for an implementation which does
support the full transform set.


Personally understanding, "not being present" means it must be
supported. If any exception driver, then they can ask to add new
capability to libva.


reduced_tx_set is a negative property - you want to set it to zero to allow all 
transforms, unless the implementation can't cope with that.

Does the Intel implementation require it to be set?  (That is, does not support 
the full transform set.)

...
+
+    for (i = 0; i < fh->tile_cols; i++)
+        fh->width_in_sbs_minus_1[i] = vpic-

width_in_sbs_minus_1[i] = priv->width_in_sbs_minus_1[i];

+
+    for (i = 0; i < fh->tile_rows; i++)
+        fh->height_in_sbs_minus_1[i] = vpic-

height_in_sbs_minus_1[i] = priv->height_in_sbs_minus_1[i];


Er, what?  Doesn't this fail the inference on the last tile
width/height if you don't split exactly?


What's the meaning of "split exactly"? All the tile w/h will be split
in vaapi_encode_av1_set_tile include last tile. Just make sure the info
of tiles w/h in frame header are same with the info passed to VAAPI
should be fine.


Sorry, I misread this.  The values are set elsewhere, it's not setting all 
tiles to have the same SB width/height.

...
+
+    vpic->base_qindex          = fh->base_q_idx;
+    vpic->frame_width_minus_1  = fh->frame_width_minus_1;
+    vpic->frame_height_minus_1 = fh->frame_height_minus_1;
+    vpic->primary_ref_frame    = fh->primary_ref_frame;
+    vpic->reconstructed_frame  = pic->recon_surface;
+    vpic->coded_buf            = pic->output_buffer;
+    vpic->tile_cols            = fh->tile_cols;
+    vpic->tile_rows            = fh->tile_rows;
+    vpic->order_hint           = fh->order_hint;
+#if VA_CHECK_VERSION(1, 15, 0)
+    vpic->refresh_frame_flags  = fh->refresh_frame_flags;
+#endif


What will the difference be between running in the two versions?  (Is
it worth supporting the older one?  Cf. bit_depth on VP9.)


I'd prefer to support older one. In case of someone using old version
and works well with ffmpeg-vpl(as vpl already support av1 encode long
time ago), otherwise he must be confuse why ffmpeg-vaapi need higher
VAAPI version but VPL doesn't.


Ok.  So the driver doesn't use the value of refresh_frame_flags if it doesn't 
affect the result?  (Seems like it shouldn't, not sure why it would be added to 
the API.)

...


Thanks,

- Mark
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v3 6/6] lavc/vaapi_encode: Add VAAPI AV1 encoder

Reply via email to