On Sat, Dec 7, 2024 at 5:54 PM Frank Plowman <p...@frankplowman.com> wrote:
> On 07/12/2024 02:22, Nuo Mi wrote: > > On Wed, Dec 4, 2024 at 2:09 AM Frank Plowman <p...@frankplowman.com> > wrote: > > > >> Hi, > >> > >> Thanks for your review. > >> > >> On 03/12/2024 10:04, Nuo Mi wrote: > >>> Hi Frank, > >>> Thank you for the patch > >>> > >>> On Sat, Nov 30, 2024 at 8:13 PM Frank Plowman <p...@frankplowman.com> > >> wrote: > >>> > >>>> On 30/11/2024 06:46, Nuo Mi wrote: > >>>>> On Fri, Nov 29, 2024 at 6:19 AM Frank Plowman <p...@frankplowman.com > > > >>>> wrote: > >>>>> > >>>>>> Previously, the code only stored the MIP mode and transpose flag in > >> the > >>>>>> relevant tables at the top-left corner of the CU. This information > >> ends > >>>>>> up being retrieved in ff_vvc_intra_pred_* not based on the CU > position > >>>>>> but instead the transform unit position (specifically, using the x0 > >> and > >>>>>> y0 from get_luma_predict_unit). There might be multiple transform > >> units > >>>>>> in a CU, hence the top-left corner of the transform unit might not > >>>>>> coincide with the top-left corner of the CU. Consequently, we need > to > >>>>>> store the MIP information at all positions in the CU, not only its > >>>>>> top-left corner, as we already do for the MIP flag. > >>>>>> > >>>>>> Signed-off-by: Frank Plowman <p...@frankplowman.com> > >>>>>> --- > >>>>>> libavcodec/vvc/ctu.c | 4 ++-- > >>>>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/libavcodec/vvc/ctu.c b/libavcodec/vvc/ctu.c > >>>>>> index 1e06119cfd..0030938cf5 100644 > >>>>>> --- a/libavcodec/vvc/ctu.c > >>>>>> +++ b/libavcodec/vvc/ctu.c > >>>>>> @@ -975,8 +975,8 @@ static void > intra_luma_pred_modes(VVCLocalContext > >>>> *lc) > >>>>>> for (int y = 0; y < (cb_height>>log2_min_cb_size); > y++) { > >>>>>> int width = cb_width>>log2_min_cb_size; > >>>>>> memset(&fc->tab.imf[x], cu->intra_mip_flag, > width); > >>>>>> - fc->tab.imtf[x] = intra_mip_transposed_flag; > >>>>>> - fc->tab.imm[x] = intra_mip_mode; > >>>>>> + memset(&fc->tab.imtf[x], intra_mip_transposed_flag, > >>>>>> width); > >>>>>> + memset(&fc->tab.imm[x], intra_mip_mode, width); > >>>>> > >>>>> intra_mip_mode is 4 bits, 2 flags are 2 bits. maybe we can use a > >> uint8_t > >>>>> for 3 fields, > >>>>> We only need 1 memset and save 2/3 memory. > >>>> > >>>> I've implemented this (patch attached, to be applied atop the set), > but > >>>> it's not as straightforward as it may seem. In particular, because > the > >>>> tables are read directly from when determining which CABAC context to > >>>> use for these flags, we have to add quite a lot of extra code in > cabac.c > >>>> to support this special case where the MIP information is a bit field. > >>>> In my implementation, this was done by adding this coerce_to_bool > >>>> parameter to get_inc and get_top. This does actually save a moderate > >>>> amount of memory though, ~1MB for 4K and ~256kB for 1080p. > >>>> > >>> A coding block can be as small as 4x4, so for a 1080p frame, the memory > >>> required is approximately 2*1920*1080/(4*4) ~= 256 kB. > >>> However, with a maximum delay of 16 frames, the total memory usage will > >> be > >>> 256kB * 16=4 MB. > >>> > >>> In your patch, coerce_to_bool is set to 1 only when we are in > >>> ff_vvc_intra_mip_flag. > >>> How about defining get_mip_inc like this? This way, we can avoid > changing > >>> get_inc > >>> > >>> static av_always_inline > >>> uint8_t get_mip_inc (VVCLocalContext *lc, const uint8_t *ctx) > >>> { > >>> uint8_t left = 0, top = 0; > >>> get_left_top(lc, &left, &top, lc->cu->x0, lc->cu->y0, ctx, ctx); > >>> return !!left + !!top; > >>> } > >> > >> What about instead taking a function pointer as an argument to > >> get_inc/get_left_top which can be used to modify the value read from the > >> table before using it, or NULL to apply no such modification? > > > > Function pointers have a cost > > > > Provided the caller and callee are defined in the same translation unit > and the caller is inlined, as would have been the case here, the > indirect call can be optimised away and the call inlined. GCC calls > this optimisation indirect-inlining. > 👍 > > In any case, I ended up not taking this approach for slightly different > reasons and implemented your first suggestion instead. It should be on > the ML as v2 now. > Cheers, > Frank > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe". > We are using too much memory. Please do some profiling using Massif or another tool if you're interested. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".