Rémi Denis-Courmont: > > > Le 11 juin 2024 13:37:57 GMT+03:00, Andreas Rheinhardt > <andreas.rheinha...@outlook.com> a écrit : >> Rémi Denis-Courmont: >>> --- >>> libavcodec/mpegvideo.c | 40 +++++++++------------------------------- >>> 1 file changed, 9 insertions(+), 31 deletions(-) >>> >>> diff --git a/libavcodec/mpegvideo.c b/libavcodec/mpegvideo.c >>> index 7af823b8bd..810888fc47 100644 >>> --- a/libavcodec/mpegvideo.c >>> +++ b/libavcodec/mpegvideo.c >>> @@ -201,13 +201,11 @@ static void >>> dct_unquantize_mpeg2_inter_c(MpegEncContext *s, >>> static void dct_unquantize_h263_intra_c(MpegEncContext *s, >>> int16_t *block, int n, int qscale) >>> { >>> - int i, level, qmul, qadd; >>> - int nCoeffs; >>> + int qmul = qscale << 1; >>> + int qadd, nCoeffs; >>> >>> av_assert2(s->block_last_index[n]>=0 || s->h263_aic); >>> >>> - qmul = qscale << 1; >>> - >>> if (!s->h263_aic) { >>> block[0] *= n < 4 ? s->y_dc_scale : s->c_dc_scale; >>> qadd = (qscale - 1) | 1; >>> @@ -219,43 +217,20 @@ static void >>> dct_unquantize_h263_intra_c(MpegEncContext *s, >>> else >>> nCoeffs= s->intra_scantable.raster_end[ s->block_last_index[n] ]; >>> >>> - for(i=1; i<=nCoeffs; i++) { >>> - level = block[i]; >>> - if (level) { >>> - if (level < 0) { >>> - level = level * qmul - qadd; >>> - } else { >>> - level = level * qmul + qadd; >>> - } >>> - block[i] = level; >>> - } >>> - } >>> + s->h263dsp.h263_dct_unquantize_intra(block, nCoeffs, qmul, qadd); >>> } >>> >>> static void dct_unquantize_h263_inter_c(MpegEncContext *s, >>> int16_t *block, int n, int qscale) >>> { >>> - int i, level, qmul, qadd; >>> + int qmul = qscale << 1; >>> + int qadd = (qscale - 1) | 1; >>> int nCoeffs; >>> >>> av_assert2(s->block_last_index[n]>=0); >>> >>> - qadd = (qscale - 1) | 1; >>> - qmul = qscale << 1; >>> - >>> nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ]; >>> - >>> - for(i=0; i<=nCoeffs; i++) { >>> - level = block[i]; >>> - if (level) { >>> - if (level < 0) { >>> - level = level * qmul - qadd; >>> - } else { >>> - level = level * qmul + qadd; >>> - } >>> - block[i] = level; >>> - } >>> - } >>> + s->h263dsp.h263_dct_unquantize_inter(block, nCoeffs, qmul, qadd); >> >> What is the cost of these function calls? > > The same as any other indirect function tail call. What do you want me to > say?! Changing how FFmpeg handles assembler optimisations, or removing the > upper level of indirection is outside the scope of this MR (and not very > realistic). > > For that matter, there are hundreds of indirect calls that could be optimised > out on some platforms, notably almost all Armv8 NEON functions and, on > x86-64, all MMX or SSE2 exclusives. I don't get nitpicking this one.
1. Given that we always build the C versions of dsp functions, the latter claim is wrong. (It is only the functions without assembly versions (for a given platform) for which indirect calls could be avoided.) 2. I am not specifically asking about the cost of the indirect call, but of the call. The callee will only be called from one place in this scenario (presuming (for the C versions) that the compiler inlines h263_dct_unquantize_inter_c into h263_dct_unquantize_intra_c), so if it were not for arch-specific optimized versions we would inline it in its caller (just as it is now). The loop does not look like it benefits a lot from vectorizing, so I am really wondering whether this optimization is even a win when benchmarking the real thing (not only the loop). - Andreas _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".