Rémi Denis-Courmont:
> On Saturday, 6 July 2024, 19:20:33 EEST, Andreas Rheinhardt wrote:
>> Rémi Denis-Courmont:
>>> On Saturday, 6 July 2024, 18:23:00 EEST, Andreas Rheinhardt wrote:
>>>>>  static void dct_unquantize_h263_inter_c(MpegEncContext *s,
>>>>>                                    int16_t *block, int n, int qscale)
>>>>>  {
>>>>> -    int i, level, qmul, qadd;
>>>>> +    int qmul = qscale << 1;
>>>>> +    int qadd = (qscale - 1) | 1;
>>>>>      int nCoeffs;
>>>>>
>>>>>      av_assert2(s->block_last_index[n]>=0);
>>>>>
>>>>> -    qadd = (qscale - 1) | 1;
>>>>> -    qmul = qscale << 1;
>>>>> -
>>>>>      nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ];
>>>>> -
>>>>> -    for(i=0; i<=nCoeffs; i++) {
>>>>> -        level = block[i];
>>>>> -        if (level) {
>>>>> -            if (level < 0) {
>>>>> -                level = level * qmul - qadd;
>>>>> -            } else {
>>>>> -                level = level * qmul + qadd;
>>>>> -            }
>>>>> -            block[i] = level;
>>>>> -        }
>>>>> -    }
>>>>> +    s->h263dsp.h263_dct_unquantize_inter(block, nCoeffs, qmul, qadd);
>>>>
>>>> This adds an indirection. I have asked you to actually benchmark this
>>>> code (and not only the DSP function you add), but you never did.
>>>
>>> I already pointed out previously that this is the way this project does
>>> DSP code. Certainly it would be nice to hard-code the path when there is
>>> only one possible. This is often the case on Armv8 notably, and of course
>>> on platforms without optimisations.
>>>
>>> But that's a general problem way beyond the scope of this patchset. We
>>> always add indirect function calls in this sort of situation, and I don't
>>> see why I would have duty to benchmark it, so I am going to ignore this.
>>
>> You have a duty to benchmark it because you add it where it wasn't before.
>
> I don't recall other people benchmarking the indirect branch they've added
> previously for other DSP code. Recent examples include VVC and FLAC.
> Rightfully so, because there is not really an alternative anyway.
> Even GNU IFUNCs and Glibc alternative libraries internally use an indirect
> branch (hidden in PLT/GOT), and FFmpeg can't self-patch at load-time like
> the Linux kernel does, nor can it generate dynamic PLT entries with direct
> branches.
>
> Also if an indirect call is unacceptable, then how come the calling code is
> itself an indirect call and for abstraction rather than performance.

I did not even say that it is unacceptable. Merely that it should be benched.

>
> Your request is completely arbitrary here. Yes, there is already an indirect
> call close up, and so? I'm not trying to clean MpegEncContext here, only
> trying to add one function to checkasm, RVV and (with James' work) post-MMX
> x86.
>
> Lastly, you don't even specify what benchmark to run. Comparing something
> against nothing is, as my manager would say, pointless, since the relative
> overhead ought to be an approximation of infinity (in practice, you end up
> measuring the overhead of the benchmarking code instead).

You shall compare the function you are modifying, namely
dct_unquantize_h263_(intra|inter)_c.

- Andreas
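
For reference, the two shapes being compared can be sketched in isolation roughly as below. This is only an illustration, not FFmpeg code: DemoDSP, dequant_direct, dequant_indirect and variants_match are hypothetical names, the block is assumed to hold 64 coefficients, and the last coefficient index is passed in directly instead of being looked up through raster_end. It checks that both variants give the same result; it does not measure anything.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar H.263 inter dequantisation, mirroring the loop removed by the
 * patch. n_coeffs is the index of the last coefficient (inclusive). */
static void unquantize_inter_ref(int16_t *block, int n_coeffs,
                                 int qmul, int qadd)
{
    for (int i = 0; i <= n_coeffs; i++) {
        int level = block[i];
        if (level) {
            if (level < 0)
                level = level * qmul - qadd;
            else
                level = level * qmul + qadd;
            block[i] = (int16_t)level;
        }
    }
}

/* Hypothetical stand-in for a DSP context holding one function pointer. */
typedef struct DemoDSP {
    void (*unquantize_inter)(int16_t *block, int n_coeffs,
                             int qmul, int qadd);
} DemoDSP;

/* Variant A: the caller reaches the loop through a direct call, which the
 * compiler is free to inline (the pre-patch shape). */
static void dequant_direct(int16_t *block, int last, int qscale)
{
    int qmul = qscale << 1;
    int qadd = (qscale - 1) | 1;
    unquantize_inter_ref(block, last, qmul, qadd);
}

/* Variant B: the caller goes through the function pointer (the post-patch
 * shape). */
static void dequant_indirect(const DemoDSP *dsp, int16_t *block,
                             int last, int qscale)
{
    int qmul = qscale << 1;
    int qadd = (qscale - 1) | 1;
    dsp->unquantize_inter(block, last, qmul, qadd);
}

/* Returns 1 if both variants produce identical output for a 64-entry
 * input block, 0 otherwise. */
static int variants_match(const int16_t *in, int last, int qscale)
{
    int16_t a[64], b[64];
    DemoDSP dsp = { unquantize_inter_ref };

    assert(last >= 0 && last < 64);
    memcpy(a, in, sizeof(a));
    memcpy(b, in, sizeof(b));
    dequant_direct(a, last, qscale);
    dequant_indirect(&dsp, b, last, qscale);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

An actual comparison would time many calls of each variant on the same input, which this sketch deliberately leaves out.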