On Saturday, 6 July 2024, 19:20:33 EEST, Andreas Rheinhardt wrote:
> Rémi Denis-Courmont:
> > On Saturday, 6 July 2024, 18:23:00 EEST, Andreas Rheinhardt wrote:
> >>>  static void dct_unquantize_h263_inter_c(MpegEncContext *s,
> >>>                                    int16_t *block, int n, int qscale)
> >>>  {
> >>> -    int i, level, qmul, qadd;
> >>> +    int qmul = qscale << 1;
> >>> +    int qadd = (qscale - 1) | 1;
> >>>      int nCoeffs;
> >>>
> >>>      av_assert2(s->block_last_index[n]>=0);
> >>>
> >>> -    qadd = (qscale - 1) | 1;
> >>> -    qmul = qscale << 1;
> >>> -
> >>>      nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ];
> >>> -
> >>> -    for(i=0; i<=nCoeffs; i++) {
> >>> -        level = block[i];
> >>> -        if (level) {
> >>> -            if (level < 0) {
> >>> -                level = level * qmul - qadd;
> >>> -            } else {
> >>> -                level = level * qmul + qadd;
> >>> -            }
> >>> -            block[i] = level;
> >>> -        }
> >>> -    }
> >>> +    s->h263dsp.h263_dct_unquantize_inter(block, nCoeffs, qmul, qadd);
> >>
> >> This adds an indirection. I have asked you to actually benchmark this
> >> code (and not only the DSP function you add), but you never did.
> >
> > I already pointed out previously that this is the way this project does
> > DSP code. Certainly it would be nice to hard-code the path when there is
> > only one possible. This is often the case on Armv8 notably, and of course
> > on platforms without optimisations.
> >
> > But that's a general problem way beyond the scope of this patchset. We
> > always add indirect function calls in this sort of situation, and I don't
> > see why I would have duty to benchmark it, so I am going to ignore this.
>
> You have a duty to benchmark it because you add it where it wasn't before.

I don't recall other people benchmarking the indirect branches they added
for other DSP code. Recent examples include VVC and FLAC. Rightfully so,
because there is not really an alternative anyway. Even GNU IFUNCs and
Glibc alternative libraries internally use an indirect branch (hidden in
the PLT/GOT), and FFmpeg can't self-patch at load time like the Linux
kernel does, nor can it generate dynamic PLT entries with direct branches.

Also, if an indirect call is unacceptable, then how come the calling code
is itself reached through an indirect call, and one that exists for
abstraction rather than performance? Your request is completely arbitrary
here. Yes, there is already an indirect call right next to it. So what?
I'm not trying to clean up MpegEncContext here, only trying to add one
function to checkasm, RVV and (with James' work) post-MMX x86.

Lastly, you don't even specify what benchmark to run. Comparing something
against nothing is, as my manager would say, pointless, since the relative
overhead ought to approach infinity (in practice, you end up measuring the
overhead of the benchmarking code instead).

-- 
Rémi Denis-Courmont
http://www.remlab.net/
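
For readers without the patch at hand, the pattern under discussion can be
sketched in isolation as below. This is a minimal illustration, not the
actual FFmpeg code: the struct name H263DSPSketch, the init function and
the wrapper are hypothetical, and the parameter types of the function
pointer are assumed from the call site in the diff. The scalar loop is the
one removed by the patch, and it is assumed here that the coefficient
bound stays inclusive, as in the original loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a DSP context; not FFmpeg's real struct. */
typedef struct H263DSPSketch {
    /* Parameter types are assumed from the call site in the diff. */
    void (*dct_unquantize_inter)(int16_t *block, int ncoeffs,
                                 int qmul, int qadd);
} H263DSPSketch;

/* Scalar fallback: same arithmetic as the loop removed by the patch.
 * The bound is inclusive, matching the original "i <= nCoeffs". */
static void dct_unquantize_inter_c(int16_t *block, int ncoeffs,
                                   int qmul, int qadd)
{
    for (int i = 0; i <= ncoeffs; i++) {
        int level = block[i];

        if (level) {
            if (level < 0)
                level = level * qmul - qadd;
            else
                level = level * qmul + qadd;
            block[i] = (int16_t)level;
        }
    }
}

/* Hypothetical init: installs the C fallback. A real init would then
 * override the pointer depending on the detected CPU features. */
static void h263dsp_sketch_init(H263DSPSketch *dsp)
{
    dsp->dct_unquantize_inter = dct_unquantize_inter_c;
}

/* Caller side: computes qmul/qadd as in the diff, then makes the
 * indirect call through the function pointer. */
static void unquantize_inter(const H263DSPSketch *dsp, int16_t *block,
                             int last_coeff, int qscale)
{
    int qmul = qscale << 1;
    int qadd = (qscale - 1) | 1;

    assert(last_coeff >= 0);
    dsp->dct_unquantize_inter(block, last_coeff, qmul, qadd);
}
```

With qscale = 4 this gives qmul = 8 and qadd = 3, so a coefficient of 3
becomes 27, -2 becomes -19, and zeros are left untouched. The only cost
added on the caller side is the one indirect call at the end of
unquantize_inter().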