Rémi Denis-Courmont:
> On Saturday, 6 July 2024, 19:20:33 EEST, Andreas Rheinhardt wrote:
>> Rémi Denis-Courmont:
>>> On Saturday, 6 July 2024, 18:23:00 EEST, Andreas Rheinhardt wrote:
>>>>>  static void dct_unquantize_h263_inter_c(MpegEncContext *s,
>>>>>                                    int16_t *block, int n, int qscale)
>>>>>  {
>>>>> -    int i, level, qmul, qadd;
>>>>> +    int qmul = qscale << 1;
>>>>> +    int qadd = (qscale - 1) | 1;
>>>>>      int nCoeffs;
>>>>>
>>>>>      av_assert2(s->block_last_index[n]>=0);
>>>>>
>>>>> -    qadd = (qscale - 1) | 1;
>>>>> -    qmul = qscale << 1;
>>>>> -
>>>>>      nCoeffs= s->inter_scantable.raster_end[ s->block_last_index[n] ];
>>>>> -
>>>>> -    for(i=0; i<=nCoeffs; i++) {
>>>>> -        level = block[i];
>>>>> -        if (level) {
>>>>> -            if (level < 0) {
>>>>> -                level = level * qmul - qadd;
>>>>> -            } else {
>>>>> -                level = level * qmul + qadd;
>>>>> -            }
>>>>> -            block[i] = level;
>>>>> -        }
>>>>> -    }
>>>>> +    s->h263dsp.h263_dct_unquantize_inter(block, nCoeffs, qmul, qadd);
>>>>
>>>> This adds an indirection. I have asked you to actually benchmark this
>>>> code (and not only the DSP function you add), but you never did.
>>>
>>> I already pointed out previously that this is the way this project does
>>> DSP code. Certainly it would be nice to hard-code the path when there is
>>> only one possible. This is often the case on Armv8 notably, and of course
>>> on platforms without optimisations.
>>>
>>> But that's a general problem way beyond the scope of this patchset. We
>>> always add indirect function calls in this sort of situation, and I don't
>>> see why I would have a duty to benchmark it, so I am going to ignore this.
>>
>> You have a duty to benchmark it because you add it where it wasn't before.
> 
> I don't recall other people benchmarking the indirect branch they've added 
> previously for other DSP code. Recent examples include VVC and FLAC. 
> Rightfully so, because there is not really an alternative anyway. Even GNU 
> IFUNCs and Glibc alternative libraries internally use an indirect branch 
> (hidden in PLT/GOT), and FFmpeg can't self-patch at load-time like the Linux 
> kernel does, nor can it generate dynamic PLT entries with direct branches.
> 
> Also, if an indirect call is unacceptable, then how come the calling code is 
> itself an indirect call, and for abstraction rather than performance?

I did not even say that it is unacceptable, merely that it should be
benchmarked.

> 
> Your request is completely arbitrary here. Yes, there is already an indirect 
> call close up, and so? I'm not trying to clean MpegEncContext here, only 
> trying to add one function to checkasm, RVV and (with James' work) post-MMX 
> x86.
> 
> Lastly, you don't even specify what benchmark to run. Comparing something 
> against nothing is, as my manager would say, pointless, since the relative 
> overhead ought to be an approximation of infinity (in practice, you end up 
> measuring the overhead of the benchmarking code instead).

You should compare the functions you are modifying, namely
dct_unquantize_h263_(intra|inter)_c.

- Andreas
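
For readers following the thread, the change under discussion can be
sketched outside FFmpeg roughly as below. The scalar loop uses the same
arithmetic as the code removed in the quoted diff; everything else
(DemoDSPContext, demo_dsp_init, demo_unquantize) is a hypothetical
stand-in invented for illustration, not FFmpeg's actual H263DSPContext or
init code. It only shows where the extra indirect call sits, and says
nothing about its cost.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar inter dequantisation: same arithmetic as the loop removed in the
 * quoted diff. Coefficients 0..n_coeffs (inclusive) are processed. */
static void unquantize_inter_c(int16_t *block, int n_coeffs, int qmul, int qadd)
{
    for (int i = 0; i <= n_coeffs; i++) {
        int level = block[i];
        if (level) {
            if (level < 0)
                level = level * qmul - qadd;
            else
                level = level * qmul + qadd;
            block[i] = level;
        }
    }
}

/* Hypothetical DSP context: one function pointer, filled in at init time.
 * A real init function would pick a SIMD version based on CPU flags. */
typedef struct DemoDSPContext {
    void (*unquantize_inter)(int16_t *block, int n_coeffs, int qmul, int qadd);
} DemoDSPContext;

static void demo_dsp_init(DemoDSPContext *c)
{
    c->unquantize_inter = unquantize_inter_c; /* only the C fallback here */
}

/* Hypothetical caller, shaped like the patched function: derive qmul and
 * qadd from qscale, then dispatch through the pointer (the added
 * indirection being debated). */
static void demo_unquantize(const DemoDSPContext *c, int16_t *block,
                            int n_coeffs, int qscale)
{
    int qmul = qscale << 1;
    int qadd = (qscale - 1) | 1;

    assert(n_coeffs >= 0);
    c->unquantize_inter(block, n_coeffs, qmul, qadd);
}
```

One way to do the comparison requested above would be to time
demo_unquantize() against a variant that calls unquantize_inter_c()
directly, over the same blocks. That is an assumption about what a
suitable benchmark could look like, not something specified in the thread.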
