On 24 Oct 2022, at 14:01, Martin Storsjö wrote:

> On Tue, 11 Oct 2022, J. Dekker wrote:
>
>> [...]
>> diff --git a/libavcodec/aarch64/hevcdsp_init_aarch64.c b/libavcodec/aarch64/hevcdsp_init_aarch64.c
>> index 644cc17715..44399b05d8 100644
>> --- a/libavcodec/aarch64/hevcdsp_init_aarch64.c
>> +++ b/libavcodec/aarch64/hevcdsp_init_aarch64.c
>> @@ -69,6 +69,46 @@ void ff_hevc_sao_edge_filter_16x16_8_neon(uint8_t *dst, const uint8_t *src, ptrd
>>                                           const int16_t *sao_offset_val, int eo, int width, int height);
>> void ff_hevc_sao_edge_filter_8x8_8_neon(uint8_t *dst, const uint8_t *src, ptrdiff_t stride_dst,
>>                                         const int16_t *sao_offset_val, int eo, int width, int height);
>> +void ff_hevc_put_hevc_qpel_h4_8_neon(int16_t *dst, uint8_t *_src, ptrdiff_t _srcstride, int height,
>> +                                     intptr_t mx, intptr_t my, int width);
>
> The function pointers in the dsp context have gotten 'const' on the source 
> pointers now, which makes this emit a lot of warnings with GCC, and fail 
> with the latest Clang. Please rebase and check that it builds without 
> warnings.
>

Fixed.
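
For anyone hitting the same thing, here is a rough sketch of the kind of
mismatch involved. The typedef and the stub below are hypothetical stand-ins,
not the actual FFmpeg declarations; the only point is that the prototype of
the routine has to carry the same 'const' as the function pointer type it is
assigned to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dsp-context function pointer type whose
 * source pointer is const-qualified. */
typedef void (*put_qpel_fn)(int16_t *dst, const uint8_t *src,
                            ptrdiff_t srcstride, int height,
                            intptr_t mx, intptr_t my, int width);

/* Hypothetical C stub standing in for an assembly routine. Its prototype
 * uses 'const uint8_t *src', so it matches put_qpel_fn exactly. It only
 * copies samples; it is not a real filter. */
void put_copy_stub(int16_t *dst, const uint8_t *src, ptrdiff_t srcstride,
                   int height, intptr_t mx, intptr_t my, int width)
{
    (void)mx;
    (void)my;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[y * width + x] = src[x];
        src += srcstride;
    }
}

/* No cast is needed here because the types agree. If the stub were declared
 * with a plain 'uint8_t *src', this return would be an incompatible
 * function pointer conversion, which GCC warns about and recent Clang
 * rejects by default. */
put_qpel_fn select_put_fn(void)
{
    return put_copy_stub;
}
```

Adding 'const' to the prototype costs nothing on the assembly side, since the
routine only reads through the source pointer anyway.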

>> [...]
>> +.ifc \type, qpel
>> +function ff_hevc_put_hevc_h4_8_neon, export=0
>> +        uxtl            v16.8h,  v16.8b
>> +        uxtl            v17.8h,  v17.8b
>> +        uxtl            v18.8h,  v18.8b
>> +        uxtl            v19.8h,  v19.8b
>> +
>> +        mul             v23.4h,  v16.4h, v0.h[0]
>> +        mul             v24.4h,  v18.4h, v0.h[0]
>> +
>> +.irpc i, 1234567
>> +        ext             v20.16b, v16.16b, v17.16b, #(2*\i)
>> +        ext             v21.16b, v18.16b, v19.16b, #(2*\i)
>> +        mla             v23.4h,  v20.4h, v0.h[\i]
>> +        mla             v24.4h,  v21.4h, v0.h[\i]
>> +.endr
>> +        ret
>> +endfunc
>> +.endif
>> +
>> +function ff_hevc_put_hevc_\type\()_h4_8_neon, export=1
>> +        load_filter     mx
>> +.ifc \type, qpel_bi
>> +        mov             x16, #(MAX_PB_SIZE << 2) // src2bstridel
>> +        add             x15, x4, #(MAX_PB_SIZE << 1) // src2b
>
> Beware that you can't in general rely on x16/x17 keeping their values for 
> long. If you branch to a function which is implemented in a different object 
> file, it may end up linked at a place in the address space which is too far 
> away for a regular 'bl' branch, so the linker has to insert a range extension 
> thunk, which clobbers x16/x17. But as long as everything here is branched 
> within the same object file, it should be ok.
>
> In general, if you need to use x16/x17, use it only for very short-lived 
> temporaries.

Alright, thanks for the consideration. Left as is since, as you said, we're 
not branching anywhere outside this file.

>> +.endif
>> +        sub             src, src, #3
>> +        mov             mx, lr
>
> Please use literal 'x30' instead of 'lr' - older binutils don't support the 
> 'lr' register name alias.

Fixed.

> Other than that, the code seems to run correctly, and the code looks mostly 
> reasonable now. (I didn't do a very deep read-through this time, but it looks 
> like you've addressed my earlier concerns.)

Thanks for the review, pushed with above fixes.
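
For reference, the arithmetic of the h4 helper quoted above (widen to 16 bit,
multiply by tap 0, then seven shifted multiply-accumulates) can be modelled
for a single row in scalar C roughly as follows. This is only a sketch: the
function name is made up, and it assumes that src already points 3 samples to
the left of the first output (as after the 'sub src, src, #3') and that
filter holds the 8 taps selected by load_filter.

```c
#include <stdint.h>

/* Hypothetical scalar model of one row of the h4 helper:
 * dst[x] = sum over i = 0..7 of src[x + i] * filter[i], for x = 0..3.
 * Reads 11 source samples in total. */
void qpel_h4_row_ref(int16_t dst[4], const uint8_t *src,
                     const int8_t filter[8])
{
    for (int x = 0; x < 4; x++) {
        int sum = 0;
        /* Tap 0 corresponds to the initial 'mul', taps 1..7 to the
         * 'ext' + 'mla' pairs in the .irpc loop. */
        for (int i = 0; i < 8; i++)
            sum += src[x + i] * filter[i];
        /* The NEON code accumulates in 16-bit lanes; with 8-bit input and
         * the HEVC luma taps the sum fits in int16_t. */
        dst[x] = (int16_t)sum;
    }
}
```

With the HEVC half-sample luma taps { -1, 4, -11, 40, 40, -11, 4, -1 }, which
sum to 64, a flat row of 100s gives 6400 in every output, which is a handy
sanity check when comparing against the assembly.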

-- 
jd
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
