Thank you, I've tried your suggestions, and here is what I got (the
left column is A53 and the right is A72):
current code:
pred16x16_top_dc_10_c:    106.0   93.2
pred16x16_top_dc_10_neon:  87.7   77.5
ld1, add, addv variant:
pred16x16_top_dc_10_c:    106.0   95.5
pred1
Inlined a few comments for ff_pred16x16_top_dc_neon_10; the others are similar.
At 2021-04-14 20:35:44, "Martin Storsjö" wrote:
>On Tue, 13 Apr 2021, Mikhail Nitenko wrote:
>
>> Benchmarks:
>> pred16x16_dc_10_c: 124.0
>> pred16x16_dc_10_neon: 97.2
>> pred16x16_horizontal_10_c: 71.7
>> pred16x16_horizo
On Tue, 13 Apr 2021, Mikhail Nitenko wrote:
Benchmarks:
pred16x16_dc_10_c: 124.0
pred16x16_dc_10_neon: 97.2
pred16x16_horizontal_10_c: 71.7
pred16x16_horizontal_10_neon: 66.2
pred16x16_top_dc_10_c: 90.7
pred16x16_top_dc_10_neon: 71.5
pred16x16_vertical_10_c: 64.7
pred16x16_vertical_10_neon: 61.7
Some functions work slower than C and are lef