On 29/03/2022 14:13, Martin Storsjö wrote:
On Fri, 25 Mar 2022, Ben Avison wrote:

Disable ff_add_pixels_clamped_arm, which was found to fail the test.

I had a look at this function, and I see that the overflow checks are using

         tst             r6,  #0x100

to see whether the addition overflowed (either above or below). However, if block[] was e.g. 0x200, it's possible to overflow without setting this bit at all.

Yes, thinking about it, that test is only valid if the signed 16-bit value from block[] lies in the range -0x100..+0x100 inclusive, otherwise there exists at least one unsigned 8-bit value which should have clamped but won't.

Secondly, the clamping seems to be done with

         movne           r6,  r5,  lsr #24

However that should use asr, not lsr, I think, to get proper clamping in both ends?

r5 is the NOTted version, so all that's doing is selecting 0x000000FF if there was positive overflow, and 0x00000000 if there was negative overflow. Given that bit 8 and above need to be zero to facilitate repacking the 8-bit samples, that's the right thing to do.

Thirdly - the added test also occasionally fails for the other existing functions (armv6, neon) and the newly added aarch64 neon version. If you have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition will overflow, as there's no operation that both widens and clamps at the same time.

So it does. I obviously just didn't hit those cases in my test runs!

I can't easily test all codecs that use this function, but I just tried instrumenting the VC-1 case and it doesn't appear to actually use this particular function, so I'm none the wiser!

Should I just limit the 16-bit values to +/-0x100 and re-enable the armv4 fast path then?

Ben
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to