On 29/03/2022 14:13, Martin Storsjö wrote:
On Fri, 25 Mar 2022, Ben Avison wrote:
Disable ff_add_pixels_clamped_arm, which was found to fail the test.
I had a look at this function, and I see that the overflow checks are using
tst r6, #0x100
to see whether the addition overflowed (either above or below). However,
if block[] was e.g. 0x200, it's possible to overflow without setting
this bit at all.
Yes, thinking about it, that test is only valid if the signed 16-bit
value from block[] lies in the range -0x100..+0x100 inclusive, otherwise
there exists at least one unsigned 8-bit value which should have clamped
but won't.
Secondly, the clamping seems to be done with
movne r6, r5, lsr #24
However that should use asr, not lsr, I think, to get proper clamping in
both ends?
r5 is the NOTted version, so all that's doing is selecting 0x000000FF if
there was positive overflow, and 0x00000000 if there was negative
overflow. Given that bit 8 and above need to be zero to facilitate
repacking the 8-bit samples, that's the right thing to do.
Thirdly - the added test also occasionally fails for the other existing
functions (armv6, neon) and the newly added aarch64 neon version. If you
have e.g. src[] = 32767, dst[] = 255, then the widening 8->16 addition
will overflow, as there's no operation that both widens and clamps at
the same time.
So it does. I obviously just didn't hit those cases in my test runs!
I can't easily test all codecs that use this function, but I just tried
instrumenting the VC-1 case and it doesn't appear to actually use this
particular function, so I'm none the wiser!
Should I just limit the 16-bit values to +/-0x100 and re-enable the
armv4 fast path then?
Ben
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".