On Sun, 25 Aug 2019, James Cowgill wrote:
When compiling FFmpeg with GCC-9, seemingly random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the VFP callee-saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.
Fix by reallocating the registers in the two affected functions in
the following way:
ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
ff_sbc_analyze_8_neon: q2-q9 => q8-q15
The reason for using these replacements is to keep closely related
sets of registers consecutively numbered, which hopefully makes the
code easier to follow. Since this commit only reallocates
registers, it should have no performance impact.
Signed-off-by: James Cowgill <jcowg...@debian.org>
---
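As a quick sanity check of the renumbering listed in the commit message, here is a minimal Python sketch (not part of the patch). It relies on two facts about 32-bit ARM: each qN register aliases d(2N) and d(2N+1), and d8-d15 are callee-saved under the AAPCS. The helper names are hypothetical, and the script only inspects the register ranges named in the commit message, not the actual assembly.

```python
# Sketch only: models register numbering, not the real NEON code.
# AAPCS (32-bit ARM): d8-d15 are callee-saved; qN aliases d(2N) and d(2N+1).
CALLEE_SAVED_D = set(range(8, 16))


def q_range(first, last):
    """Return the set of q register numbers first..last (inclusive)."""
    return set(range(first, last + 1))


def clobbered(q_regs):
    """Return the sorted callee-saved d registers aliased by the given q registers."""
    d_regs = {d for q in q_regs for d in (2 * q, 2 * q + 1)}
    return sorted(d_regs & CALLEE_SAVED_D)


# Ranges exactly as listed in the commit message.
ranges = {
    "analyze_4 old (q2-q5)": q_range(2, 5),
    "analyze_4 old (q1-q4)": q_range(1, 4),
    "analyze_8 old (q2-q9)": q_range(2, 9),
    "analyze_4 new (q8-q11)": q_range(8, 11),
    "analyze_8 new (q8-q15)": q_range(8, 15),
}

for name, regs in ranges.items():
    hit = clobbered(regs)
    status = "clobbers d" + ",d".join(map(str, hit)) if hit else "no callee-saved registers touched"
    print(f"{name}: {status}")
```

Each of the old ranges overlaps d8-d15, while q8-q15 map to d16-d31 and so stay clear of the callee-saved set.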
On 29/07/2019 19:59, Reimar Döffinger wrote:
Seems sensible to me, though extra points if you or someone has numbers on
performance impact.
To know whether it would be worthwhile to check if it can be optimized...
Sorry for the long delay - been on various holidays.
Sorry for the even longer response ;-) I happened to run into this patch
downstream, and noticed that it does look reasonable, but apparently the
second round of the patch was missed back then in 2019.
Our current code is indeed broken and wrong - if we had had
checkasm tests for it, this issue would have been caught long ago.
I did a few tests on my original patch and overall it was about 2%
slower than before. In any case I think this new patch is a better
solution (although the diff is a lot larger). We don't actually need
that many registers in either of these functions, so instead of
pushing the clobbered callee-saved registers, we can reallocate all
the registers to avoid them in the first place. This way there is no
performance impact.
I couldn't find any tests for this encoder, but I have tested a few
audio samples with it and verified the output is identical to what it
was before (and with what I get on x86).
Thanks a lot for doing that! Indeed that's the best we can do since we
don't have tests for it.
I'll go ahead and push this patch soon.
// Martin