On Sun, 10 Jan 2021 at 19:55, Lynne <d...@lynne.ee> wrote:
>
> Jan 10, 2021, 17:43 by reimar.doeffin...@gmx.de:
>
> > From: Reimar Döffinger <reimar.doeffin...@gmx.de>
> >
> > This requests loops to be vectorized using SIMD
> > instructions.
> > The performance increase is far from hand-optimized
> > assembly but still significant over the plain C version.
> > Typical values are a 2-4x speedup where a hand-written
> > version would achieve 4x-10x.
> > So it is far from a replacement, however some architectures
> > will get hand-written assembler quite late or not at all,
> > and this is a good improvement for a trivial amount of work.
> > The cause, besides the compiler being a compiler, is
> > usually that it does not manage to use saturating instructions
> > and thus has to use 32-bit operations where actually
> > saturating 16-bit operations would be sufficient.
> > Other causes are for example the av_clip functions that
> > are not ideal for vectorization (and even as scalar code
> > not optimal for any modern CPU that has either CSEL or
> > MAX/MIN instructions).
> > And of course this only works for relatively simple
> > loops, the IDCT functions for example seemed not possible
> > to optimize that way.
> > Also note that while clang may accept the code and sometimes
> > produces warnings, it does not seem to do anything actually
> > useful at all.
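To illustrate the saturation point quoted above, here is a minimal sketch in
plain C. It is not taken from the patch: clip_int16() and add_sat16() are
hypothetical names, and clip_int16() is a stand-in for a clip helper rather
than FFmpeg's av_clip. The sum is formed in 32 bits and then clamped, which is
the pattern the quoted text says a compiler tends to vectorize with widened
32-bit operations instead of a single saturating 16-bit add.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not FFmpeg's av_clip): clamp a 32-bit value
 * to the int16_t range. */
static inline int16_t clip_int16(int32_t v)
{
    if (v < INT16_MIN)
        return INT16_MIN;
    if (v > INT16_MAX)
        return INT16_MAX;
    return (int16_t)v;
}

/* Element-wise saturating add of two int16_t arrays.
 * The addition is done in 32 bits so it cannot overflow, then the
 * result is clamped back to 16 bits. Hand-written SIMD could use one
 * saturating 16-bit add per vector here; whether an auto-vectorizer
 * recognizes that is up to the compiler. */
static void add_sat16(int16_t *dst, const int16_t *a,
                      const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_int16((int32_t)a[i] + (int32_t)b[i]);
}
```

Calling add_sat16() on the inputs {30000, -30000, 100} and {10000, -10000, 23}
clamps the first two sums to INT16_MAX and INT16_MIN and leaves the third
as 123.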
> > Here are example measurements using gcc 10 under Linux (in a VM
> > unfortunately) on AArch64 on Apple M1:
> > Command:
> > time ./ffplay_g LG\ 4K\ HDR\ Demo\ -\ New\ York.ts -t 10 -autoexit -threads 1 -noframedrop
> >
> > Original code:
> > real 0m19.572s
> > user 0m23.386s
> > sys 0m0.213s
> >
> > Changing all put_hevc:
> > real 0m15.648s
> > user 0m19.503s (83.4% of original)
> > sys 0m0.186s
> >
> > In addition changing add_residual:
> > real 0m15.424s
> > user 0m19.278s (82.4% of original)
> > sys 0m0.133s
> >
> > In addition changing planar copy dither:
> > real 0m15.040s
> > user 0m18.874s (80.7% of original)
> > sys 0m0.168s
> >
>
> I think I have to disagree.
> The performance gains are marginal

This sounds wrong.

Carl Eugen
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".