On Sat, Mar 19, 2016 at 9:09 AM, Reimar Döffinger <reimar.doeffin...@gmx.de> wrote:
> On Sat, Mar 19, 2016 at 12:42:09PM +0100, Clément Bœsch wrote:
>> On Fri, Mar 18, 2016 at 10:12:14PM -0700, Ganesh Ajjanagadde wrote:
>> > -static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
>> > -{
>> > -    int i;
>> > -    for (i = 0; i < size; i++) {
>> > -        float a = fabsf(in[i]);
>> > -        out[i] = sqrtf(a * sqrtf(a));
>> > -    }
>> > -}
>> > -
>> >  static inline float pos_pow34(float a)
>> >  {
>> >      return sqrtf(a * sqrtf(a));
>> >  }
>> >
>> > +static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
>> > +{
>> > +    av_assert2(!(size % 4));
>> > +    for (int i = 0; i < size; i+=4) {
>> > +        float a0 = fabsf(in[i]);
>> > +        float a1 = fabsf(in[i+1]);
>> > +        float a2 = fabsf(in[i+2]);
>> > +        float a3 = fabsf(in[i+3]);
>> > +        out[i  ] = pos_pow34(a0);
>> > +        out[i+1] = pos_pow34(a1);
>> > +        out[i+2] = pos_pow34(a2);
>> > +        out[i+3] = pos_pow34(a3);
>> > +    }
>> > +}
>> > +
>>
>> I'm curious (and lazy), is GCC able to unroll by itself if you hint it
>> with a loop such as:
>>
>>     int i;
>>     for (i = 0; i < (size & ~3); i++) {
>>         float a = fabsf(in[i]);
>>         out[i] = sqrtf(a * sqrtf(a));
>>     }

Does not help; it still yields ~140 decicycles, like earlier.

>
> I haven't been able to figure out for
> sure for this one, but at least the other one
> Debian gcc 5.3.1 already unrolls and vectorizes
> for me, though it has a bit of extra code to
> handle cases where size is not a multiple of 4.

I suspect the speed change in that case comes from the removal of
that extra code, as I am running gcc 5.3.0 on Arch.

> So I suspect "which gcc?" is probably an important
> question.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel