On Sat, Mar 19, 2016 at 12:42:09PM +0100, Clément Bœsch wrote:
> On Fri, Mar 18, 2016 at 10:12:14PM -0700, Ganesh Ajjanagadde wrote:
> > -static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> > -{
> > -    int i;
> > -    for (i = 0; i < size; i++) {
> > -        float a = fabsf(in[i]);
> > -        out[i] = sqrtf(a * sqrtf(a));
> > -    }
> > -}
> > -
> >  static inline float pos_pow34(float a)
> >  {
> >      return sqrtf(a * sqrtf(a));
> >  }
> >
> > +static inline void abs_pow34_v(float *av_restrict out, const float *av_restrict in, const int size)
> > +{
> > +    av_assert2(!(size % 4));
> > +    for (int i = 0; i < size; i+=4) {
> > +        float a0 = fabsf(in[i]);
> > +        float a1 = fabsf(in[i+1]);
> > +        float a2 = fabsf(in[i+2]);
> > +        float a3 = fabsf(in[i+3]);
> > +        out[i  ] = pos_pow34(a0);
> > +        out[i+1] = pos_pow34(a1);
> > +        out[i+2] = pos_pow34(a2);
> > +        out[i+3] = pos_pow34(a3);
> > +    }
> > +}
> > +
>
> I'm curious (and lazy), is GCC able to unroll by itself if you hint it
> with a loop such as:
>
>     int i;
>     for (i = 0; i < size & ~3; i++) {
>         float a = fabsf(in[i]);
>         out[i] = sqrtf(a * sqrtf(a));
>     }
I haven't been able to figure this one out for sure, but for the other one at least, Debian's gcc 5.3.1 already unrolls and vectorizes the loop for me, though it emits a bit of extra code to handle the case where size is not a multiple of 4. So I suspect "which gcc?" is probably an important question.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel