On 4/18/2016 2:52 PM, Christophe Gisquet wrote:
> 2016-04-18 19:15 GMT+02:00 James Almer <jamr...@gmail.com>:
>> On 4/18/2016 10:07 AM, Christophe Gisquet wrote:
>>> The loops are guaranteed to be at least multiples of 8, so this
>>> unrolling is safe but allows exploiting execution ports.
>>>
>>> For int32 version: 72 -> 57c.
>>
>> What compiler are you using, and what cpu at configure time?
>
> gcc 5.1, Win64, haswell. I don't use mingw64 compiler.
>
>> We're currently enabling tree vectorization for gcc 4.9 or newer on x86,
>> and at least with gcc 5.3.0 on mingw-w64 the resulting code now seems worse.
>> I didn't bench it, but after this patch it's not being vectorized anymore.
>
> The code I benchmarked as being 72c is vectorized and keeps being
> vectorized here. It actually looks better than the previously
> vectorized one.
>
> The 16_c version is no longer vectorized, but is really a mess here
> when vectorized.

The 16_c one isn't important, since we have sse2 and even mmxext versions.
But you're right, the 32_c one remains vectorized even when targeting
pre-SSE4 cpus, so the patch should be good.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
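
For anyone reading this thread later, here is a rough sketch of the kind of
unrolling being discussed. It is not the code from the patch: the function
names (add_i32_ref, add_i32_unrolled8) and the element-wise add are
hypothetical, picked only to show why a length that is guaranteed to be a
multiple of 8 makes an 8x unrolled body safe without a tail loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference loop: one element per iteration. */
static void add_i32_ref(int32_t *dst, const int32_t *src, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i];
}

/* Hypothetical unrolled variant. It assumes len is a multiple of 8,
 * so no remainder loop is needed after the main loop. The eight
 * statements are independent of each other, which leaves the compiler
 * and the CPU free to schedule them across execution ports. */
static void add_i32_unrolled8(int32_t *dst, const int32_t *src, int len)
{
    assert(len % 8 == 0); /* the precondition the unrolling relies on */

    for (int i = 0; i < len; i += 8) {
        dst[i + 0] += src[i + 0];
        dst[i + 1] += src[i + 1];
        dst[i + 2] += src[i + 2];
        dst[i + 3] += src[i + 3];
        dst[i + 4] += src[i + 4];
        dst[i + 5] += src[i + 5];
        dst[i + 6] += src[i + 6];
        dst[i + 7] += src[i + 7];
    }
}
```

Both functions should produce the same output for any len that is a multiple
of 8. Whether a given compiler vectorizes either form depends on its version
and flags, which is what the exchange above is about, so comparing the
generated assembly is the only reliable check.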