On Mon, 2016-07-04 at 20:55 +0200, Hendrik Leppkes wrote:
> On Mon, Jul 4, 2016 at 5:20 PM, Dan Parrot <dan.par...@mail.com> wrote:
> >> Why is this not faster?
> > Surprisingly, gcc is producing some badly suboptimal assembly. I need to
> > follow up with IBM's Linux Technology Center. The major issue is that
> > multiplication of vector quantities in C is generating as many
> > multiplications in assembly as would scalar multiplication in a loop. No
> > way that should be occurring.
> >
>
> This is the reason why we generally don't allow intrinsic
> optimizations and instead ask people to write full assembly instead.
> It behaves more consistently everywhere.

Is this then a requirement to abandon the use of intrinsics for PPC64
SIMD and instead re-implement in assembly?

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel