On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev <ikalvac...@gmail.com> wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> + haddps %1, %1
> + haddps %1, %1
[...]
> + phaddd xmm%1,xmm%1
> + phaddd xmm%1,xmm%1

You can safely assume that those instructions are always slow and that this is virtually never the correct way to use them, so just use the shuffle + add method.

You can unconditionally use non-destructive 3-arg instructions (without the v-prefix) in non-AVX code to reduce ifdeffery. The x86inc abstraction layer will automatically insert register-register moves as needed.

I'm a bit doubtful that it's worth the complexity to emulate 256-bit integer math using floating-point instruction hacks, especially since that's only relevant on two 5+ year old Intel µarchs (SNB & IVB). It's probably fine to simply require AVX2 if you need 256-bit integer SIMD.

Be aware that most SSE SIMD instructions are actually implemented as x86inc macros; redefining them can have unexpected consequences and is therefore discouraged.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
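
For illustration, here is a minimal sketch of the shuffle + add reduction mentioned above. It is written with C intrinsics rather than x86inc asm so that it compiles standalone. The function names hsum_ps and hsum_epi32 are made up for this example and are not from the patch, and the code assumes an x86 target with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stdint.h>

/* Hypothetical helper: horizontal sum of four floats using
 * shuffle + add in place of two back-to-back haddps. */
static float hsum_ps(__m128 v)
{
    /* hi = [v2, v3, v2, v3]: bring the upper two lanes down. */
    __m128 hi  = _mm_movehl_ps(v, v);
    /* Lanes 0 and 1 now hold v0+v2 and v1+v3. */
    __m128 sum = _mm_add_ps(v, hi);
    /* Broadcast lane 1 so it can be added to lane 0. */
    __m128 odd = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1));
    /* Scalar add of lane 0: (v0+v2) + (v1+v3). */
    sum = _mm_add_ss(sum, odd);
    return _mm_cvtss_f32(sum);
}

/* Hypothetical helper: horizontal sum of four 32-bit integers using
 * pshufd + paddd in place of two back-to-back phaddd. */
static int32_t hsum_epi32(__m128i v)
{
    /* Swap the two 64-bit halves: [v2, v3, v0, v1]. */
    __m128i hi  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum = _mm_add_epi32(v, hi);
    /* Swap adjacent 32-bit lanes so lane 0 sees its neighbour. */
    __m128i odd = _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1));
    sum = _mm_add_epi32(sum, odd);
    /* Lane 0 now holds v0+v1+v2+v3. */
    return _mm_cvtsi128_si32(sum);
}
```

Each reduction step is one shuffle followed by one add. In x86inc asm the same steps can be written with 3-arg forms, leaving the macro layer to insert any register moves needed on non-AVX targets.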