On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev <ikalvac...@gmail.com> wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> +        haddps      %1,   %1
> +        haddps      %1,   %1
[...]
> +       phaddd       xmm%1,xmm%1
> +       phaddd       xmm%1,xmm%1

You can safely assume that those instructions are always slow and that
this is virtually never the correct way to use them, so just use the
shuffle + add method.
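
As a rough illustration of the shuffle + add approach, here is a minimal
sketch written with C intrinsics rather than x86inc asm, so that it can be
compiled standalone. The function names hsum_ps and hsum_epi32 are made up
for this example and are not taken from the patch; the actual asm would
use the equivalent shuffle and add instructions directly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */

/* Horizontal sum of 4 floats using shuffle + add instead of haddps.
 * Hypothetical helper name, for illustration only. */
static float hsum_ps(__m128 v)
{
    __m128 hi  = _mm_movehl_ps(v, v);          /* lanes: v2, v3, v2, v3 */
    __m128 sum = _mm_add_ps(v, hi);            /* lanes 0,1: v0+v2, v1+v3 */
    __m128 odd = _mm_shuffle_ps(sum, sum, _MM_SHUFFLE(1, 1, 1, 1));
    sum = _mm_add_ss(sum, odd);                /* lane 0: v0+v2+v1+v3 */
    return _mm_cvtss_f32(sum);
}

/* Horizontal sum of 4 32-bit integers using pshufd + paddd instead of phaddd.
 * Hypothetical helper name, for illustration only. */
static int hsum_epi32(__m128i v)
{
    /* swap the two 64-bit halves, then add */
    __m128i hi  = _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2));
    __m128i sum = _mm_add_epi32(v, hi);
    /* swap adjacent 32-bit lanes, then add */
    __m128i odd = _mm_shuffle_epi32(sum, _MM_SHUFFLE(2, 3, 0, 1));
    sum = _mm_add_epi32(sum, odd);
    return _mm_cvtsi128_si32(sum);             /* lane 0 holds the total */
}
```

Each reduction step is one shuffle plus one add, which is the pattern I
mean above as the replacement for the two back-to-back haddps/phaddd.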

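You can unconditionally use non-destructive 3-arg instructions
(without the v-prefix) in non-AVX code to reduce ifdeffery. The x86inc
abstraction layer will automatically insert register-register moves as
needed; e.g. "paddd m0, m1, m2" in an SSE2 function becomes a move of
m1 into m0 followed by "paddd m0, m2".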

I'm a bit doubtful that it's worth the complexity to emulate 256-bit
integer math using floating-point instruction hacks, especially since
that's only relevant on two 5+ year old Intel µarchs (SNB & IVB), which
have AVX but not AVX2. It's probably fine to simply require AVX2 if you
need 256-bit integer SIMD.

Be aware that most SSE SIMD instructions are actually implemented as
x86inc macros and redefining them can have unexpected consequences and
is therefore discouraged.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
