On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote:
> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
> +%if cpuflag(avx2)
> +vbroadcastss %1, %2; ymm, xmm
> +%elif cpuflag(avx)
> +%ifnum sizeof%2 ; avx1 register
> +vpermilps xmm%1, xmm%2, q
Improved version of VBROADCASTSS that works like the AVX2 instruction, which also accepts a register source.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
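
For readers more used to intrinsics than to x86inc macros, here is a rough C sketch of three of the behaviors listed above, using only SSE/SSE2. It is not the code from the patch. The function names (broadcast_lane0, hsum_all, blendv_ps_emul, simd_helpers_selftest) are hypothetical, and the shuffle and shift sequences are just one common way to do this; they are an assumption about the general technique, not a description of what the macros expand to.

```c
#include <emmintrin.h> /* SSE2 intrinsics; also provides the SSE ones */

/* Broadcast lane 0 of a register to all four lanes, similar in effect to
 * vbroadcastss with a register source (hypothetical helper name). */
static inline __m128 broadcast_lane0(__m128 v)
{
    return _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
}

/* Horizontal sum of four floats, with the total left in every lane. */
static inline __m128 hsum_all(__m128 v)
{
    /* Swap adjacent pairs: [v1 v0 v3 v2]. */
    __m128 t = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    __m128 s = _mm_add_ps(v, t);   /* [v0+v1, v0+v1, v2+v3, v2+v3] */
    /* Swap the two 64-bit halves and add again. */
    t = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 0, 3, 2));
    return _mm_add_ps(s, t);
}

/* Emulate blendvps without SSE4.1: take b where the sign bit of mask is
 * set, otherwise a. The arithmetic shift copies the sign bit into all
 * 32 bits of each lane, giving a full and/andnot mask. */
static inline __m128 blendv_ps_emul(__m128 a, __m128 b, __m128 mask)
{
    __m128 m = _mm_castsi128_ps(_mm_srai_epi32(_mm_castps_si128(mask), 31));
    return _mm_or_ps(_mm_and_ps(m, b), _mm_andnot_ps(m, a));
}

/* Small self-check; returns 1 when all three helpers behave as described. */
static int simd_helpers_selftest(void)
{
    float out[4];
    __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    /* Sign bit set in lanes 0 and 2 (note that -0.0f counts as set). */
    __m128 mask = _mm_setr_ps(-1.0f, 1.0f, -0.0f, 0.0f);
    int i;

    _mm_storeu_ps(out, broadcast_lane0(a));
    for (i = 0; i < 4; i++)
        if (out[i] != 1.0f)
            return 0;

    _mm_storeu_ps(out, hsum_all(a));
    for (i = 0; i < 4; i++)
        if (out[i] != 10.0f)
            return 0;

    _mm_storeu_ps(out, blendv_ps_emul(a, b, mask));
    if (out[0] != 10.0f || out[1] != 2.0f || out[2] != 30.0f || out[3] != 4.0f)
        return 0;

    return 1;
}
```

The blend emulation costs a shift plus three logical operations per call, so when the mask is known to be all-ones or all-zeros per lane already (for example, the result of a compare), the shift can be dropped.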
From cf4dc8fcd974a845b91aaa8685c06fa145b01786 Mon Sep 17 00:00:00 2001
From: Ivan Kalvachev
Date: Sat,