On Sun, Feb 03, 2019 at 08:07:22AM -0800, H.J. Lu wrote:
> + /* If the misalignment of __P > 8, subtract __P by 8 bytes.
> + Otherwise, subtract __P by the misalignment. */
> + if (offset > 8)
> + offset = 8;
> + __P = (char *) (((__SIZE_TYPE__) __P) - offset);
> +
> + /* Zero-extend __A and __N to 128 bits and shift right by the
> + adjustment. */
> + unsigned __int128 __a128 = ((__v1di) __A)[0];
> + unsigned __int128 __n128 = ((__v1di) __N)[0];
> + __a128 <<= offset * 8;
> + __n128 <<= offset * 8;
> + __A128 = __extension__ (__v2di) { __a128, __a128 >> 64 };
> + __N128 = __extension__ (__v2di) { __n128, __n128 >> 64 };
We have _mm_slli_si128/__builtin_ia32_pslldqi128; why can't you use that
instead of doing the arithmetic in unsigned __int128 scalars?
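
Roughly what I have in mind, as a sketch only and not the patch's code: the
helper names shift_left_bytes and store_halves are made up for illustration.
_mm_slli_si128 takes its byte count as an immediate, so a variable offset
needs a switch over the possible values (the builtin form takes the count in
bits, i.e. offset * 8).

```c
#include <emmintrin.h>
#include <stdint.h>

/* Hypothetical helper: zero-extend a 64-bit value to 128 bits and shift
   it left by `offset` bytes.  The byte count of _mm_slli_si128 must be a
   compile-time constant, hence the switch.  */
static __m128i
shift_left_bytes (uint64_t value, unsigned int offset)
{
  /* High 64 bits are zero, low 64 bits hold the value.  */
  __m128i v = _mm_set_epi64x (0, (long long) value);

  switch (offset)
    {
    case 0: return v;
    case 1: return _mm_slli_si128 (v, 1);
    case 2: return _mm_slli_si128 (v, 2);
    case 3: return _mm_slli_si128 (v, 3);
    case 4: return _mm_slli_si128 (v, 4);
    case 5: return _mm_slli_si128 (v, 5);
    case 6: return _mm_slli_si128 (v, 6);
    case 7: return _mm_slli_si128 (v, 7);
    default:
      /* Assumption: offsets of 8 or more are treated as 8, mirroring the
         clamp in the quoted patch.  */
      return _mm_slli_si128 (v, 8);
    }
}

/* Hypothetical helper: store the low and high 64-bit halves to out[0]
   and out[1] so the result can be inspected.  */
static void
store_halves (__m128i v, uint64_t out[2])
{
  _mm_storeu_si128 ((__m128i *) out, v);
}
```

With an offset of 3, for example, the low half of the result holds the value
shifted left by 24 bits and the high half holds the bits that were shifted
out, which is what the __int128 arithmetic above computes.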
Jakub