On Sun, Feb 03, 2019 at 08:07:22AM -0800, H.J. Lu wrote:
> +      /* If the misalignment of __P > 8, subtract __P by 8 bytes.
> +      Otherwise, subtract __P by the misalignment.  */
> +      if (offset > 8)
> +     offset = 8;
> +      __P = (char *) (((__SIZE_TYPE__) __P) - offset);
> +
> +      /* Zero-extend __A and __N to 128 bits and shift right by the
> +      adjustment.  */
> +      unsigned __int128 __a128 = ((__v1di) __A)[0];
> +      unsigned __int128 __n128 = ((__v1di) __N)[0];
> +      __a128 <<= offset * 8;
> +      __n128 <<= offset * 8;
> +      __A128 = __extension__ (__v2di) { __a128, __a128 >> 64 };
> +      __N128 = __extension__ (__v2di) { __n128, __n128 >> 64 };

We have _mm_slli_si128/__builtin_ia32_pslldqi128, why can't you use that
instead of doing the arithmetic in unsigned __int128 scalars?
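
For illustration only, a minimal sketch of what the byte-shift variant could
look like. The helper names shift_left_bytes and matches_scalar are
hypothetical and not part of the patch. It assumes x86-64 with SSE2 and a
compiler that supports unsigned __int128. Note that _mm_slli_si128 takes
its byte count as a compile-time constant, so a run-time offset has to be
dispatched, here with a switch:

```c
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): zero-extend a 64-bit value to
   128 bits and shift it left by `offset` bytes using PSLLDQ.  Offsets above
   8 are treated as 8, mirroring the clamp in the quoted code.  */
static __m128i
shift_left_bytes (uint64_t value, unsigned offset)
{
  /* Low 64 bits hold the value, high 64 bits are zero.  */
  __m128i v = _mm_set_epi64x (0, (long long) value);

  /* _mm_slli_si128 needs an immediate count, hence one case per offset.  */
  switch (offset)
    {
    case 0: return v;
    case 1: return _mm_slli_si128 (v, 1);
    case 2: return _mm_slli_si128 (v, 2);
    case 3: return _mm_slli_si128 (v, 3);
    case 4: return _mm_slli_si128 (v, 4);
    case 5: return _mm_slli_si128 (v, 5);
    case 6: return _mm_slli_si128 (v, 6);
    case 7: return _mm_slli_si128 (v, 7);
    default: return _mm_slli_si128 (v, 8);
    }
}

/* Hypothetical check: compare against the unsigned __int128 computation
   used in the quoted patch.  Valid for offset in the range 0..8.  */
static int
matches_scalar (uint64_t value, unsigned offset)
{
  unsigned __int128 wide = value;
  wide <<= offset * 8;
  uint64_t expect[2] = { (uint64_t) wide, (uint64_t) (wide >> 64) };

  __m128i got = shift_left_bytes (value, offset);
  return memcmp (&got, expect, sizeof expect) == 0;
}
```

Whether the switch is preferable to the scalar arithmetic depends on the
code the compiler generates for each form; the sketch only shows that the
two compute the same 128-bit result.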

        Jakub
