On Sun, Feb 03, 2019 at 08:07:22AM -0800, H.J. Lu wrote:
> +  /* If the misalignment of __P > 8, subtract __P by 8 bytes.
> +     Otherwise, subtract __P by the misalignment.  */
> +  if (offset > 8)
> +    offset = 8;
> +  __P = (char *) (((__SIZE_TYPE__) __P) - offset);
> +
> +  /* Zero-extend __A and __N to 128 bits and shift right by the
> +     adjustment.  */
> +  unsigned __int128 __a128 = ((__v1di) __A)[0];
> +  unsigned __int128 __n128 = ((__v1di) __N)[0];
> +  __a128 <<= offset * 8;
> +  __n128 <<= offset * 8;
> +  __A128 = __extension__ (__v2di) { __a128, __a128 >> 64 };
> +  __N128 = __extension__ (__v2di) { __n128, __n128 >> 64 };

We have _mm_slli_si128/__builtin_ia32_pslldqi128; why can't you use that
instead of doing the arithmetic in unsigned __int128 scalars?

	Jakub
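
To illustrate the question, here is a minimal sketch (not taken from the patch) comparing the two approaches for a fixed 3-byte shift on x86-64. The helper names shift_scalar, shift_sse2 and shifts_agree are hypothetical. One caveat: _mm_slli_si128 takes its byte count as a compile-time constant, while offset in the quoted hunk is computed at run time, so this sketch only covers the constant-count case.

```c
#include <emmintrin.h>  /* SSE2: _mm_cvtsi64_si128, _mm_slli_si128 */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero-extend A to 128 bits and shift left by
   3 bytes using scalar unsigned __int128 arithmetic, in the style of
   the quoted hunk.  out[0] is the low half, out[1] the high half.  */
static void
shift_scalar (uint64_t a, uint64_t out[2])
{
  unsigned __int128 a128 = a;
  a128 <<= 3 * 8;
  out[0] = (uint64_t) a128;
  out[1] = (uint64_t) (a128 >> 64);
}

/* Hypothetical helper: the same operation with the SSE2 byte shift.
   The count (3) must be an immediate.  */
static void
shift_sse2 (uint64_t a, uint64_t out[2])
{
  /* Moves A into the low 64 bits and zeroes the high 64 bits.  */
  __m128i v = _mm_cvtsi64_si128 ((long long) a);
  v = _mm_slli_si128 (v, 3);
  _mm_storeu_si128 ((__m128i *) out, v);
}

/* Returns nonzero if both helpers produce the same 128-bit result.  */
static int
shifts_agree (uint64_t a)
{
  uint64_t s[2], v[2];
  shift_scalar (a, s);
  shift_sse2 (a, v);
  return memcmp (s, v, sizeof s) == 0;
}
```

For a constant count the two forms agree bit for bit; whether the intrinsic is usable in the patch depends on how the variable offset is handled.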