On 5/30/23 06:52, Ard Biesheuvel wrote:
> +#ifdef __x86_64__
> +    if (have_aes()) {
> +        __m128i *d = (__m128i *)rd;
> +
> +        *d = decrypt ? _mm_aesdeclast_si128(rk.vec ^ st.vec, (__m128i){})
> +                     : _mm_aesenclast_si128(rk.vec ^ st.vec, (__m128i){});

Do I correctly understand that the ARM xor is pre-shift

> +        return;
> +    }
> +#endif
> +
>      /* xor state vector with round key */
>      rk.l[0] ^= st.l[0];
>      rk.l[1] ^= st.l[1];

(like so)
whereas the x86 xor is post-shift

    void glue(helper_aesenclast, SUFFIX)(CPUX86State *env, Reg *d, Reg *v, Reg *s)
    {
        int i;
        Reg st = *v;
        Reg rk = *s;
        for (i = 0; i < 8 << SHIFT; i++) {
            d->B(i) = rk.B(i) ^ (AES_sbox[st.B(AES_shifts[i & 15] + (i & ~15))]);
        }

(like so, from target/i386/ops_sse.h)?
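To pin the question down, here is a portable C model of the two instructions. It is a sketch, not QEMU code: `aes_block`, `arm_aese`, `x86_aesenclast` and `aese_via_aesenclast` are hypothetical names. `arm_aese` XORs the round key before ShiftRows/SubBytes, which is how the Arm ARM pseudocode describes AESE. `x86_aesenclast` XORs it afterwards, which mirrors the `helper_aesenclast` loop quoted above. Under these assumptions, feeding `rk ^ st` with an all-zero key to AESENCLAST, as the patch does, reproduces AESE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte AES state; byte i is row (i & 3), column (i >> 2). */
typedef struct { uint8_t b[16]; } aes_block;

static uint8_t sbox[256];

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1) {
            r ^= a;
        }
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the AES S-box: multiplicative inverse, then the affine map. */
static void sbox_init(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; x != 0 && y < 256; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
    }
}

/* ShiftRows + SubBytes (they commute, so their order does not matter). */
static aes_block sub_shift(aes_block in)
{
    aes_block out;
    for (int i = 0; i < 16; i++) {
        int row = i & 3, col = i >> 2;
        out.b[i] = sbox[in.b[row + 4 * ((col + row) & 3)]];
    }
    return out;
}

static aes_block xor_block(aes_block a, aes_block b)
{
    for (int i = 0; i < 16; i++) {
        a.b[i] ^= b.b[i];
    }
    return a;
}

/* AArch64 AESE model: round key XORed in before the shift (pre-shift). */
static aes_block arm_aese(aes_block st, aes_block rk)
{
    return sub_shift(xor_block(st, rk));
}

/* x86 AESENCLAST model: round key XORed in after the shift (post-shift). */
static aes_block x86_aesenclast(aes_block st, aes_block rk)
{
    return xor_block(sub_shift(st), rk);
}

/* The patch's approach: pre-XOR, then AESENCLAST with an all-zero key. */
static aes_block aese_via_aesenclast(aes_block st, aes_block rk)
{
    aes_block zero = {{0}};
    return x86_aesenclast(xor_block(rk, st), zero);
}
```

For example, with `st` all zero and `rk` all `0x01`, `arm_aese` gives `0x7c` in every byte while `x86_aesenclast` gives `0x62`. So in this model the two instructions are not interchangeable unless the XOR is moved.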

What might help: could we do the reverse -- emulate the x86 aesdeclast instruction with
the aarch64 aesd instruction?

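The reverse direction would rest on the mirror-image identity AESDECLAST(st, rk) == AESD(st, 0) ^ rk. This holds because AESD also XORs its key in before InvShiftRows/InvSubBytes. The portable sketch below checks that identity. As before, `aes_block`, `arm_aesd`, `x86_aesdeclast` and `aesdeclast_via_aesd` are hypothetical model functions, not QEMU or compiler APIs. On an AArch64 host with the Crypto Extension, the zero-key AESD would presumably map to `vaesdq_u8(st, vdupq_n_u8(0))` from `<arm_neon.h>`, followed by `veorq_u8` with the round key.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte AES state; byte i is row (i & 3), column (i >> 2). */
typedef struct { uint8_t b[16]; } aes_block;

static uint8_t sbox[256], inv_sbox[256];

/* Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    while (b) {
        if (b & 1) {
            r ^= a;
        }
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0));
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t x, int n)
{
    return (uint8_t)((x << n) | (x >> (8 - n)));
}

/* Build the AES S-box and its inverse. */
static void sbox_init(void)
{
    for (int x = 0; x < 256; x++) {
        uint8_t inv = 0;
        for (int y = 1; x != 0 && y < 256; y++) {
            if (gf_mul((uint8_t)x, (uint8_t)y) == 1) {
                inv = (uint8_t)y;
                break;
            }
        }
        sbox[x] = (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                            rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
        inv_sbox[sbox[x]] = (uint8_t)x;
    }
}

/* InvShiftRows + InvSubBytes. */
static aes_block inv_sub_shift(aes_block in)
{
    aes_block out;
    for (int i = 0; i < 16; i++) {
        int row = i & 3, col = i >> 2;
        out.b[i] = inv_sbox[in.b[row + 4 * ((col + 4 - row) & 3)]];
    }
    return out;
}

static aes_block xor_block(aes_block a, aes_block b)
{
    for (int i = 0; i < 16; i++) {
        a.b[i] ^= b.b[i];
    }
    return a;
}

/* AArch64 AESD model: key XORed in before the inverse shift (pre-shift). */
static aes_block arm_aesd(aes_block st, aes_block rk)
{
    return inv_sub_shift(xor_block(st, rk));
}

/* x86 AESDECLAST model: key XORed in after the inverse shift (post-shift). */
static aes_block x86_aesdeclast(aes_block st, aes_block rk)
{
    return xor_block(inv_sub_shift(st), rk);
}

/* The proposed reverse: AESD with an all-zero key, then XOR the round key. */
static aes_block aesdeclast_via_aesd(aes_block st, aes_block rk)
{
    aes_block zero = {{0}};
    return xor_block(arm_aesd(st, zero), rk);
}
```

In this model, the only cost of the translation in either direction is one extra XOR per instruction.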
r~