I changed the asm a bit, which makes it about 1 cycle faster on Haswell
and slightly smaller (48 bytes less overall, including alignment, on
64-bit Linux):
%macro AES_CRYPT 1
cglobal aes_%1rypt, 6,6,2
    shl    r3d, 4
    add    r5d, r5d
    add     r0, 0x60
    add     r2, r3
    add     r1, r3
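
For readers less used to this idiom, here is a rough C sketch of what the setup above appears to be preparing. It scales the block count to a byte count and advances both pointers to the end of their buffers. The snippet is cut off at this point, so the negated offset and the loop are my assumptions, not something shown in the patch. The names crypt_blocks and process_block are hypothetical, the argument-to-register mapping is a guess, and the XOR is only a stand-in for the AES rounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-block work; the real routine would run the
 * AES rounds here. The XOR constant is purely illustrative. */
static void process_block(uint8_t *dst, const uint8_t *src)
{
    for (int i = 0; i < 16; i++)
        dst[i] = src[i] ^ 0x5a;
}

/* Hypothetical helper mirroring the pointer setup in the asm. */
void crypt_blocks(uint8_t *dst, const uint8_t *src, int count)
{
    ptrdiff_t off = (ptrdiff_t)count * 16; /* shl r3d, 4: blocks -> bytes */

    src += off;                            /* add r2, r3 (assumed src) */
    dst += off;                            /* add r1, r3 (assumed dst) */

    /* Assumption: the offset is then negated so that it can count up
     * towards zero and serve as both index and loop counter. */
    off = -off;
    while (off != 0) {
        process_block(dst + off, src + off);
        off += 16;
    }
}
```

If the loop is indeed written this way, it needs no separate compare against an end pointer, because the add that advances the offset also sets the flags for the branch.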
On Tue, Oct 13, 2015 at 2:33 AM, Rodger Combs wrote:
> +%macro AES_CRYPT 1
> +%if %1 == 1
> +%define CRYPT aesdec
> +%define LAST aesdeclast
> +cglobal aes_decrypt, 6,6,2
> +%else
> +%define CRYPT aesenc
> +%define LAST aesenclast
> +cglobal aes_encrypt, 6,6,2
> +%endif
> +pxor xm1, xm1
> +
---
 libavutil/aes.c          |  4 ++
 libavutil/aes_internal.h |  2 +
 libavutil/x86/Makefile   |  4 +-
 libavutil/x86/aes.asm    | 98
 libavutil/x86/aes_init.c | 37 ++
 5 files changed, 144 insertions(+), 1 deletion(-)
create mode