I've been having consistent performance problems with AES in the 64-bit OpenSSL FIPS 1.2.3 build with asm. On 64-bit architectures the assembly code is much slower than the build without assembly. Running the same tests on a 32-bit machine shows asm being faster than no-asm, which is expected.
Does anyone have any ideas on why the 64 bit fips with asm is slower for AES encryption?

Machine: OpenSUSE 11.4, using gcc 4.1.2

With ASM:

./Configure fipscanisterbuild linux-x86_64 no-shared

./openssl64-asm speed aes
Doing aes-128 cbc for 3s on 16 size blocks: 12313812 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 3315028 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 847171 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 213565 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 26741 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 10425560 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 64 size blocks: 2783434 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 256 size blocks: 703868 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 1024 size blocks: 177705 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 8192 size blocks: 22269 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 9044428 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 2400974 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 609267 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 152920 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 19111 aes-256 cbc's in 2.99s
OpenSSL FIPS Object Module v1.2
built on: Wed Nov 2 13:54:41 PDT 2011
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc         65893.31k    70957.12k    72291.93k    73140.66k    73264.97k
aes-192 cbc         55788.95k    59379.93k    60264.28k    60859.51k    60809.22k
aes-256 cbc         48398.28k    51392.09k    51990.78k    52371.26k    52360.31k

Without ASM:

./Configure fipscanisterbuild linux-x86_64 no-shared no-asm

./openssl64-no-asm speed aes
Doing aes-128 cbc for 3s on 16 size blocks: 23150575 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 5947419 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 1515978 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 379077 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 47520 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 20160858 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 64 size blocks: 5197254 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 256 size blocks: 1325367 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 1024 size blocks: 331725 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 8192 size blocks: 41579 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 18043804 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 4656304 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1171525 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 292807 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 36675 aes-256 cbc's in 3.00s
OpenSSL FIPS Object Module v1.2
built on: Wed Nov 2 13:58:02 PDT 2011
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 -DTERMIO -O3 -Wall -DMD32_REG_T=int
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc        123882.68k   127302.61k   129796.11k   129391.62k   130195.26k
aes-192 cbc        107524.58k   111245.57k   113476.24k   113607.49k   113538.39k
aes-256 cbc         96555.47k    99334.49k   100304.48k   100279.05k   100147.20k

Thanks,
Mark
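To make the gap concrete, here is a minimal Python sketch that works only from the figures quoted above. It assumes the summary values are simply count × block size ÷ elapsed seconds ÷ 1000, which reproduces the reported 16-byte aes-128 numbers, and then compares the 8192-byte columns of the two tables. The function and variable names are illustrative only and are not part of OpenSSL.

```python
# Figures are copied from the two `openssl speed aes` runs quoted above.

def throughput_k(count, block_size, seconds):
    """Throughput in 1000s of bytes per second (assumed formula)."""
    return count * block_size / seconds / 1000.0

# aes-128 cbc, 16-byte blocks: raw operation counts from each run.
asm_16 = throughput_k(12313812, 16, 2.99)
noasm_16 = throughput_k(23150575, 16, 2.99)

# 8192-byte column of each summary table, in k bytes/s.
asm_8192 = {
    "aes-128 cbc": 73264.97,
    "aes-192 cbc": 60809.22,
    "aes-256 cbc": 52360.31,
}
noasm_8192 = {
    "aes-128 cbc": 130195.26,
    "aes-192 cbc": 113538.39,
    "aes-256 cbc": 100147.20,
}

# How many times faster the no-asm build is than the asm build.
ratios = {name: noasm_8192[name] / asm_8192[name] for name in asm_8192}

print("asm    16-byte aes-128: %.2fk" % asm_16)
print("no-asm 16-byte aes-128: %.2fk" % noasm_16)
for name in sorted(ratios):
    print("%s: no-asm is %.2fx the asm throughput" % (name, ratios[name]))
```

On these numbers the no-asm build comes out at roughly 1.8x to 1.9x the throughput of the asm build for all three key sizes.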