I've been seeing consistently poor AES performance with the 64-bit OpenSSL FIPS 
1.2.3 build when assembly is enabled.  On 64-bit (linux-x86_64) the asm build is 
much slower than the no-asm build: roughly 73 MB/s versus 130 MB/s for aes-128 
cbc on 8192-byte blocks (full numbers below).  Running the same tests on a 
32-bit machine shows asm faster than no-asm, which is what I would expect.

Does anyone have any idea why the 64-bit FIPS build with asm is slower for AES 
encryption?

Machine: openSUSE 11.4, using gcc 4.1.2

With ASM:
./Configure fipscanisterbuild linux-x86_64 no-shared
./openssl64-asm speed aes
Doing aes-128 cbc for 3s on 16 size blocks: 12313812 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 3315028 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 847171 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 213565 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 26741 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 10425560 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 64 size blocks: 2783434 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 256 size blocks: 703868 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 1024 size blocks: 177705 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 8192 size blocks: 22269 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 9044428 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 2400974 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 256 size blocks: 609267 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 1024 size blocks: 152920 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 19111 aes-256 cbc's in 2.99s
OpenSSL FIPS Object Module v1.2
built on: Wed Nov  2 13:54:41 PDT 2011
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) 
idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64 
-DL_ENDIAN -DTERMIO -O3 -Wall -DMD32_REG_T=int -DOPENSSL_BN_ASM_MONT -DSHA1_ASM 
-DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc      65893.31k    70957.12k    72291.93k    73140.66k    73264.97k
aes-192 cbc      55788.95k    59379.93k    60264.28k    60859.51k    60809.22k
aes-256 cbc      48398.28k    51392.09k    51990.78k    52371.26k    52360.31k

Without ASM:
./Configure fipscanisterbuild linux-x86_64 no-shared no-asm
./openssl64-no-asm speed aes
Doing aes-128 cbc for 3s on 16 size blocks: 23150575 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 5947419 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 256 size blocks: 1515978 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 379077 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 47520 aes-128 cbc's in 2.99s
Doing aes-192 cbc for 3s on 16 size blocks: 20160858 aes-192 cbc's in 3.00s
Doing aes-192 cbc for 3s on 64 size blocks: 5197254 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 256 size blocks: 1325367 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 1024 size blocks: 331725 aes-192 cbc's in 2.99s
Doing aes-192 cbc for 3s on 8192 size blocks: 41579 aes-192 cbc's in 3.00s
Doing aes-256 cbc for 3s on 16 size blocks: 18043804 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 64 size blocks: 4656304 aes-256 cbc's in 3.00s
Doing aes-256 cbc for 3s on 256 size blocks: 1171525 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 1024 size blocks: 292807 aes-256 cbc's in 2.99s
Doing aes-256 cbc for 3s on 8192 size blocks: 36675 aes-256 cbc's in 3.00s
OpenSSL FIPS Object Module v1.2
built on: Wed Nov  2 13:58:02 PDT 2011
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) 
idea(int) blowfish(ptr2)
compiler: gcc -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -m64  
-DTERMIO -O3 -Wall -DMD32_REG_T=int
available timing options: TIMES TIMEB HZ=100 [sysconf value]
timing function used: times
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc     123882.68k   127302.61k   129796.11k   129391.62k   130195.26k
aes-192 cbc     107524.58k   111245.57k   113476.24k   113607.49k   113538.39k
aes-256 cbc      96555.47k    99334.49k   100304.48k   100279.05k   100147.20k
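
As a sanity check on the two tables, here is a small Python sketch that recomputes the 8192-byte column from the raw operation counts and then takes the no-asm/asm ratio. It is not part of the OpenSSL tooling; the function and variable names are made up for illustration, and the counts and times are copied by hand from the "Doing aes-... on 8192 size blocks" lines above.

```python
# Figures copied from the 8192-byte lines of the two `openssl speed aes` runs.
# All names here are hypothetical helpers, not anything provided by OpenSSL.
BLOCK_SIZE = 8192

# cipher -> (operations completed, CPU seconds reported)
asm = {
    "aes-128 cbc": (26741, 2.99),
    "aes-192 cbc": (22269, 3.00),
    "aes-256 cbc": (19111, 2.99),
}
no_asm = {
    "aes-128 cbc": (47520, 2.99),
    "aes-192 cbc": (41579, 3.00),
    "aes-256 cbc": (36675, 3.00),
}


def throughput_k(ops, seconds, block_size=BLOCK_SIZE):
    """Throughput in 1000s of bytes per second, the unit used in the tables."""
    return ops * block_size / seconds / 1000.0


def compare(asm_runs, no_asm_runs):
    """Return cipher -> (asm throughput, no-asm throughput, no-asm/asm ratio)."""
    rows = {}
    for name in asm_runs:
        a = throughput_k(*asm_runs[name])
        n = throughput_k(*no_asm_runs[name])
        rows[name] = (a, n, n / a)
    return rows


if __name__ == "__main__":
    for name, (a, n, ratio) in compare(asm, no_asm).items():
        print(f"{name}: asm {a:.2f}k, no-asm {n:.2f}k, no-asm/asm = {ratio:.2f}x")
```

The recomputed values agree with the table entries, and the ratio column puts the no-asm build at somewhere between 1.7x and 2x the asm build for all three key sizes, so the gap is not an artifact of one block size or one timing sample.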

Thanks,
Mark
