On 28-10-17, Baptiste Jonglez wrote:
> The awesome AES performance was too good to be true: it seems to produce
> incorrect results when encrypting on the pine64 and decrypting on a x86_64
> machine :(
> Possibly some assembler is optimized away by the compiler, which would
> explain why it's so fast. Please don't merge for now until I investigate.

After investigating, there is actually no issue, so this is good to merge!

For the details:

- I was using openssl 1.1 for encrypting and openssl 1.0 for decrypting,
  so I was bitten by https://www.openssl.org/docs/faq.html#USER3 .
  Using the same digest algorithm on both sides yields correct results.

- AES performance is so good because openssl exploits the dedicated
  hardware instructions for AES found in most Aarch64 CPUs. Support for
  this was introduced 3 years ago:

  https://github.com/openssl/openssl/commit/9af4cb3d3beaaed8af33ee0bbc547cfef49c88a6

Baptiste

> On 27-10-17, Baptiste Jonglez wrote:
> > OpenSSL is built with the generic linux settings for most targets,
> > including aarch64. These generic settings are designed for 32-bit CPU and
> > provide no assembler optimization: this is widely suboptimal for aarch64.
> >
> > This patch simply switches to the aarch64 settings that are already
> > available in OpenSSL.
> >
> > Here is the output of "openssl speed" before the optimization, with
> > "(...)" representing build flags that didn't change:
> >
> >     OpenSSL 1.0.2l 25 May 2017
> >     options:bn(64,32) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> >     compiler: aarch64-openwrt-linux-musl-gcc (...)
> >
> > And after this patch, OpenSSL uses 64 bit mode and assembler optimizations:
> >
> >     OpenSSL 1.0.2l 25 May 2017
> >     options:bn(64,64) rc4(ptr,char) des(idx,cisc,2,int) aes(partial) blowfish(ptr)
> >     compiler: aarch64-openwrt-linux-musl-gcc (...) -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
> >
> > Here are some benchmarks on a pine64+ running latest LEDE master
> > r5142-20d363aed3:
> >
> >     before# openssl speed sha aes blowfish
> >     The 'numbers' are in 1000s of bytes per second processed.
> >     type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              3918.89k     9982.43k    19148.03k    24933.03k    27325.78k
> >     sha256            4604.51k    10240.64k    17472.51k    21355.18k    22801.07k
> >     sha512            3662.19k    14539.41k    21443.16k    29544.11k    33177.60k
> >     blowfish cbc     16266.63k    16940.86k    17176.92k    17237.33k    17252.35k
> >     aes-128 cbc      19712.95k    21447.40k    22091.09k    22258.35k    22304.09k
> >     aes-192 cbc      17680.12k    19064.47k    19572.14k    19703.13k    19737.26k
> >     aes-256 cbc      15986.67k    17132.48k    17537.28k    17657.17k    17689.26k
> >
> >     after# openssl speed sha aes blowfish
> >     type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              6770.87k    26172.80k    86878.38k   205649.58k   345978.20k
> >     sha256           20913.93k    74663.85k   184658.18k   290891.09k   351032.66k
> >     sha512            7633.10k    30110.14k    50083.24k    71883.43k    82485.25k
> >     blowfish cbc     16224.93k    16933.55k    17173.76k    17234.94k    17252.35k
> >     aes-128 cbc      19425.74k    21193.31k    22065.74k    22304.77k    22380.54k
> >     aes-192 cbc      17452.29k    18883.84k    19536.90k    19741.70k    19800.06k
> >     aes-256 cbc      15815.89k    17003.01k    17530.03k    17695.40k    17746.60k
> >
> > For some reason AES and blowfish do not benefit, but SHA performance
> > improves between 1.7x and 15x. SHA256 clearly benefits the most from the
> > optimization (4.5x on small blocks, 15x on large blocks!).
> >
> > When using EVP (with "openssl speed -evp <algo>"):
> >
> >     # Before, EVP mode
> >     type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              3824.46k    10049.66k    19170.56k    24947.03k    27325.78k
> >     sha256            3368.33k     8511.15k    16061.44k    20772.52k    22721.88k
> >     sha512            2845.23k    11381.57k    19467.69k    28512.26k    33008.30k
> >     bf-cbc           15146.74k    16623.83k    17092.01k    17211.39k    17249.62k
> >     aes-128-cbc      17873.03k    20870.61k    21933.65k    22216.36k    22301.35k
> >     aes-192-cbc      16184.18k    18607.15k    19447.13k    19670.02k    19737.26k
> >     aes-256-cbc      14774.06k    16757.25k    17457.58k    17639.42k    17686.53k
> >
> >     # After, EVP mode
> >     type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> >     sha1              7056.97k    27142.10k    89515.86k   209155.41k   347419.99k
> >     sha256            7745.70k    29750.06k    95341.48k   211001.69k   332376.75k
> >     sha512            4550.47k    18086.06k    39997.10k    65880.75k    81431.21k
> >     bf-cbc           15129.20k    16619.03k    17090.56k    17212.76k    17246.89k
> >     aes-128-cbc      99619.74k   269032.34k   450214.23k   567353.00k   613933.06k
> >     aes-192-cbc      93180.74k   231017.79k   361766.66k   433671.51k   461731.16k
> >     aes-256-cbc      89343.23k   209858.58k   310160.04k   362234.88k   380878.85k
> >
> > Blowfish does not seem to have assembler optimization at all, and SHA
> > still benefits (between 1.6x and 14.5x) but is generally slower than in
> > non-EVP mode.
> >
> > However, AES performance is improved between 5.5x and 27.5x, which is
> > really impressive! For aes-128-cbc on large blocks, a core i7-6600U
> > @2.60GHz is only twice as fast...
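
The AES speedup range quoted above can be re-derived from the two EVP tables. The following is only a small Python sketch for checking the arithmetic: the numbers are copied from the aes-*-cbc rows of the tables, while the variable names are made up for illustration and are not part of the patch.

```python
# Throughput in 1000s of bytes per second, copied from the EVP tables
# (columns: 16, 64, 256, 1024 and 8192 byte blocks).
before = {
    "aes-128-cbc": [17873.03, 20870.61, 21933.65, 22216.36, 22301.35],
    "aes-192-cbc": [16184.18, 18607.15, 19447.13, 19670.02, 19737.26],
    "aes-256-cbc": [14774.06, 16757.25, 17457.58, 17639.42, 17686.53],
}
after = {
    "aes-128-cbc": [99619.74, 269032.34, 450214.23, 567353.00, 613933.06],
    "aes-192-cbc": [93180.74, 231017.79, 361766.66, 433671.51, 461731.16],
    "aes-256-cbc": [89343.23, 209858.58, 310160.04, 362234.88, 380878.85],
}

# Per-algorithm speedup for each block size: after / before.
ratios = {
    name: [a / b for a, b in zip(after[name], before[name])]
    for name in before
}

for name, per_size in ratios.items():
    print(name, " ".join(f"{r:.2f}x" for r in per_size))

# Smallest and largest speedup over all AES rows and block sizes.
all_ratios = [r for per_size in ratios.values() for r in per_size]
lowest, highest = min(all_ratios), max(all_ratios)
print(f"range: {lowest:.2f}x to {highest:.2f}x")
```

Both extremes come from the aes-128-cbc row: the smallest gain is at 16-byte blocks and the largest at 8192-byte blocks.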
> >
> > Signed-off-by: Baptiste Jonglez <g...@bitsofnetworks.org>
> > ---
> >  package/libs/openssl/Makefile                            | 4 +++-
> >  package/libs/openssl/patches/110-optimize-for-size.patch | 3 ++-
> >  2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/package/libs/openssl/Makefile b/package/libs/openssl/Makefile
> > index 7707c19431..d7037cb7c1 100644
> > --- a/package/libs/openssl/Makefile
> > +++ b/package/libs/openssl/Makefile
> > @@ -11,7 +11,7 @@ PKG_NAME:=openssl
> >  PKG_BASE:=1.0.2
> >  PKG_BUGFIX:=l
> >  PKG_VERSION:=$(PKG_BASE)$(PKG_BUGFIX)
> > -PKG_RELEASE:=1
> > +PKG_RELEASE:=2
> >  PKG_USE_MIPS16:=0
> >
> >  PKG_BUILD_PARALLEL:=0
> > @@ -161,6 +161,8 @@ else
> >    OPENSSL_OPTIONS+=no-sse2
> >    ifeq ($(CONFIG_mips)$(CONFIG_mipsel),y)
> >      OPENSSL_TARGET:=linux-mips-openwrt
> > +  else ifeq ($(CONFIG_aarch64),y)
> > +    OPENSSL_TARGET:=linux-aarch64-openwrt
> >    else ifeq ($(CONFIG_arm)$(CONFIG_armeb),y)
> >      OPENSSL_TARGET:=linux-armv4-openwrt
> >    else
> > diff --git a/package/libs/openssl/patches/110-optimize-for-size.patch b/package/libs/openssl/patches/110-optimize-for-size.patch
> > index 0f174a3469..d6d4a21111 100644
> > --- a/package/libs/openssl/patches/110-optimize-for-size.patch
> > +++ b/package/libs/openssl/patches/110-optimize-for-size.patch
> > @@ -1,11 +1,12 @@
> >  --- a/Configure
> >  +++ b/Configure
> > -@@ -470,6 +470,12 @@ my %table=(
> > +@@ -470,6 +470,13 @@ my %table=(
> >   "linux-alpha-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> >   "linux-alpha+bwx-ccc","ccc:-fast -readonly_strings -DL_ENDIAN::-D_REENTRANT:::SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_PTR DES_RISC1 DES_UNROLL:${alpha_asm}",
> >
> >  +# OpenWrt targets
> >  +"linux-armv4-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${armv4_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> > ++"linux-aarch64-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${aarch64_asm}:linux64:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> >  +"linux-x86_64-openwrt", "gcc:-m64 -DL_ENDIAN -DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:SIXTY_FOUR_BIT_LONG RC4_CHUNK DES_INT DES_UNROLL:${x86_64_asm}:elf:dlfcn:linux-shared:-fPIC:-m64:.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR):::64",
> >  +"linux-mips-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${mips32_asm}:o32:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> >  +"linux-generic-openwrt","gcc:-DTERMIOS \$(OPENWRT_OPTIMIZATION_FLAGS) -fomit-frame-pointer -Wall::-D_REENTRANT::-ldl:BN_LLONG RC4_CHAR RC4_CHUNK DES_INT DES_UNROLL BF_PTR:${no_asm}:dlfcn:linux-shared:-fPIC::.so.\$(SHLIB_MAJOR).\$(SHLIB_MINOR)",
> _______________________________________________
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev
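
For reference, the digest mismatch mentioned at the top of this message can be illustrated without OpenSSL at all. The default digest that "openssl enc" uses to turn a passphrase into a key changed from MD5 in the 1.0 series to SHA-256 in 1.1, so the same passphrase gives a different key on each side unless the digest is forced. Below is a rough Python sketch of that derivation, modelled on EVP_BytesToKey with an iteration count of 1. The helper name, passphrase and salt are made up for illustration; this is not the code OpenSSL runs.

```python
import hashlib


def evp_bytes_to_key(password, salt, key_len, iv_len, digest):
    """Sketch of an EVP_BytesToKey-style derivation (iteration count 1).

    Each round hashes the previous block, the password and the salt;
    the blocks are concatenated until enough bytes are available.
    """
    derived = b""
    block = b""
    while len(derived) < key_len + iv_len:
        block = hashlib.new(digest, block + password + salt).digest()
        derived += block
    return derived[:key_len], derived[key_len:key_len + iv_len]


# Hypothetical inputs, for illustration only.
password = b"example passphrase"
salt = bytes(range(8))  # "openssl enc" uses an 8-byte salt

# AES-128-CBC needs a 16-byte key and a 16-byte IV.
key_md5, iv_md5 = evp_bytes_to_key(password, salt, 16, 16, "md5")
key_sha256, iv_sha256 = evp_bytes_to_key(password, salt, 16, 16, "sha256")

print("md5    key:", key_md5.hex())
print("sha256 key:", key_sha256.hex())
print("keys match:", key_md5 == key_sha256)
```

Since the two keys differ, data encrypted with one default cannot be decrypted with the other. Passing the same "-md <digest>" option to "openssl enc" on both machines avoids the problem.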