On Thu, Jun 3, 2021 at 10:50 AM Diederik de Haas <didi.deb...@cknow.org> wrote:
>
> On Wednesday, 20 January 2021 11:40:26 CEST brainf...@posteo.net wrote:
> > hardware accelerated encryption is a bit of a mystery to me
> > some processors advertise it but how do we know if it's being used
> > is there a way to test if hardware accelerated encryption is being used
> > or if it's just advertising hype
>
> I very much like to understand this as well.
> I have a/several Rock64 devices and it is supposed to have ARMv8 Cryptography
> Extensions according to https://wiki.pine64.org/wiki/ROCK64#CPU_Architecture.
>
> Due to bug #976635 several CRYPTO modules got enabled in the 5.10 kernel.
> But I don't know whether that's relevant for ARMv8 CE.
>
> https://turecki.net/content/getting-most-out-ssh-hardware-acceleration-tuning-aes-ni
> contains a test to check the speed of some crypto operations.
> Based on that I've made a procedure which I've now run on several devices:
>
> # adduser test
> $ ssh-add (make sure ssh agent is running)
> $ ssh-copy-id test@localhost
> $ ssh test@localhost (verify key based auth works)
> $ exit
> $ for i in `ssh -Q cipher`; do dd if=/dev/zero bs=1M count=100 2> /dev/null | \
> ssh -c $i test@localhost "(time -p cat) > /dev/null" 2>&1 | grep real | \
> awk '{print "'$i': "100 / $2" MB/s" }'; done
> $ grep -i -E "(flags|features)" /proc/cpuinfo | tail -n1
>
> On a Rock64 with kernel 5.8.0-1-arm64, I got these results:
> aes128-ctr: 45.8716 MB/s
> aes192-ctr: 45.6621 MB/s
> aes256-ctr: 44.6429 MB/s
> aes128-...@openssh.com: 49.505 MB/s
> aes256-...@openssh.com: 48.7805 MB/s
> chacha20-poly1...@openssh.com: 36.9004 MB/s
>
> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>
> But on kernel 5.10.0-7-arm64, with those CRYPTO modules, I got this:
> aes128-ctr: 42,735 MB/s
> aes192-ctr: 44,4444 MB/s
> aes256-ctr: 44,0529 MB/s
> aes128-...@openssh.com: 48,0769 MB/s
> aes256-...@openssh.com: 46,0829 MB/s
> chacha20-poly1...@openssh.com: 37,037 MB/s
>
> Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>
> If you run the test several times, you'll get slightly different results
> each time, so I consider these results the same.
>
> For comparison (I don't remember which kernel version) on Ryzen 7 1800X:
> aes128-ctr: 714.286 MB/s
> aes192-ctr: 714.286 MB/s
> aes256-ctr: 769.231 MB/s
> aes128-...@openssh.com: 1000 MB/s
> aes256-...@openssh.com: 1000 MB/s
> chacha20-poly1...@openssh.com: 294.118 MB/s
>
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat
> pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp
> lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni
> pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx
> f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
> 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext
> perfctr_llc mwaitx cpb hw_pstate sme ssbd sev ibpb vmmcall fsgsbase bmi1
> avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1
> xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale
> vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic
> v_vmsave_vmload vgif overflow_recov succor smca
>
> with kernel 5.10.0-7-amd64:
> aes128-ctr: 714,286 MB/s
> aes192-ctr: 769,231 MB/s
> aes256-ctr: 714,286 MB/s
> aes128-...@openssh.com: 909,091 MB/s
> aes256-...@openssh.com: 909,091 MB/s
> chacha20-poly1...@openssh.com: 500 MB/s
>
> very odd that aes192-ctr and aes256-ctr seem to have switched, but the values
> are otherwise EXACTLY the same :-O
> Very impressive speed improvement with chacha20-poly1305 though :D
> (Note that the aforementioned bug report was about arm64, not amd64)
>
> On a RPi2, the values were around 12 MB/s
>
>
> I don't find the scores of the Rock64 impressive, but that may be because
> I've read somewhere that ARMv8 Cryptography Extensions could/should
> result in a FACTOR 10 speed improvements with cryptography.
>
> There could be a number of issues here:
> 1) The 'factor 10' is horseshit
> 2) The 'factor 10' is true, but it doesn't work on Rock64 (yet?)
> 3) The 'factor 10' is true and working and without it, the scores would be
> abysmal.
> 4) The test is all wrong
>
> If I do 'cat /proc/crypto' I get a long list, but I have no idea what the
> output means.
>
>
> So essentially I have the same question as OP.
> How can I/we know if it's present and working as intended?
> What kind of speed improvement can/should one expect?
> What is needed to take advantage of it? Kernel modules and if so, which?
> The CRYPTO_XYZ_CE ones? Others? Something else entirely?

I _think_ OpenSSH uses OpenSSL, not kernel crypto. Or it uses LibreSSL,
the OpenBSD fork of OpenSSL.

To benchmark OpenSSL, you use something like:

    # C implementation
    openssl speed aes-128-cbc

    # Hardware acceleration (via the EVP interface)
    openssl speed -evp aes-128-cbc

You can see the difference in the numbers below (roughly 5x on large
blocks). These were run on a Core i7-8700.

    $ openssl speed aes-128-cbc
    Doing aes-128 cbc for 3s on 16 size blocks: 57736814 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 64 size blocks: 14943316 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 256 size blocks: 3741357 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 1024 size blocks: 944345 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 8192 size blocks: 118246 aes-128 cbc's in 3.00s
    Doing aes-128 cbc for 3s on 16384 size blocks: 59132 aes-128 cbc's in 3.00s
    OpenSSL 1.1.1f 31 Mar 2020
    built on: Wed Apr 28 00:37:28 2021 UTC
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128 cbc     307929.67k   318790.74k   319262.46k   322336.43k   322890.41k   322939.56k

    $ openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 186837731 aes-128-cbc's in 2.99s
    Doing aes-128-cbc for 3s on 64 size blocks: 78857865 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 20276035 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 5088201 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 636732 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 16384 size blocks: 318374 aes-128-cbc's in 3.00s
    OpenSSL 1.1.1f 31 Mar 2020
    built on: Wed Apr 28 00:37:28 2021 UTC
    ...
    The 'numbers' are in 1000s of bytes per second processed.
    type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc     999800.57k  1682301.12k  1730221.65k  1736772.61k  1738702.85k  1738746.54k

I don't like OpenSSL's output. It should report cycles per byte (cpb),
since that metric is mostly independent of clock speed when measuring
performance.

Jeff
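
P.S. For anyone who wants cycles per byte from the OpenSSL figures, here
is a rough sketch of the conversion: cpb is the clock rate in cycles per
second divided by the throughput in bytes per second. The cpb function
name and the 4000 MHz clock are my own assumptions for illustration, not
something OpenSSL provides or something measured on the i7-8700 above.
Turbo and frequency scaling make any single clock value approximate.

```shell
#!/bin/sh
# Convert an `openssl speed` throughput figure into cycles per byte (cpb).
# ASSUMPTION: the caller supplies the CPU clock in MHz; it is not part of
# the openssl output and is only approximate under frequency scaling.

# Usage: cpb KBYTES_PER_SEC CLOCK_MHZ
#   KBYTES_PER_SEC  openssl figure such as 1738746.54k (1000s of bytes/s)
#   CLOCK_MHZ       assumed sustained clock frequency in MHz
cpb() {
    # ${1%k} strips the trailing "k" that openssl appends to each figure
    awk -v rate="${1%k}" -v mhz="$2" 'BEGIN {
        # (cycles per second) / (bytes per second) = cycles per byte
        printf "%.2f\n", (mhz * 1000000) / (rate * 1000)
    }'
}

# 16384-byte results from the two runs, with a hypothetical 4000 MHz clock
cpb 322939.56k 4000     # without -evp
cpb 1738746.54k 4000    # with -evp
```

A lower number is better. Because the clock is an input, the same
function can be used to compare runs from machines with different clock
speeds, as long as each clock value is reasonably close to what the
core actually ran at during the benchmark.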