On Wed, Mar 3, 2021 at 9:44 AM LinAdmin <linad...@quickline.ch> wrote:
>
> The common belief that on the same hardware 64-bit must be better than or equal
> to 32-bit is clearly wrong for the "crazy" BCM2711 chip used in the Pi4.
> The detailed benchmarks for Raspbian Buster are at 32 Bit Kernel 4.19 and 64
> Bit Kernel 5.4, showing 50% less AES throughput with 16KB blocks for
> 64-bit.
This is a user space microbenchmark; it has nothing to do with what the
kernel does underneath it. Looking at the output, I see it's not even
running the same version of the program:

Test on 32-bit kernel:
OpenSSL 1.1.1c, built on 28 May 2019
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      62184.51k    76615.98k    83103.15k    84435.97k    85237.76k    85169.49k
aes-128-cbc      62511.68k    76704.43k    83097.09k    84763.99k    85150.38k    85229.57k
aes-192-cbc      50203.94k    64933.31k    71396.52k    73090.39k    73602.39k    73706.15k
aes-192-cbc      56285.24k    67498.65k    71976.02k    73356.29k    73525.93k    73258.33k
aes-256-cbc      51010.29k    60062.42k    63579.31k    64656.73k    64927.06k    64831.49k
aes-256-cbc      50869.32k    60057.64k    63678.55k    64560.47k    64935.25k    64891.56k

Test on 64-bit kernel:
OpenSSL 1.1.1d, built on 10 Sep 2019
type              16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      38070.54k    40669.85k    41716.22k    42029.40k    42131.46k    42177.88k
aes-128-cbc      38065.38k    40746.26k    41775.96k    42064.21k    42229.76k    42292.57k
aes-192-cbc      32294.31k    34105.22k    35048.28k    35303.42k    35351.21k    35351.21k
aes-192-cbc      32254.74k    34136.98k    35043.33k    35301.38k    35367.59k    35367.59k
aes-256-cbc      27986.06k    29351.96k    29962.33k    30127.79k    30173.87k    30179.33k
aes-256-cbc      27986.74k    29372.25k    29969.24k    30119.25k    30160.21k    30157.48k

> On my system I get similar results, e.g. for AES-128 (16KB):
> Salsa Buster arm64 5.9.0   42'000
> Ubuntu LTS armv7l 5.4      92'000

Do you mean you are running the openssl benchmarks from two different
distros here? Could it be that you are running a 64-bit openssl binary
on the Buster arm64 kernel?

If you want to compare the kernel performance, you have to ensure that
you are running the exact same user space on both. For the openssl test,
it should be sufficient to boot the Buster installation and enter a
chroot.
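As a quick way to confirm which user space a benchmark actually ran in, here is a minimal sketch, assuming Python 3 is available in both environments (the helper names describe_elf and describe_binary are made up for illustration). It reads the ELF header of the openssl binary found on $PATH and reports whether it is 32-bit ARM or 64-bit AArch64, along with the machine name the kernel reports for the current personality. Running it both on the plain Buster system and inside the chroot should show whether the two benchmark runs used a binary of the same word size.

```python
import os
import shutil
import struct

# e_machine values from the ELF specification.
EM_ARM = 40       # 32-bit ARM (armhf/armel user space)
EM_AARCH64 = 183  # 64-bit ARM (arm64 user space)


def describe_elf(header: bytes) -> str:
    """Classify the first 20 bytes of an ELF file by word size and architecture."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}.get(header[4], 0)   # EI_CLASS: 1 = ELF32, 2 = ELF64
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    # e_machine is a 16-bit field at offset 18 in both ELF32 and ELF64 headers.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    arch = {EM_ARM: "arm", EM_AARCH64: "aarch64"}.get(machine, f"e_machine={machine}")
    return f"{bits}-bit {arch}"


def describe_binary(path: str) -> str:
    """Read the ELF header of an executable on disk and classify it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))


if __name__ == "__main__":
    # Machine name for the current personality. On an arm64 kernel we expect
    # "aarch64" normally and "armv8l" under 'linux32'; a 32-bit kernel on
    # the Pi 4 reports "armv7l".
    print("uname machine:", os.uname().machine)
    openssl = shutil.which("openssl")
    if openssl is None:
        print("openssl not found on $PATH")
    else:
        try:
            print(openssl, "->", describe_binary(openssl))
        except ValueError:
            print(openssl, "is not an ELF binary (wrapper script?)")
```

If the 'file' utility is installed, 'file $(which openssl)' gives the same 32-bit/64-bit answer; the sketch just avoids depending on it.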
As you can see in the two listings you sent, the 32-bit version reports
the 'neon' feature, while the 64-bit version reports 'asimd', which is
what 64-bit user space expects. So either those tests are running 64-bit
user space, or the 32-bit user space is running in the wrong
'personality' of the kernel. It's possible that the feature detection in
openssl fails when you run in the wrong personality, as the /proc/cpuinfo
output will contain incompatible information. When you use
'sudo linux32 chroot /mnt/ubuntu-armv7' to enter the chroot, that chroot
should be in the correct personality.

> When playing a FullHD video coded H265, the average CPU load is 80% on 64-bit
> and less than 30% on 32-bit!
> Similar situations when encoding to H265 using ffmpeg.

This could be the same problem with incorrect feature detection from
running in the wrong personality, or it could be related to missing
kernel drivers for H265 acceleration in the 64-bit kernel. Do you know
whether this uses a software codec or an accelerated one on the GPU?

       Arnd