Hi Paul,
I cannot see anything wrong with the optimized code, and valgrind gives a clean bill of health on x86_64. We need the help of somebody with access to an arm/aarch64 device.
I'm currently running a bootstrap on an aarch64 machine. These are not known to be the fastest of machines, but it should be done sometime today.
Regards
Thomas