I wonder how much of that is due to auto-vectorization (on LLVM, -O2 and above turns it on; I suppose GCC only enables it at -O3?). Going by Ramana's point, though, it may not matter much if you haven't enabled NEON.
Auto-vectorization is turned off when you have -mfpu=vfpv3-d16, since that implies no NEON. Ramana
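For anyone who wants to check this on their own build, here is a minimal sketch in C. It assumes the compiler follows the ACLE convention of predefining `__ARM_NEON` (or the older `__ARM_NEON__`) when NEON is available. The names `neon_enabled` and `add_arrays` are made up for illustration and do not come from any benchmark discussed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the compiler predefines the ACLE NEON macro for this
 * build, 0 otherwise. With -mfpu=vfpv3-d16 the macro is expected to be
 * absent (assumption: an ACLE-conforming compiler). */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* A simple element-wise loop of the kind an auto-vectorizer usually
 * handles. 'restrict' tells the compiler the arrays do not overlap.
 * Integer data is used so that floating-point semantics do not come
 * into the decision. */
void add_arrays(int32_t *restrict dst, const int32_t *restrict a,
                const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Building this once with -mfpu=vfpv3-d16 and once with -mfpu=neon, then comparing the generated assembly for `add_arrays`, should show whether the loop was vectorized. GCC's -fopt-info-vec and Clang's -Rpass=loop-vectorize report the same thing without having to read the assembly.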
It's also interesting to see LTO being a major driver of recent performance improvements in both compilers. cheers, --renato