Vladimir Makarov wrote:
> I've tested and benchmarked the sub-pass on x86-64 and ARM. The
> sub-pass permits to generate a smaller code in average on both
> architecture (although improvement no-significant), adds < 0.4%
> additional compilation time in -O2 mode of release GCC (according user
> time of compilation of 500K lines fortran program and valgrind lackey #
> insns in combine.i compilation) and about 0.7% in -O0 mode. As the
> performance result, the best I found is 1% SPECFP2000 improvement on
> ARM Exynos 5410 (973 vs 963) but for Intel Haswell the performance
> results are practically the same (Haswell has a very good
> sophisticated memory sub-system).

On aarch64 I have seen some minor performance improvements in libpng
compress and decompress. The patch does not change the performance of
any of the other benchmarks that I have tested.

Thanks,
Sebastian