Quoting Xinliang David Li <davi...@google.com>:
In xalancbmk, with the partition option, most of the object files have nonzero-size cold sections generated. The text size of the binary increases from 3466790 bytes to 3572728 bytes. Profiling the program with the training input shows the following differences: with partitioning, the number of executed branch instructions increases slightly, but iTLB misses and icache load misses are significantly lower than in the binary built without partitioning.
It is nice to have confirmation that for this benchmark, the optimization causes a speedup because it works as intended, however...
dealII and bzip2 degrade by about 1.4%.
... I think the question was directed more at what causes the performance degradation in these two benchmarks. If we could retain most of the speedup where the optimization works well while avoiding most of the slowdown in the benchmarks that are currently hurt, we could improve the overall SPEC06 score. Hopefully, this would also benefit other code.