Quoting Xinliang David Li <davi...@google.com>:

In xalancbmk, with the partition option, most of the object files have
nonzero-size cold sections generated. The text size of the binary
increases from 3466790 bytes to 3572728 bytes. Profiling the program
using the training input shows the following differences. With
partitioning, the number of executed branch instructions increases
slightly, but iTLB misses and icache load misses are significantly
lower than in the binary without partitioning.

It is nice to have confirmation that for this benchmark, the optimization
causes a speedup because it works as intended, however...

dealII and bzip2 degrade by about 1.4%.

... I think the question was more directed at what causes the
performance degradation for these two benchmarks.

If we could retain most of the speedups when the optimization works well
but avoid most of the slowdown in the benchmarks that are currently hurt,
we could improve the overall SPEC06 score.  And hopefully, this would
also be beneficial to other code.
