In xalancbmk, with the partition option, most of the object files have nonzero-size cold sections generated. The text size of the binary increases from 3466790 bytes to 3572728 bytes (about 3%). Profiling the program with the training input shows the following differences. With partitioning, the number of executed branch instructions increases slightly (about 2.5%), but iTLB misses and icache load misses are significantly lower (about 44% and 32% lower, respectively) than in the binary without partitioning.

David

With partition:
-----------------
53654937239 branches
  306751458 L1-icache-load-misses
    8146112 iTLB-load-misses

Without partition:
---------------------
52348639025 branches
  454417666 L1-icache-load-misses
   14470953 iTLB-load-misses

On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini <bonz...@gnu.org> wrote:
> On 07/25/2011 06:42 AM, Xinliang David Li wrote:
>>
>> FYI the performance impact of this option with SPEC06 (built with
>> google_46 compiler and measured on a core2 box). The base line number
>> is FDO, and ref number is FDO + reorder_with_partitioning.
>>
>> xalancbmk improves> 3.5%
>> perlbench improves> 1.5%
>> dealII and bzip2 degrades about 1.4%.
>>
>> Note the partitioning scheme is not tuned at all -- there is not even
>> a tunable parameter to play with.
>
> Did you check what is pushed down to the cold section in these cases?
>
> Paolo
>
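
For anyone who wants to redo the arithmetic on the xalancbmk numbers, here is a minimal Python sketch that computes the relative change for each counter and for the text size. The numbers are the ones reported in this message; the helper name pct_change and the dictionary layout are only illustrative, not part of any tool.

```python
# Relative change of each reported value, with partitioning vs. without.
# All numbers are copied from the message; names are illustrative only.

def pct_change(before, after):
    """Return the change from `before` to `after` as a percentage."""
    return (after - before) / before * 100.0

without_partition = {
    "branches": 52348639025,
    "L1-icache-load-misses": 454417666,
    "iTLB-load-misses": 14470953,
    "text size (bytes)": 3466790,
}

with_partition = {
    "branches": 53654937239,
    "L1-icache-load-misses": 306751458,
    "iTLB-load-misses": 8146112,
    "text size (bytes)": 3572728,
}

if __name__ == "__main__":
    for name, before in without_partition.items():
        after = with_partition[name]
        # Positive means the value grew with partitioning, negative means it shrank.
        print(f"{name:24s} {pct_change(before, after):+7.2f}%")
```

A negative percentage means the value dropped with partitioning. This only restates the single training-input run above and says nothing about run-to-run variance.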