In xalancbmk, with the partition option, most of the object files
have nonzero-size cold sections generated. The text size of the
binary increases from 3466790 bytes to 3572728 bytes. Profiling the
program with the training input shows the following differences: with
partitioning, the number of executed branch instructions increases
slightly, but iTLB misses and icache load misses are significantly
lower than in the binary without partitioning.
David
With partition:
-----------------
53654937239 branches
306751458 L1-icache-load-misses
8146112 iTLB-load-misses
Without partition:
---------------------
52348639025 branches
454417666 L1-icache-load-misses
14470953 iTLB-load-misses
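
For reference, the relative changes implied by the numbers above can be
worked out with a short script. This is only a sketch of the arithmetic;
the dictionary layout and the helper name pct_change are made up for
illustration and are not part of any tool used for the measurements.

```python
# Counts copied from the measurements above (training input run).
with_partition = {
    "text size (bytes)": 3572728,
    "branches": 53654937239,
    "L1-icache-load-misses": 306751458,
    "iTLB-load-misses": 8146112,
}
without_partition = {
    "text size (bytes)": 3466790,
    "branches": 52348639025,
    "L1-icache-load-misses": 454417666,
    "iTLB-load-misses": 14470953,
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Change of each metric when going from no partitioning to partitioning.
changes = {
    name: pct_change(without_partition[name], with_partition[name])
    for name in with_partition
}

for name, delta in changes.items():
    print(f"{name:24s} {delta:+6.1f}%")
```

With partitioning this comes to roughly 3.1% more text and 2.5% more
executed branches, against about 32.5% fewer L1 icache load misses and
about 43.7% fewer iTLB load misses.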
On Mon, Jul 25, 2011 at 3:23 AM, Paolo Bonzini wrote:
> On 07/25/2011 06:42 AM, Xinliang David Li wrote:
>>
>> FYI the performance impact of this option with SPEC06 (built with
>> google_46 compiler and measured on a core2 box). The base line number
>> is FDO, and ref number is FDO + reorder_with_partitioning.
>>
>> xalancbmk improves > 3.5%
>> perlbench improves > 1.5%
>> dealII and bzip2 degrade by about 1.4%.
>>
>> Note the partitioning scheme is not tuned at all -- there is not even
>> a tunable parameter to play with.
>
> Did you check what is pushed down to the cold section in these cases?
>
> Paolo
>