On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.ch...@gmail.com> wrote: > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <a...@linux.intel.com> wrote: > > > > "bin.cheng" <bin.ch...@linux.alibaba.com> writes: > > > > > Hi, > > > > > > Due to ICE and mal-functional bugs, indirect call value profile > > > transformation > > > is disabled on GCC-7/8/trunk. This patch restores the transformation. > > > The > > > main issue is AutoFDO should store cgraph_node's profile_id of callee > > > func in > > > the first histogram value's counter, rather than pointer to callee's name > > > string > > > as it is now. > > > With the patch, some "Indirect call -> direct call" tests pass with > > > autofdo, while > > > others are unstable. I think the instability is caused by poor perf data > > > collected > > > during regrets run, and can confirm these tests pass if good perf data > > > could be > > > collected in manual experiments. > > > > Would be good to make the tests stable, otherwise we'll just have > > regressions in the future again. > > > > The problem is that the tests don't run long enough and don't get enough > > samples? > Yes, take g++.dg/tree-prof/morefunc.C as an example: > - int i; > - for (i = 0; i < 1000; i++) > + int i, j; > + for (i = 0; i < 1000000; i++) > + for (j = 0; j < 50; j++) > g += tc->foo(); > if (g<100) g++; > } > @@ -27,8 +28,9 @@ void test1 (A *tc) > static __attribute__((always_inline)) > void test2 (B *tc) > { > - int i; > + int i, j; > for (i = 0; i < 1000000; i++) > + for (j = 0; j < 50; j++) > > I have to increase loop count like this to get stable pass on my > machine. The original count (1000) is too small to be sampled. > > > > > Could add some loop? > > Or possibly increase the sampling frequency in perf (-F or -c)? > Maybe, I will have a try. Turned out all "Indirect call" test can be resolved by adding -c 100 to perf command line: diff --git a/gcc/config/i386/gcc-auto-profile b/gcc/config/i386/gcc-auto-profile ... -exec perf record -e $E -b "$@" +exec perf record -e $E -c 100 -b "$@"
Is 100 too small here? Or is it fine for all scenarios? Thanks, bin > > Or run them multiple times and use gcov_merge to merge the files? > Without changing loop count or sampling frequency, this is not likely > to be helpful, since perf doesn't hit the small loop in most cases. > > Thanks, > bin > > > > > > > FYI, an update about AutoFDO status: > > > All AutoFDO ICEs in regtest are fixed, while several tests still failing > > > fall in below > > > three categories: > > > > Great! > > > > Of course it still ICEs with LTO? > > > > Right now there is no test case for this I think. Probably one should be > > added. > > > > -Andi