On Wed, Dec 19, 2018 at 5:08 AM Bin.Cheng <amker.ch...@gmail.com> wrote: > > On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen <a...@linux.intel.com> wrote: > > > > On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote: > > > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng <amker.ch...@gmail.com> wrote: > > > > > > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen <a...@linux.intel.com> wrote: > > > > > > > > > > "bin.cheng" <bin.ch...@linux.alibaba.com> writes: > > > > > > > > > > > Hi, > > > > > > > > > > > > Due to ICE and mal-functional bugs, indirect call value profile > > > > > > transformation > > > > > > is disabled on GCC-7/8/trunk. This patch restores the > > > > > > transformation. The > > > > > > main issue is AutoFDO should store cgraph_node's profile_id of > > > > > > callee func in > > > > > > the first histogram value's counter, rather than pointer to > > > > > > callee's name string > > > > > > as it is now. > > > > > > With the patch, some "Indirect call -> direct call" tests pass with > > > > > > autofdo, while > > > > > > others are unstable. I think the instability is caused by poor > > > > > > perf data collected > > > > > > during regrets run, and can confirm these tests pass if good perf > > > > > > data could be > > > > > > collected in manual experiments. > > > > > > > > > > Would be good to make the tests stable, otherwise we'll just have > > > > > regressions in the future again. > > > > > > > > > > The problem is that the tests don't run long enough and don't get > > > > > enough samples? > > > > Yes, take g++.dg/tree-prof/morefunc.C as an example: > > > > - int i; > > > > - for (i = 0; i < 1000; i++) > > > > + int i, j; > > > > + for (i = 0; i < 1000000; i++) > > > > + for (j = 0; j < 50; j++) > > > > g += tc->foo(); > > > > if (g<100) g++; > > > > } > > > > @@ -27,8 +28,9 @@ void test1 (A *tc) > > > > static __attribute__((always_inline)) > > > > void test2 (B *tc) > > > > { > > > > - int i; > > > > + int i, j; > > > > for (i = 0; i < 1000000; i++) > > > > + for (j = 0; j < 50; j++) > > > > > > > > I have to increase loop count like this to get stable pass on my > > > > machine. The original count (1000) is too small to be sampled. > > > > > > > > > > > > > > Could add some loop? > > > > > Or possibly increase the sampling frequency in perf (-F or -c)? > > > > Maybe, I will have a try. > > > Turned out all "Indirect call" test can be resolved by adding -c 100 > > > to perf command line: > > > diff --git a/gcc/config/i386/gcc-auto-profile > > > b/gcc/config/i386/gcc-auto-profile > > > ... > > > -exec perf record -e $E -b "$@" > > > +exec perf record -e $E -c 100 -b "$@" > > > > > > Is 100 too small here? Or is it fine for all scenarios? > > > > -c 100 is risky because it can cause perf throttling, which > > makes it lose data. > Right, it looks suspicious to me too. > > > > > perf has a limiter that if the PMU handler uses too much CPU > > time it stops measuring for some time. A PMI is 10k+ cycles, > > so doing one every 100 branches is a lot of CPU time. > > > > I wouldn't go down that low. It is better to increase the > > iteration count. > We can combine the two together, increasing iteration count and > decreasing perf count at the same time. What count would you suggest > from your experience?
Can we instead for the tests where we want to test profile use/merge elide the profiling step and supply the "raw" data in an testsuite alternate file instead? Richard. > Thanks, > bin > > > > -Andi