Hi,
I've been looking at GCC's profile information recently. I noticed that the
probabilities/counts effectively encode the expected niters of a loop, so e.g.
with the following testcase:
$ cat t.c
int *a;
void f() {
for (int i = 0; i < N; i++)
a[i] = i;
}
$ ./xgcc -B . -c -S -o /dev/null t.c -DN=32 -O2 -fno-tree-vectorize
-fdump-tree-optimized=-
;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1)
Removing basic block 5
void f ()
{
unsigned long ivtmp.6;
int i;
int * a.0_1;
[local count: 32534376]:
a.0_1 = a;
[local count: 1041207448]:
# ivtmp.6_11 = PHI
i_12 = (int) ivtmp.6_11;
MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12;
ivtmp.6_10 = ivtmp.6_11 + 1;
if (ivtmp.6_10 != 32)
goto ; [96.88%]
else
goto ; [3.12%]
[local count: 32534376]:
return;
}
the counts effectively encode that the loop executes 32 times, since the count
of the header (bb 3) is 32 times the count of the preheader (bb 2):
$ python -c 'print(1041207448/32534376)'
32.003301615497406
however, I noticed that for larger values of N, the implied niters caps out at
99. E.g. for N=128:
$ ./xgcc -B . -c t.c -S -o /dev/null -DN=128 -O2 -fno-tree-vectorize
-fdump-tree-optimized=-
;; Function f (f, funcdef_no=0, decl_uid=4417, cgraph_uid=1, symbol_order=1)
Removing basic block 5
void f ()
{
unsigned long ivtmp.6;
int i;
int * a.0_1;
[local count: 10737416]:
a.0_1 = a;
[local count: 1063004408]:
# ivtmp.6_11 = PHI
i_12 = (int) ivtmp.6_11;
MEM[(int *)a.0_1 + ivtmp.6_11 * 4] = i_12;
ivtmp.6_10 = ivtmp.6_11 + 1;
if (ivtmp.6_10 != 128)
goto ; [98.99%]
else
goto ; [1.01%]
[local count: 10737416]:
return;
}
here the ratio of the counts is:
$ python -c 'print(1063004408/10737416)'
99.2086163002
I wrote a script to try the same reproducer with various values of N,
and that shows the following (in the format "N : implied niters"):
$ ./profile_niters.py
90 : 89.9090913170256
91 : 90.7431340154603
92 : 91.59262212919725
93 : 93.3396171245259
94 : 94.2381049694668
95 : 95.15387181344315
96 : 96.08735052729118
97 : 97.03920319702912
98 : 98.00990484649222
99 : 99.2086163002
100 : 99.2086163002
101 : 99.2086163002
102 : 99.2086163002
103 : 99.2086163002
104 : 99.2086163002
105 : 99.2086163002
106 : 99.2086163002
107 : 99.2086163002
108 : 99.2086163002
I was just wondering if anyone knows what the rationale for this behaviour is,
and can point to the relevant bit in the source?
I can imagine it might be something to do with floating-point precision
and not wanting probabilities to get too small or large, but I'm not
sure.
Thanks,
Alex