https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93243
Bug ID: 93243
Summary: misoptimization: minor changes to the code lead to up to +/- 30%
performance change on x86_64; -Os faster than -Ofast/O2/O3
Product: gcc
Version: 9.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: leo at yuriev dot ru
Target Milestone: ---
Briefly:
./heapsort-bench, cc 9.2.1 20191008
pass 1, small:
1.138047 seconds, baseline
1.090476 seconds, case-1, 95.8% of baseline
0.957207 seconds, case-2, 84.1% of baseline
1.163323 seconds, case-1+2, 102.2% of baseline
pass 1, large:
2.766881 seconds, baseline
2.677642 seconds, case-1, 96.8% of baseline
3.230149 seconds, case-2, 116.7% of baseline
2.758408 seconds, case-1+2, 99.7% of baseline
./heapsort-bench, cc Clang 9.0.0 (tags/RELEASE_900/final)
pass 1, small:
1.048489 seconds, baseline
1.050220 seconds, case-1, 100.2% of baseline
1.056953 seconds, case-2, 100.8% of baseline
1.050501 seconds, case-1+2, 100.2% of baseline
pass 1, large:
2.588565 seconds, baseline
2.585488 seconds, case-1, 99.9% of baseline
2.610508 seconds, case-2, 100.8% of baseline
2.587282 seconds, case-1+2, 100.0% of baseline
./heapsort-bench, gcc 7.4.0 (ubuntu)
pass 1, small:
0.893917 seconds, baseline
1.135796 seconds, case-1, 127.1% of baseline
0.920338 seconds, case-2, 103.0% of baseline
1.140505 seconds, case-1+2, 127.6% of baseline
pass 1, large:
3.804271 seconds, baseline
2.955773 seconds, case-1, 77.7% of baseline
3.908621 seconds, case-2, 102.7% of baseline
2.925845 seconds, case-1+2, 76.9% of baseline
The diffs in the source code are:
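#if CASE & 1
/* case-1: direct comparison */
#define CMP(a, b) ((a) < (b))
#else
/* otherwise: comparison via subtraction */
#define CMP(a, b) (((a) - (b)) < 0)
#endif

#if CASE & 2
  /* case-2: child index computed at the top of the loop body */
  for (size_t root = from; (root + root) <= to;) {
    size_t child = root << 1;
#else
  /* otherwise: child index computed in the loop condition */
  for (size_t child, root = from; (child = root + root) <= to;) {
#endif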
gcc 9.x and clang 9.x show (nearly) the same results on Fedora 31 and Ubuntu
19.10.
gcc 7.4 was tested only on Ubuntu; clang 6.0 also showed stable results, like
clang 9.
Source code of testcase at https://github.com/leo-yuriev/gcc-issues
$ wc heapsort.c
165 528 4309 heapsort.c
Using PGO (included in the testcase) does not significantly change the result.
This should basically be enough, but I will add more details tomorrow
(likely in the afternoon, UTC+03).
Regards,
Leonid.