[Bug d/102765] New: [11 Regression] GDC11 stopped inlining library functions and lambdas used by a binary search one-liner code

siarhei.siamashka at gmail dot com via Gcc-bugs Thu, 14 Oct 2021 23:54:13 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102765


            Bug ID: 102765
           Summary: [11 Regression] GDC11 stopped inlining library
                    functions and lambdas used by a binary search
                    one-liner code
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: d
          Assignee: ibuclaw at gdcproject dot org
          Reporter: siarhei.siamashka at gmail dot com
  Target Milestone: ---

The following simple binary search code became about 3.9x slower starting with
GDC 11 (1.776s with gdc-10.3.0 versus 6.889s with gdc-11.2.0, timings below):

/*******************************************************/
import std.algorithm, std.range, std.stdio, std.stdint;

// calculate integer square root using binary search
int64_t isqrt(int64_t x) {
  return iota(0, min(x, 3037000499) + 1)
         .map!(v => (v * v > x))
         .assumeSorted.lowerBound(true)
         .length - 1;
}

// print the sum of 20M square roots
void main() { 20000000.iota.map!isqrt.sum.writeln; }
/*******************************************************/

$ gdc-6.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.924s
user    0m1.924s
sys     0m0.000s

$ gdc-9.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m2.100s
user    0m2.099s
sys     0m0.000s

$ gdc-10.3.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m1.776s
user    0m1.776s
sys     0m0.000s

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check test.d && time ./a.out 
59618479180

real    0m6.889s
user    0m6.887s
sys     0m0.000s


My expectation is that the compiler should inline everything here and generate
code for a small and efficient binary search loop. GDC 11 stopped doing this,
as can be confirmed by running "perf record ./a.out && perf report":

    27.86%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T18getTransitionIndexVEQGrQGq12SearchPolicyi3SQHoQHn__TQHkTQHaVQDha5_61203c2062ZQIj3geqTbZQDlMFNaNbNiNfbZm
    15.02%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc__T3geqTbTbZQjMFNaNbNiNfbbZb
    10.34%  a.out    a.out             [.]
_D3std9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQCm5range__T4iotaTiTlZQkFilZ6ResultZQCv7opIndexMFNaNbNiNfmZb
    10.31%  a.out    a.out             [.]
_D3std10functional__T9binaryFunVAyaa5_61203c2062VQra1_61VQza1_62Z__TQBvTbTbZQCdFNaNbNiNfKbKbZb
     3.03%  a.out    a.out             [.]
_D3std5range__T4iotaTiTlZQkFilZ6Result7opIndexMNgFNaNbNiNfmZNgl
     2.34%  a.out    a.out             [.] 0x0000000000031a09
     2.28%  a.out    a.out             [.]
_D4core6atomic__T7casImplTmTxmTmZQqFNaNbNiNePOmxmmZb
     2.11%  a.out    a.out             [.]
_D3std5range__T11SortedRangeTSQBc9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQDnQDm__T4iotaTiTlZQkFilZ6ResultZQCsVAyaa5_61203c2062ZQFc7opSliceMFNaNbNiNfmmZSQGoQGn__TQGkTQGaVQCha5_61203c2062ZQHj
     2.02%  a.out    a.out             [.]
_D3std5range__T12assumeSortedVAyaa5_61203c2062TSQBu9algorithm9iteration__T9MapResultS4test5isqrtFlZ9__lambda2TSQEfQEe__T4iotaTiTlZQkFilZ6ResultZQCsZQFdFNaNbNiNfQEjZSQGhQGg__T11SortedRangeTQFlVQGga5_61203c2062ZQBj
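
For comparison, here is a hand-written sketch of the loop that the one-liner is
expected to collapse into once everything is inlined. The name isqrtLoop is
made up for illustration, the code assumes a non-negative x, and it is not
claimed to match the exact code that Phobos or older GDC versions produce:

```d
import std.algorithm : min;
import std.stdint : int64_t;

// Hypothetical hand-written equivalent of the range-based isqrt above.
// Assumes x >= 0.
int64_t isqrtLoop(int64_t x)
{
    // Search the half-open range [lo, hi) for the first v with v * v > x.
    int64_t lo = 0;
    int64_t hi = min(x, 3037000499) + 1;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (mid * mid > x)
            hi = mid;     // predicate holds: first match is at mid or earlier
        else
            lo = mid + 1; // predicate fails: first match is after mid
    }
    // lo is the number of values v with v * v <= x, so the root is lo - 1.
    return lo - 1;
}
```

Replacing isqrt with isqrtLoop in main should print the same sum, which makes
it a convenient baseline for what the inlined code could cost.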


Using either the -fwhole-program or the -flto command-line option resolves the
performance problem and allows all of these functions to be inlined again:

$ gdc-11.2.0 -g -O3 -frelease -fno-bounds-check -flto test.d && time ./a.out 
59618479180

real    0m2.085s
user    0m2.085s
sys     0m0.000s


But is this expected? Does GDC require the -flto option to get reasonable
performance starting from version 11? Or is this a real performance regression,
and can something be done to improve the inlining behaviour?
