PS: I framed the issue as happening between the inline_param and fixup_cfg passes because I was only looking at the tree passes, but the passes that really matter are tree-einline (obviously) and ipa-inline, which runs between tree-inline_param2 and tree-fixup_cfg2. So, restating the problem: if early inlining does not happen, late inlining misses the chance to inline the function referenced by the static const struct member.
On Thu, Feb 4, 2016 at 1:08 PM, Carlos Pita <carlosjosep...@gmail.com> wrote:
> Hi all,
>
> I've been trying to understand some bizarre interaction between
> optimizing passes I've observed while compiling a heavily nested
> inlined numerical code of mine. I managed to reduce the issue down to
> this simple code:
>
> ``` test.c
>
> typedef struct F {
>     int (*call)(int);
> } F;
>
> static int g(F f, int x) {
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     x = f.call(x);
>     return x;
> }
>
> static int sq(int x) {
>     return x * x;
> }
>
> static const F f = {sq};
>
> void dosomething(int);
>
> int h(int x) {
>     dosomething(g(f, x));
>     dosomething(g((F){sq}, x));
> }
>
> ```
>
> Here we have a driver function h calling the workhorse g, which
> delegates some simple task to the inline-wannabe sq. The distinctive
> aspect of the above scheme is that sq is referenced from a struct
> member. The first call to g passes a static const struct while the
> second call passes a compound literal (alternatively, a local version
> of the struct has the same effect regarding what follows).
>
> Now, say I compile this code with:
>
> gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c
>
> The einline pass will not be able to inline calls to g with such a low
> value for early-inlining-insns.
>
> The inline_param2 pass still shows:
>
> ```
> h (int x)
> {
>   struct F D.1847;
>   int _4;
>   int _8;
>
>   <bb 2>:
>   _4 = g (f, x_2(D));
>   dosomething (_4);
>   D.1847.call = sq;
>   _8 = g (D.1847, x_2(D));
>   dosomething (_8);
>   return;
>
> }
>
> ```
>
> The next tree pass is fixup_cfg4, which does the inlining, but only for
> the second call to g:
>
> ```
> h (int x)
> {
>   ....
>
>   <bb 2>:
>   f = f;
>   f$call_7 = MEM[(struct F *)&f];
>   x_19 = f$call_7 (x_2(D));
>   x_20 = f$call_7 (x_19);
>   x_21 = f$call_7 (x_20);
>   x_22 = f$call_7 (x_21);
>   x_23 = f$call_7 (x_22);
>   x_24 = f$call_7 (x_23);
>   x_25 = f$call_7 (x_24);
>   x_26 = f$call_7 (x_25);
>   _43 = x_26;
>   _4 = _43;
>   dosomething (_4);
>   D.1847.call = sq;
>   f = D.1847;
>   f$call_10 = MEM[(struct F *)&f];
>   _33 = x_2(D) * x_2(D);
>   _45 = _33;
>   x_11 = _45;
>   _32 = x_11 * x_11;
>   _46 = _32;
>   x_12 = _46;
>   _31 = x_12 * x_12;
>   _47 = _31;
>   x_13 = _47;
>   _30 = x_13 * x_13;
>   _48 = _30;
>   x_14 = _48;
>   _29 = x_14 * x_14;
>   _49 = _29;
>   x_15 = _49;
>   _28 = x_15 * x_15;
>   _50 = _28;
>   x_16 = _50;
>   _27 = x_16 * x_16;
>   _51 = _27;
>   x_17 = _51;
>   _3 = x_17 * x_17;
>   _52 = _3;
>   x_18 = _52;
>   _53 = x_18;
>   _8 = _53;
>   dosomething (_8);
>   return;
>
> }
> ```
>
> Now, say I recompile the code with a larger early-inlining-insns, so
> that einline is able to early inline both calls to g:
>
> gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c
>
> After inline_param2 (that is, before fixup_cfg4), we now have:
>
> ```
> h (int x)
> {
>   int x;
>   int x;
>
>   <bb 2>:
>   x_13 = sq (x_2(D));
>   x_14 = sq (x_13);
>   x_15 = sq (x_14);
>   x_16 = sq (x_15);
>   x_17 = sq (x_16);
>   x_18 = sq (x_17);
>   x_19 = sq (x_18);
>   x_20 = sq (x_19);
>   dosomething (x_20);
>   x_5 = sq (x_2(D));
>   x_6 = sq (x_5);
>   x_7 = sq (x_6);
>   x_8 = sq (x_7);
>   x_9 = sq (x_8);
>   x_10 = sq (x_9);
>   x_11 = sq (x_10);
>   x_12 = sq (x_11);
>   dosomething (x_12);
>   return;
>
> }
> ```
>
> And fixup_cfg4 is able to do its job for both calls:
>
> ```
> h (int x)
> {
>   ....
>
>   <bb 2>:
>   _36 = x_2(D) * x_2(D);
>   _37 = _36;
>   x_13 = _37;
>   _35 = x_13 * x_13;
>   _38 = _35;
>   x_14 = _38;
>   _34 = x_14 * x_14;
>   _39 = _34;
>   x_15 = _39;
>   _33 = x_15 * x_15;
>   _40 = _33;
>   x_16 = _40;
>   _32 = x_16 * x_16;
>   _41 = _32;
>   x_17 = _41;
>   _31 = x_17 * x_17;
>   _42 = _31;
>   x_18 = _42;
>   _30 = x_18 * x_18;
>   _43 = _30;
>   x_19 = _43;
>   _29 = x_19 * x_19;
>   _44 = _29;
>   x_20 = _44;
>   dosomething (x_20);
>   _28 = x_2(D) * x_2(D);
>   _45 = _28;
>   x_5 = _45;
>   _27 = x_5 * x_5;
>   _46 = _27;
>   x_6 = _46;
>   _26 = x_6 * x_6;
>   _47 = _26;
>   x_7 = _47;
>   _25 = x_7 * x_7;
>   _48 = _25;
>   x_8 = _48;
>   _24 = x_8 * x_8;
>   _49 = _24;
>   x_9 = _49;
>   _23 = x_9 * x_9;
>   _50 = _23;
>   x_10 = _50;
>   _22 = x_10 * x_10;
>   _51 = _22;
>   x_11 = _51;
>   _21 = x_11 * x_11;
>   _52 = _21;
>   x_12 = _52;
>   dosomething (x_12);
>   return;
>
> }
> ```
>
> The bottom line is that I get full inlining if einline manages to
> early inline both g calls, but I get incomplete inlining otherwise. I
> guess the problem is that fixup_cfg4 is not able to infer that
> f$call_7 is just sq in disguise when f is the global static const
> struct, but it is able to do so when it's a local or literal one. If
> einline expands the code early, the subsequent passes will make
> fixup_cfg4 see just sq in both cases, making inlining of sq a trivial
> matter. But if einline hits its hard limits, fixup_cfg4 will have to
> figure out by itself that f$call is sq.
>
> I'm not sure whether this should be considered a proper bug or more of
> a quirk of the inlining system one must learn to live with. In the
> first case, I'll report it if you ask me to. In the second case,
> I would like to ask for some advice about the best way to cope with
> this scenario (besides blindly increasing early-inlining-insns); I
> can provide more background regarding my real use case if necessary.
>
> Cheers
> --
> Carlos
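One possible way to cope with this without raising early-inlining-insns globally, offered as an untested sketch: GCC's `always_inline` function attribute asks the compiler to inline g at every call site, and GCC handles such functions in the early inliner regardless of the einline size limits. That should recreate the "both calls early-inlined" situation shown above for just this one function. The body of `dosomething` and the `last_value` variable are hypothetical stand-ins (the original only declares `dosomething`), added so the unit can be linked and exercised. h is declared `void` here because it returns nothing.

```c
typedef struct F {
    int (*call)(int);
} F;

/* always_inline (a GCC/Clang function attribute) requests that g be
   inlined at every call site, independently of
   --param early-inlining-insns. */
static inline __attribute__((always_inline)) int g(F f, int x) {
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    x = f.call(x);
    return x;
}

static int sq(int x) {
    return x * x;
}

static const F f = {sq};

/* Hypothetical stand-in for the external dosomething() of the original
   test case: it only records the last value it was given. */
static int last_value;

void dosomething(int v) {
    last_value = v;
}

void h(int x) {
    dosomething(g(f, x));        /* static const struct case */
    dosomething(g((F){sq}, x));  /* compound literal case */
}
```

Whether sq actually ends up inlined for the static const case should still be confirmed by rebuilding with `-O3 -fdump-tree-all --param early-inlining-insns=10` and reading the dumps. The trade-off is that `always_inline` overrides the inliner's cost model for g everywhere, so it suits small workhorse functions like this one better than large ones.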