Hi all,

I've been trying to understand a bizarre interaction between optimization passes that I observed while compiling some heavily nested, inlined numerical code of mine. I managed to reduce the issue to this simple code:
```
/* test.c */
typedef struct F {
  int (*call)(int);
} F;

static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

void dosomething(int);

/* h returns nothing, so it is declared void */
void h(int x) {
  dosomething(g(f, x));
  dosomething(g((F){sq}, x));
}
```

Here we have a driver function h calling the workhorse g, which delegates a simple task to the inline-wannabe function behind f.call (sq in this case). The distinctive aspect of this scheme is that the function is referenced through a struct member. The first call to g passes a static const struct, while the second call passes a compound literal (a local variable of the struct type behaves the same way with respect to what follows).

Now, say I compile this code with:

    gcc -O3 -fdump-tree-all --param early-inlining-insns=10 -c test.c

With such a low value for early-inlining-insns, the einline pass is not able to inline the calls to g. The inline_param2 dump still shows:

```
h (int x)
{
  struct F D.1847;
  int _4;
  int _8;

  <bb 2>:
  _4 = g (f, x_2(D));
  dosomething (_4);
  D.1847.call = sq;
  _8 = g (D.1847, x_2(D));
  dosomething (_8);
  return;
}
```

The next tree pass is fixup_cfg4, which does the inlining, but fully only for the second call to g:

```
h (int x)
{
  ....
  <bb 2>:
  f = f;
  f$call_7 = MEM[(struct F *)&f];
  x_19 = f$call_7 (x_2(D));
  x_20 = f$call_7 (x_19);
  x_21 = f$call_7 (x_20);
  x_22 = f$call_7 (x_21);
  x_23 = f$call_7 (x_22);
  x_24 = f$call_7 (x_23);
  x_25 = f$call_7 (x_24);
  x_26 = f$call_7 (x_25);
  _43 = x_26;
  _4 = _43;
  dosomething (_4);
  D.1847.call = sq;
  f = D.1847;
  f$call_10 = MEM[(struct F *)&f];
  _33 = x_2(D) * x_2(D);
  _45 = _33;
  x_11 = _45;
  _32 = x_11 * x_11;
  _46 = _32;
  x_12 = _46;
  _31 = x_12 * x_12;
  _47 = _31;
  x_13 = _47;
  _30 = x_13 * x_13;
  _48 = _30;
  x_14 = _48;
  _29 = x_14 * x_14;
  _49 = _29;
  x_15 = _49;
  _28 = x_15 * x_15;
  _50 = _28;
  x_16 = _50;
  _27 = x_16 * x_16;
  _51 = _27;
  x_17 = _51;
  _3 = x_17 * x_17;
  _52 = _3;
  x_18 = _52;
  _53 = x_18;
  _8 = _53;
  dosomething (_8);
  return;
}
```

Now, say I recompile the code with a larger early-inlining-insns, so that einline is able to early inline both calls to g:

    gcc -O3 -fdump-tree-all --param early-inlining-insns=50 -c test.c

After inline_param2 (that is, before fixup_cfg4), we now have:

```
h (int x)
{
  int x;
  int x;

  <bb 2>:
  x_13 = sq (x_2(D));
  x_14 = sq (x_13);
  x_15 = sq (x_14);
  x_16 = sq (x_15);
  x_17 = sq (x_16);
  x_18 = sq (x_17);
  x_19 = sq (x_18);
  x_20 = sq (x_19);
  dosomething (x_20);
  x_5 = sq (x_2(D));
  x_6 = sq (x_5);
  x_7 = sq (x_6);
  x_8 = sq (x_7);
  x_9 = sq (x_8);
  x_10 = sq (x_9);
  x_11 = sq (x_10);
  x_12 = sq (x_11);
  dosomething (x_12);
  return;
}
```

And fixup_cfg4 is able to do its job for both calls:

```
h (int x)
{
  ....
  <bb 2>:
  _36 = x_2(D) * x_2(D);
  _37 = _36;
  x_13 = _37;
  _35 = x_13 * x_13;
  _38 = _35;
  x_14 = _38;
  _34 = x_14 * x_14;
  _39 = _34;
  x_15 = _39;
  _33 = x_15 * x_15;
  _40 = _33;
  x_16 = _40;
  _32 = x_16 * x_16;
  _41 = _32;
  x_17 = _41;
  _31 = x_17 * x_17;
  _42 = _31;
  x_18 = _42;
  _30 = x_18 * x_18;
  _43 = _30;
  x_19 = _43;
  _29 = x_19 * x_19;
  _44 = _29;
  x_20 = _44;
  dosomething (x_20);
  _28 = x_2(D) * x_2(D);
  _45 = _28;
  x_5 = _45;
  _27 = x_5 * x_5;
  _46 = _27;
  x_6 = _46;
  _26 = x_6 * x_6;
  _47 = _26;
  x_7 = _47;
  _25 = x_7 * x_7;
  _48 = _25;
  x_8 = _48;
  _24 = x_8 * x_8;
  _49 = _24;
  x_9 = _49;
  _23 = x_9 * x_9;
  _50 = _23;
  x_10 = _50;
  _22 = x_10 * x_10;
  _51 = _22;
  x_11 = _51;
  _21 = x_11 * x_11;
  _52 = _21;
  x_12 = _52;
  dosomething (x_12);
  return;
}
```

The bottom line is that I get full inlining if einline manages to early inline both calls to g, but incomplete inlining otherwise. My guess is that fixup_cfg4 is not able to infer that f$call_7 is just sq in disguise when f is the global static const struct, but it is able to do so when the struct is a local or a compound literal. If einline expands the code early, the subsequent passes leave fixup_cfg4 looking at plain calls to sq in both cases, which makes inlining sq a trivial matter. But if einline hits its hard limits, fixup_cfg4 has to figure out by itself that f$call is sq.

I'm not sure whether this should be considered a proper bug or more of a quirk of the inlining system that one must learn to live with. In the first case, I'll report it if you ask me to. In the second case, I would like to ask for some advice on the best way to cope with this scenario (besides blindly increasing early-inlining-insns); I can provide more background on my real use case if necessary.

Cheers
--
Carlos
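
P.S. In case it helps anyone comparing the cases, here is a minimal sketch that puts the three ways of passing the struct side by side: the global static const, the compound literal, and the local variable I mentioned in passing. The names via_global, via_literal, via_local, last_value and check_variants are made up for this illustration, and the body of dosomething is only a stand-in so the file can be run on its own. In test.c dosomething is just declared, and I assume that defining it in the same file changes what the inliner sees, so it should stay external when comparing dumps.

```c
#include <assert.h>

typedef struct F {
  int (*call)(int);
} F;

/* Same workhorse as in test.c: eight chained indirect calls. */
static int g(F f, int x) {
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  x = f.call(x);
  return x;
}

static int sq(int x) {
  return x * x;
}

static const F f = {sq};

/* Hypothetical stand-in for the external dosomething(): it records
   its argument so the variants can be compared at run time. */
static int last_value;

void dosomething(int v) {
  last_value = v;
}

/* Variant 1: global static const struct (the call that is not
   fully inlined in the first set of dumps). */
int via_global(int x) {
  dosomething(g(f, x));
  return last_value;
}

/* Variant 2: compound literal. */
int via_literal(int x) {
  dosomething(g((F){sq}, x));
  return last_value;
}

/* Variant 3: local struct variable. */
int via_local(int x) {
  F local = {sq};
  dosomething(g(local, x));
  return last_value;
}

/* All three variants compute the same value. The inputs 1, -1 and 0
   keep the eight squarings clear of signed overflow. */
void check_variants(void) {
  assert(via_global(1) == 1);
  assert(via_literal(-1) == 1);
  assert(via_local(0) == 0);
  assert(via_global(-1) == via_local(-1));
}
```

The three variants are semantically identical, so any difference between them in the fixup_cfg4 dump comes only from how the struct reaches g.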